EPD Exercises

1. How to use the EPDnew hub at UCSC Genome Browser

The UCSC Genome Browser has become the reference tool for visualising genomic data. It is extremely flexible: users can customise the genomic view by adding or removing 'tracks' (genomic data) and share the results. For these reasons we developed a UCSC track Hub for EPDnew (a collection of genomic data organised in a defined way for all organisms supported by us) that contains useful data for investigating promoter features and characteristics. The next sub-sections describe how to use the Hub in the UCSC Genome Browser, with examples that highlight promoter characteristics that can be easily retrieved from it.

Access the EPDnew Hub

The first step in using the EPDnew Hub is to load it into the Genome Browser, since the Hub is stored on our server at Vital-it. The easiest way to do so is to follow the link on the promoter viewer page at EPD. The following is an example of how to do it with one specific promoter. You are welcome to use the same example or to investigate another promoter of your choice.

Another way of loading the Hub is from the Track Hub configuration panel at UCSC. From the UCSC Genome Browser home page, click on 'My Data' at the top of the page and then on 'Track Hubs'. On the next page, click on 'My Hubs' and paste the following URL into the input form


and click 'Add Hub'. Note that if you followed the previous example, the hub should already be loaded and listed, and you do not need to load it a second time. Here you can also disconnect the hub from your viewer.
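A hub can also be attached by passing its address in the browser link itself, through the `hubUrl` parameter of `hgTracks`. The sketch below builds such a link in Python. The hub address, assembly name and position used in the usage note are placeholders chosen for illustration (the real EPDnew hub URL is not reproduced here), and `hub_session_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Entry point of the UCSC Genome Browser track display.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_session_url(hub_url, db, position=None):
    """Build a Genome Browser link that attaches a track hub.

    hub_url  : address of the hub's hub.txt file (placeholder in this sketch)
    db       : assembly name, e.g. "hg19"
    position : optional region such as "chr1:1000-2000"
    """
    params = {"db": db, "hubUrl": hub_url}
    if position:
        params["position"] = position
    # urlencode percent-escapes the hub address so it survives as one parameter.
    return UCSC_HGTRACKS + "?" + urlencode(params)
```

For example, `hub_session_url("https://example.org/hub.txt", "hg19", "chr1:1000-2000")` returns a link that opens the given region with the (placeholder) hub connected.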

Modify tracks views

The default Hub visualisation is designed to show, in a compact view, information about chromatin states around promoters (H3K4me3 and H3K4me1), Pol-II activity, promoter structure (CAGE), conservation scores, etc. This section explains how to change the CAGE track set to investigate promoter expression levels in several tissues / cells and how to change the view to suit your needs. Please note that the tag counts in the track set have not been normalised between samples, so small variations in tag counts could simply reflect differences in sequencing depth.
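If you want to compare tag counts between samples yourself, a common approach is to rescale them to tags per million. The sketch below illustrates the arithmetic. The function name and the numbers in the usage note are made up for illustration and are not taken from the Hub.

```python
def tags_per_million(tag_count, total_tags):
    """Rescale a raw tag count by the sequencing depth of its sample.

    tag_count  : number of tags observed at the promoter
    total_tags : total number of mapped tags in the same sample
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * 1_000_000 / total_tags
```

With these invented numbers, a promoter with 50 tags in a sample of 10 million tags and one with 100 tags in a sample of 20 million tags both come out at 5 tags per million: the two-fold difference in raw counts is fully explained by sequencing depth.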

Add data to the viewer

The UCSC Genome Browser has a very large collection of data (mostly human-specific) that can be loaded into the viewer to complement the EPDnew Hub. The complete list of readily available data can be found in the bottom half of the UCSC Genome Browser viewer page. To activate a track, simply select the viewing style from the corresponding drop-down menu and click 'refresh'.

On the other hand, scientists often want to add their own data to the viewer and find it difficult to convert their data to a format accepted by UCSC. For this reason we have recently added a new tool to the ChIP-Seq family, called ChIP-Track, that will help you do just that. Please note that a formal introduction to the tool will be given later during the course. Here we show how to generate a nucleosome track (from an MNase-seq experiment) for D. rerio for the region around the tbp_1 promoter.

The track data generated here is stored in a temporary file on our server that will be automatically deleted after 1 hour. If you want to keep the track file permanently, you have to download the WIG Track File from the ChIP-Track output page and manually upload it to the Genome Browser by clicking on 'Manage custom track' at the bottom of the image and then on 'add custom track'.
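For orientation, a WIG file is plain text: a `track` line, followed by one or more declaration lines and the data points. The sketch below writes a minimal file in the `variableStep` flavour of the format. It is only an illustration of the format, not the exact layout produced by ChIP-Track, and the helper name `to_wig` and the example values are assumptions.

```python
def to_wig(name, data, span=1):
    """Render per-position values as WIG text (variableStep flavour).

    name : track name shown in the browser
    data : dict mapping chromosome -> list of (position, value) pairs,
           with 1-based positions as required by the WIG format
    span : number of bases each value covers
    """
    lines = [f'track type=wiggle_0 name="{name}"']
    for chrom in sorted(data):
        # One declaration line per chromosome, then "position value" rows.
        lines.append(f"variableStep chrom={chrom} span={span}")
        for pos, value in sorted(data[chrom]):
            lines.append(f"{pos} {value}")
    return "\n".join(lines) + "\n"
```

Calling `to_wig("nuc", {"chr1": [(200, 3), (100, 5)]})` yields a track line, a `variableStep chrom=chr1 span=1` line, and the two data rows sorted by position. Text of this kind can be pasted or uploaded as a custom track.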

Some examples of interesting promoters

The following is a list of promoters that show how dynamic the promoter region can be during transcription. You can investigate them starting from EPDnew and then continue in the UCSC Genome Browser. You are welcome to investigate other promoters as well.

2. Promoter selection

In this exercise you will learn how to select promoters that are expressed in a particular sample and how to study them.

As an example, you will use the ENCODE data for cell line GM12878. This is an ENCODE tier 1 cell line. As a consequence, it has been heavily studied, providing data for almost all conditions / targets used by the consortium.

As a first step you will use ChIP-Cor to select promoters that are expressed in GM12878. You will then study the distribution of their histone marks and additionally check their Gene Ontology (using the GREAT suite).

3. Comparative analysis of promoter chromatin architecture

This exercise shows a comparative analysis of promoter chromatin architecture in 3 organisms: H. sapiens, D. rerio and S. cerevisiae. This has been done using the ChIP-Cor tool.

As before, all the data used to generate this figure is present in the MGA database, the back-end data archive used by EPD and the other tools developed by our group. Its content can be easily accessed via a drop-down menu present in the input form of our tools or through an FTP site.
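To make concrete what a chromatin distribution around promoters represents, the sketch below counts tags at positions relative to each transcription start site, taking the promoter strand into account, and averages the counts over all promoters. This is a conceptual illustration only and not the ChIP-Cor implementation. The function name, the input layout and the default window and bin sizes are assumptions.

```python
from collections import Counter


def aggregate_profile(promoters, tags, window=1000, bin_size=100):
    """Average tag count per promoter, binned by distance from the TSS.

    promoters : list of (chrom, tss_position, strand), strand "+" or "-"
    tags      : list of (chrom, position)
    Returns a dict mapping each bin start (relative to the TSS) to the
    mean number of tags per promoter falling in [start, start + bin_size).
    """
    if not promoters:
        raise ValueError("at least one promoter is required")
    tags_by_chrom = {}
    for chrom, pos in tags:
        tags_by_chrom.setdefault(chrom, []).append(pos)

    counts = Counter()
    for chrom, tss, strand in promoters:
        for pos in tags_by_chrom.get(chrom, []):
            # Flip the sign on the minus strand so that negative values
            # always mean "upstream of the TSS".
            rel = pos - tss if strand == "+" else tss - pos
            if -window <= rel < window:
                counts[(rel // bin_size) * bin_size] += 1

    return {b: counts[b] / len(promoters)
            for b in range(-window, window, bin_size)}
```

The nested loop is kept for readability. For genome-scale data the tags would normally be sorted and searched with an index rather than scanned for every promoter.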

Here we will show the steps to reproduce the chromatin distribution around H. sapiens promoters:

4. Study of enriched TF binding sites in promoters

Some TF binding motifs are strongly over-represented in promoters, whereas others are not. Some of these are present in almost all promoters, whereas others are found in promoters of genes expressed only under specific circumstances (e.g. cell types, growth conditions). Here we present an exercise that studies promoters to identify these two classes of TFs.

To do so, we can use ChIP-Cor (to study the distribution of a single TF around promoters) or CentriMo, a tool of the MEME suite (to study a large group of TFs). CentriMo identifies known or user-provided motifs that show a significant preference for particular locations in your sequences, in this case any location around gene promoters. Unlike ChIP-Cor, CentriMo, like other MEME tools, cannot handle a large collection of sequences. For this reason we will restrict this analysis to a subset of human promoters (2500), either selected randomly from the EPDnew collection or chosen because they are expressed in a specific cell type (you already have these from exercise 2).
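If you prefer to draw the random subset yourself from a list of promoter identifiers, a few lines of Python are enough. This is a minimal sketch: the function name is hypothetical, the identifiers are assumed to be already loaded into a list, and the fixed seed is only there to make the selection reproducible.

```python
import random


def sample_promoters(promoter_ids, n=2500, seed=0):
    """Draw n distinct promoter IDs at random, without replacement.

    promoter_ids : iterable of promoter identifiers
    n            : size of the subset (2500 in this exercise)
    seed         : fixed seed so the same subset can be drawn again
    """
    ids = list(promoter_ids)
    if n > len(ids):
        raise ValueError("cannot sample more promoters than are available")
    rng = random.Random(seed)  # private generator, global state untouched
    return rng.sample(ids, n)
```

Sampling without replacement guarantees that no promoter appears twice in the subset, which would otherwise bias the motif statistics.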

Analysis of single TFs around promoters:

This analysis will be carried out using ChIP-Cor with the EPDnew promoter database and both experimental and predicted TF binding data (the ENCODE data and a collection of TF binding sites inferred using TF position weight matrices).

Analysis of multiple TFs using CentriMo:

This exercise will be done using CentriMo with two lists of promoters: one expressed in a specific cell line and the other randomly chosen from the list of EPDnew promoters: