Lausanne, 27 April - 1 May 2015

Working with public data: Organization and contents of the Mass Genome Annotation (MGA) data repository, introduction to genomic context analysis

Author: Philipp Bucher

1. Analysis of histone modifications with the ChIP-Part tool

This tutorial is mainly about analyzing ChIP-Seq data for histone modifications. It also teaches you how to use the ChIP-Part program to define chromatin domains of variable size that are enriched in a particular feature.

Feel free to do or redo the exercises with other data on our server, or with your own data.
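
To give a feel for what defining domains of variable size involves, here is a minimal Python sketch of a two-state segmentation by dynamic programming. It is not ChIP-Part's actual implementation: the function names `partition` and `domains`, the parameters `threshold` and `penalty`, and the scoring scheme are all assumptions chosen for illustration.

```python
def partition(counts, threshold, penalty):
    """Label each bin 1 (enriched) or 0 (background), maximizing a total score.

    Hypothetical scoring (an assumption, not ChIP-Part's scheme):
    an enriched bin scores (count - threshold), a background bin scores
    (threshold - count), and every change of state costs `penalty`.
    """
    if not counts:
        return []
    # best[s] = best score of any labelling of the bins seen so far ending in state s
    best = [threshold - counts[0], counts[0] - threshold]
    back = []  # for each later bin: the best previous state, per current state
    for c in counts[1:]:
        emit = [threshold - c, c - threshold]
        new_best, choice = [], []
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] - penalty
            if stay >= switch:
                new_best.append(stay + emit[s])
                choice.append(s)
            else:
                new_best.append(switch + emit[s])
                choice.append(1 - s)
        best = new_best
        back.append(choice)
    # Trace back from the better final state to recover the labelling.
    state = 0 if best[0] >= best[1] else 1
    labels = [state]
    for choice in reversed(back):
        state = choice[state]
        labels.append(state)
    labels.reverse()
    return labels


def domains(labels):
    """Return (start, end) bin indices, end exclusive, for each run of 1s."""
    out, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(labels)))
    return out
```

In this sketch a larger `penalty` merges enriched bins across short gaps into one long domain, while a penalty of zero reduces the procedure to a plain per-bin threshold.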

2. Analyzing chromatin domain boundaries - another application of the ChIP-Part tool

Go through Exercise 4 of ChIP-Seq Tutorial 1, part 3. In this exercise you use the ChIP-Part tool to define H3K4me1-enriched chromosomal regions. You then save the domain boundaries as an SGA file and subsequently use ChIP-Cor to test whether H3K4me1 boundaries coincide with changes in the abundance of other histone marks (or any other kind of genomic feature).

Feel free to repeat the exercise with any feature you like.
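
If you prefer to build a boundary file yourself rather than through the web interface, the idea can be sketched in a few lines of Python. The sketch assumes the five-column, tab-separated SGA layout (sequence ID, feature name, position, strand, count). The function name `boundaries_to_sga`, the feature label, the placeholder sequence ID `chr1`, and the convention of marking domain starts with "+" and domain ends with "-" are assumptions made for illustration, not something prescribed by the exercise.

```python
def boundaries_to_sga(domains, feature="H3K4me1_boundary"):
    """Turn (seq_id, start, end) domains into SGA-style lines, one per boundary.

    Assumed layout (tab-separated): sequence ID, feature, position, strand, count.
    Assumed convention: '+' marks a domain start, '-' marks a domain end.
    """
    records = []
    for seq_id, start, end in domains:
        records.append((seq_id, start, "+"))
        records.append((seq_id, end, "-"))
    # Sort by sequence and position (assumed to be what downstream tools expect).
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return ["\t".join([seq_id, feature, str(pos), strand, "1"])
            for seq_id, pos, strand in records]


# Usage with made-up coordinates; "chr1" is a placeholder sequence ID.
for line in boundaries_to_sga([("chr1", 100, 500), ("chr1", 50, 80)]):
    print(line)
```

Keeping starts and ends on opposite strands lets an oriented correlation analysis distinguish the inside of a domain from the outside.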

3. Various examples of genomic context analysis

Go to Part C: Advanced applications. This tutorial presents various examples of genomic context analysis. It is partly redundant with other exercises proposed during this course. You may go through it quickly or decide to skip it.

4. Characterizing the diversity of TF binding sites with aggregation plots (APs)

The term aggregation plot (AP) is often used in the literature to refer to the kind of graph generated by ChIP-Cor ("aggregation" because the plot shows an average over a set of "aggregated" genomic regions).
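
The averaging behind an aggregation plot can be sketched in plain Python. This is a simplified illustration, not ChIP-Cor's implementation: it handles a single sequence, ignores strand, and normalizes only by the number of anchors. The function name `aggregation_plot` and its parameters are hypothetical.

```python
from bisect import bisect_left


def aggregation_plot(anchors, targets, half_width, bin_size):
    """Mean target count per anchor, binned by position relative to the anchor.

    anchors: reference positions (e.g. TF peak centres) on one sequence.
    targets: (position, count) pairs for the feature of interest, any order.
    Returns a list of (bin_centre, mean_count_per_anchor) pairs.
    """
    if (2 * half_width) % bin_size != 0:
        raise ValueError("2 * half_width must be a multiple of bin_size")
    targets = sorted(targets)
    positions = [pos for pos, _ in targets]
    n_bins = (2 * half_width) // bin_size
    sums = [0.0] * n_bins
    for a in anchors:
        # Targets falling in the window [a - half_width, a + half_width).
        lo = bisect_left(positions, a - half_width)
        hi = bisect_left(positions, a + half_width)
        for pos, count in targets[lo:hi]:
            sums[(pos - a + half_width) // bin_size] += count
    n = len(anchors)
    return [(-half_width + (i + 0.5) * bin_size, s / n if n else 0.0)
            for i, s in enumerate(sums)]
```

Each returned pair gives the centre of a relative-position bin and the average feature count observed there across all anchors, which is the curve an AP displays.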

Below is a figure from this morning's presentation showing the sequence conservation across vertebrate genomes in the vicinity of binding sites of a number of transcription factors that are active in mouse ES cells.

Make similar composite Figures for other transcription factors in other species and cell types (and possibly also with other contextual features) using the ChIP-seq peak lists available from our server.

5. Correlations between two chromatin features visualized by 2D plots

To understand whether (and to what degree) contextual genomic features influence the occupancy levels of in vivo TF binding sites, it is often helpful to visualize the correlation with smoothed scatter plots.

We present a recipe for making such plots. The file:

contains the peak list from
   hg19 -> ENCODE ChIP-seq-peak -> Uniform TFBS from UCSC
   Sample: Hela-S3 AP-2alpha None USC peaks
annotated with DNase I counts from:
   -> ENCODE DNAse FAIRE etc. ->  Boyle 2008, Open chromatin ..
   Sample: DNaseI HS-HeLa-S3-Rep. 
The peak annotation was done in the usual way, using ChIP-Cor with the Enriched Feature Extraction option.

The R plot code for making this figure is given below:

Make similar plots for other kinds of feature correlations. You may also investigate the relationship between in vivo occupancy and PWM score (see exercises of Monday afternoon session) with the aid of such plots.
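
As a rough illustration of what goes into a smoothed scatter plot, the following Python sketch (not the R code referred to above) bins paired values on a log scale and smooths the resulting 2D histogram. The function name `smoothed_density`, the log2(value + 1) transform, the grid size, and the simple box smoothing are all assumptions; rendering the grid as an image is left to the plotting tool of your choice.

```python
import math


def smoothed_density(xs, ys, n_bins=20, radius=1):
    """Bin (x, y) pairs on a log2(value + 1) scale and box-smooth the counts.

    xs, ys: paired non-negative values, e.g. peak tag counts and DNase I counts.
    Returns an n_bins x n_bins grid (rows = y bins, columns = x bins).
    """
    lx = [math.log2(x + 1) for x in xs]
    ly = [math.log2(y + 1) for y in ys]

    def to_bin(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(lx), max(lx), min(ly), max(ly)
    grid = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y in zip(lx, ly):
        grid[to_bin(y, y_lo, y_hi)][to_bin(x, x_lo, x_hi)] += 1

    # Box smoothing: mean over a (2 * radius + 1)^2 neighbourhood, clipped at edges.
    smooth = [[0.0] * n_bins for _ in range(n_bins)]
    for r in range(n_bins):
        for c in range(n_bins):
            cells = [grid[i][j]
                     for i in range(max(0, r - radius), min(n_bins, r + radius + 1))
                     for j in range(max(0, c - radius), min(n_bins, c + radius + 1))]
            smooth[r][c] = sum(cells) / len(cells)
    return smooth
```

A band of high density along the diagonal of the grid would indicate that the two features rise and fall together across the peak list.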