Lausanne, 27 April - 1 May 2015
Working with public data:
Organization and contents of the Mass Genome Annotation (MGA) data repository,
introduction to genomic context analysis
Author: Philipp Bucher
1. Analysis of histone modifications with the ChIP-Part tool
This tutorial is mainly about analyzing ChIP-Seq data for histone modifications.
It also teaches you how to use the ChIP-Part program for defining chromatin
domains of variable size that are enriched in a particular feature.
Feel free to do or redo the exercises with other data on our server,
or with your own data.
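The idea of an enriched domain of variable size can be illustrated with a few lines of R. The sketch below is not the ChIP-Part algorithm. It swaps in Kadane's maximum-subarray method, which finds the single run of consecutive bins with the largest total excess of tags over an assumed background level. The function name `max_enriched_segment` and its `background` parameter are hypothetical.

```r
# Hypothetical illustration (NOT the ChIP-Part algorithm): Kadane's
# maximum-subarray search for the best-scoring run of consecutive bins.
max_enriched_segment <- function(counts, background) {
  scores <- counts - background      # > 0: more tags than the assumed background
  best <- list(start = NA_integer_, end = NA_integer_, score = 0)
  run_score <- 0
  run_start <- 1L
  for (i in seq_along(scores)) {
    if (run_score <= 0) {            # a non-positive prefix cannot help: restart at bin i
      run_score <- scores[i]
      run_start <- i
    } else {                         # extend the current run by bin i
      run_score <- run_score + scores[i]
    }
    if (run_score > best$score) {    # remember the best run seen so far
      best <- list(start = run_start, end = i, score = run_score)
    }
  }
  best                               # start/end are bin indices; NA if no bin is enriched
}

# Toy example: tag counts in 7 consecutive bins, background of 2 tags per bin
seg <- max_enriched_segment(c(1, 2, 6, 5, 7, 1, 0), background = 2)
seg$start; seg$end; seg$score        # bins 3 to 5, total excess score 12
```

Bin indices become genomic coordinates by multiplying with the bin size. Reporting several non-overlapping domains, as needed for a genome-wide partition, requires more than this single-segment search.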
2. Analyzing chromatin domain boundaries - another application of the ChIP-Part tool
Go through Exercise 4 of ChIP-Seq Tutorial 1 part 3.
In this exercise you use the ChIP-Part tool to define H3K4me1-enriched chromosomal
regions. You then save the domain boundaries as an SGA file and use ChIP-Cor
to test whether the H3K4me1 boundaries coincide with changes in the abundance
of other histone marks (or of any other kind of feature).
Feel free to repeat the exercise with any feature you like.
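Saving domain boundaries as an SGA file amounts to writing one record per domain start and one per domain end. The sketch below assumes the five mandatory SGA columns (sequence ID, feature, position, strand, tag count) and that records are sorted by sequence and position; check both against the SGA format description on the server. The function `domains_to_sga`, the feature name `H3K4me1_bnd` and the output file name are hypothetical. Start boundaries get strand "+" and end boundaries strand "-", so that if ChIP-Cor is run with oriented reference features, the domain interior would lie downstream of every boundary.

```r
# Hypothetical helper: domain intervals -> SGA-style boundary records.
# Assumed SGA columns: sequence ID, feature, position, strand, tag count.
domains_to_sga <- function(seq_id, start, end, feature = "H3K4me1_bnd") {
  sga <- data.frame(
    seq     = c(seq_id, seq_id),                       # one row per start, one per end
    feature = feature,
    pos     = c(start, end),
    strand  = rep(c("+", "-"), each = length(start)),  # "+" = domain start, "-" = domain end
    count   = 1L,
    stringsAsFactors = FALSE
  )
  sga[order(sga$seq, sga$pos), ]                       # sort by sequence, then position
}

# Toy example: two domains on hg19 chromosome 1 (RefSeq ID NC_000001.10)
b <- domains_to_sga(rep("NC_000001.10", 2), start = c(1000, 5000), end = c(2500, 9000))
write.table(b, "H3K4me1_boundaries.sga", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```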
3. Various examples of genomic context analysis
Part C: Advanced applications
This tutorial presents various examples of genomic context analysis. It
partly overlaps with other exercises proposed during this course, so you
may go through it quickly or skip it.
4. Characterizing the diversity of TF binding sites with aggregation plots (APs)
The term aggregation plot (AP) is often used in the literature to refer
to the kind of graphs generated by ChIP-Cor ("aggregation" because the plot shows
an average over a set of "aggregated" genomic regions).
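The averaging behind an aggregation plot can be written out in a few lines. This is a minimal sketch of the idea, not ChIP-Cor's implementation: for each reference position it bins the distances to all target positions (mirrored for references on the minus strand), sums the bin counts over all references, and divides by the number of references and the bin width. The function name `aggregation_profile` and its `window` and `bin` parameters are illustrative; all positions are assumed to lie on one chromosome, and `window` must be a multiple of `bin`.

```r
# Minimal sketch of an aggregation plot (not ChIP-Cor's implementation).
aggregation_profile <- function(ref_pos, ref_strand, target_pos,
                                window = 1000, bin = 50) {
  n_bins <- 2 * window / bin
  counts <- numeric(n_bins)
  for (k in seq_along(ref_pos)) {
    d <- target_pos - ref_pos[k]           # distance of every target to this reference
    if (ref_strand[k] == "-") d <- -d      # orient relative to the reference strand
    d <- d[d >= -window & d < window]      # keep targets inside [-window, window)
    idx <- floor((d + window) / bin) + 1   # bin index 1..n_bins
    counts <- counts + tabulate(idx, nbins = n_bins)
  }
  # Average over all references, expressed as targets per bp
  data.frame(center  = -window + bin * (seq_len(n_bins) - 0.5),
             density = counts / (length(ref_pos) * bin))
}

prof <- aggregation_profile(ref_pos = c(100, 500), ref_strand = c("+", "-"),
                            target_pos = c(120, 480, 900), window = 100, bin = 50)
# plot(prof$center, prof$density, type = "l")
```

Both targets near a reference end up 20 bp downstream once the minus-strand reference is mirrored, so they fall into the same bin. The server tool's own counting and normalization choices may differ from this sketch.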
Below is a figure from this morning's presentation showing sequence
conservation across vertebrate genomes in the vicinity of the binding sites of
several transcription factors that are active in mouse ES cells.
Make similar composite figures for other transcription factors in other
species and cell types (and possibly also with other contextual features)
using the ChIP-seq peak lists available from our server.
5. Correlations between two chromatin features visualized by 2D plots
To understand whether (and to what degree) contextual genomic features influence
the occupancy levels of in vivo TF binding sites, it is often helpful
to visualize the correlation with smoothed scatter plots.
Here is a recipe for making such plots. The example file AP2_DNaseIHS.sga
contains the peak list from
hg19 -> ENCODE ChIP-seq-peak -> Uniform TFBS from UCSC
Sample: Hela-S3 AP-2alpha None USC peaks
annotated with DNase I counts from:
-> ENCODE DNAse FAIRE etc. -> Boyle 2008, Open chromatin ..
Sample: DNaseI HS-HeLa-S3-Rep.
The peaks were annotated in the usual way, using
ChIP-Cor with the Enriched Feature Extraction option.
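Conceptually, annotating each peak with DNase I counts means summing the tags of the target features that fall within some distance of the peak. The sketch below does this by brute force for a single chromosome and writes the result in the "DNaseI-HS=<count>" form that the plotting code in this section strips with `gsub`. The function name `annotate_peaks` and the +/- 100 bp window are assumptions; ChIP-Cor's Enriched Feature Extraction option may define the window and the counts differently.

```r
# Hypothetical sketch: sum the tag counts of target features within
# +/- half_width bp of each peak (all positions on one chromosome).
annotate_peaks <- function(peak_pos, target_pos, target_count, half_width = 100) {
  sapply(peak_pos, function(p)
    sum(target_count[abs(target_pos - p) <= half_width]))
}

# Toy example: two peaks and four DNase I tag positions with their counts
peaks  <- c(1000, 5000)
counts <- annotate_peaks(peaks, target_pos = c(950, 1020, 1300, 4990),
                         target_count = c(3, 2, 5, 1))
ann <- paste0("DNaseI-HS=", counts)   # the form parsed by the plotting code's gsub call
```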
The R code for making this figure is given below:

# Read the annotated peak list (tab-delimited SGA file, no header line)
x = read.table("AP2_DNaseIHS.sga", header=FALSE, sep="\t")
# Colour palette for the smoothed density
colorP = colorRampPalette(c("white", "lightblue", "blue", "darkgreen",
  "green", "yellow", "orange", "darkorange", "red", "darkred"))
# OR (alternative palette; if both lines are run, this one is used) #
colorP = colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan",
  "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
# x-axis: peak score (column 6)
# y-axis: DNase I tag count, stored in column 7 as "DNaseI-HS=<count>"
smoothScatter(x[,6], as.numeric(gsub("DNaseI-HS=", "", x[,7])),
  nrpoints=0, colramp=colorP, xlab="Peak score", ylab="Tag counts")

Make similar plots for other kinds of feature correlations. You may also
investigate the relationship between in vivo occupancy and PWM score (see the
exercises of the Monday afternoon session) with the aid of such plots.
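A smoothed scatter plot shows the shape of the relationship; a rank correlation adds a single summary number. The sketch below uses R's `cor.test` with `method = "spearman"`. Spearman's coefficient depends only on ranks, so it is unaffected by monotone transformations such as taking logarithms of tag counts; `exact = FALSE` avoids the warning R issues when the data contain ties. The wrapper name `spearman_summary` is hypothetical.

```r
# Spearman rank correlation between peak score and tag count, with a p-value
spearman_summary <- function(score, counts) {
  ct <- cor.test(score, counts, method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p.value = ct$p.value)
}

# Toy example
s <- spearman_summary(c(1, 2, 3, 4, 5), c(10, 30, 20, 50, 40))
s[["rho"]]   # 0.8
```

With the annotated peak list loaded as `x`, the call would be `spearman_summary(x[,6], as.numeric(gsub("DNaseI-HS=", "", x[,7])))`.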