ChIP-seq data analysis: from quality check to motif discovery and more
Lausanne, 27 April - 1 May 2015
ChIP-Seq server - the nitty-gritty details
Author: Philipp Bucher
This practical session is largely based on existing tutorials, some of which
were written some time ago. All of them should work, but results may not
reproduce exactly because of more recent changes in the software environment.
Also, some recipes may be sub-optimal today because they do not make use
of more recently added features. This concerns primarily the navigation
buttons on the result pages, which often make it unnecessary to save
an output file to disk in order to upload it to another input form.
All tutorials are posted on the ChIP-Seq server home page.
1. Data upload
If you have your own ChIP-Seq data which you plan to analyse with our
tools in the future then try to convert these data into SGA format now.
Ask the assistants for help if necessary. You are welcome to do some of
the following exercises with your own data instead of the example proposed
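As a starting point for the conversion, here is a minimal sketch that turns BED read alignments into SGA-style rows. It assumes that SGA is a tab-delimited format with five fields (sequence name, feature, position, strand, count) and that rows are sorted by sequence name, position and strand; please check the format description on the ChIP-Seq server before uploading. The function names, the feature label "ChIP" and the choice to keep chromosome names unchanged are illustrative assumptions, and the server may expect different sequence identifiers.

```python
from typing import Dict, Iterable, List, Tuple

# (sequence name, feature, position, strand, count): assumed SGA field order
SgaRow = Tuple[str, str, int, str, int]


def bed_to_sga(bed_lines: Iterable[str], feature: str = "ChIP") -> List[SgaRow]:
    """Collapse BED intervals into one SGA-style row per distinct 5' end."""
    counts: Dict[Tuple[str, int, str], int] = {}
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "0"
        # BED is 0-based and half-open; positions below are 1-based.
        if strand == "+":
            pos = start + 1              # 5' end of a forward-strand read
        elif strand == "-":
            pos = end                    # 5' end of a reverse-strand read
        else:
            pos = (start + 1 + end) // 2  # unoriented feature: use the centre
        key = (chrom, pos, strand)
        counts[key] = counts.get(key, 0) + 1
    # Sort by sequence name, then position, then strand (assumed requirement).
    return [(c, feature, p, s, n) for (c, p, s), n in sorted(counts.items())]


def format_sga(rows: Iterable[SgaRow]) -> str:
    """Render rows as tab-delimited text, one row per line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)
```

Calling `format_sga(bed_to_sga(open("reads.bed")))` on a hypothetical file `reads.bed` would give text that can be saved and inspected before upload.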
2. Quality control of ChIP-seq data
Go through Exercise 1 of
The tutorial proposes 5'/3' correlation analysis of ChIP-Seq data
for 15 transcription factors in mouse embryonic stem cells. The study
is presented in a paper by
Chen et al. 2008.
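To make the idea behind 5'/3' correlation analysis concrete, here is a minimal sketch that counts, for every relative distance within a window, how many pairs of plus-strand (5') and minus-strand (3') tags lie that far apart. It is an illustration only and not the server's implementation: the function names are hypothetical, all tags are assumed to lie on a single chromosome, and each tag is counted once regardless of its tag count.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Sequence


def strand_correlation(plus: Sequence[int], minus: Sequence[int],
                       max_dist: int) -> Dict[int, int]:
    """Histogram of distances d = minus - plus for all pairs with |d| <= max_dist."""
    minus_sorted = sorted(minus)
    hist = {d: 0 for d in range(-max_dist, max_dist + 1)}
    for p in plus:
        # Restrict to minus-strand tags inside the window around p.
        lo = bisect_left(minus_sorted, p - max_dist)
        hi = bisect_right(minus_sorted, p + max_dist)
        for m in minus_sorted[lo:hi]:
            hist[m - p] += 1
    return hist


def peak_distance(hist: Dict[int, int]) -> int:
    """Distance with the highest pair count (the smallest one if tied)."""
    return max(sorted(hist), key=lambda d: hist[d])
```

In good quality data one would expect a clear peak at a positive distance close to the average fragment length, whereas a flat histogram suggests weak enrichment.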
The exercise will not take much time. You are invited to consider the following:
- Do the analysis for all 15 samples, not just for the three considered in the
tutorial. Based on visual inspection of the graph, propose a quality ranking.
- The authors also published peak lists for these data sets which
are accessible under data type "ChIP-seq-peaks" from the ChIP-Seq server
menus. Carry out motif enrichment analysis for the factors for which you find
corresponding PWMs on the SSA server. In principle, you would expect lower
motif enrichment for the peak lists derived from low quality data. Is this really
the case?
3. Effects of Repeat Masking
Go through Exercise 7 of
In this exercise, you are essentially asked to repeat the MEME exercise proposed
on the first day (Tutorial 3, Section 5) but this time with non-repeat-masked
sequences as well. The results are quite different, and the tutorial
explains these differences.
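To see how much of a sequence set is affected by repeat masking before running motif discovery, a small sketch like the following may help. It assumes the common convention in which soft-masked repeats are written in lowercase and hard-masked repeats are replaced by N; the function names are hypothetical, and the sequences extracted from the server may use a different convention.

```python
def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N, leaving other bases as they are."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)
```

A peak list whose sequences have a high masked fraction is likely to give quite different motif discovery results with and without masking.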
The tutorial also proposes using another tool,
RSAT, for motif discovery. Consider this part
optional. We are not going to use RSAT for other exercises in this course.
We strongly recommend repeating the last part of Exercise 7 of
with a non-repeat-masked STAT1 peak list. In this part, the spacing
between pairs of STAT1 sites is analyzed. If time permits, try
to reproduce the corresponding figure for unmasked in vivo
bound STAT1 motifs.
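The spacing analysis can also be sketched offline. The following minimal example builds a histogram of distances between all pairs of motif sites that lie within a maximum distance of each other. It is an illustration under simplifying assumptions, not the procedure used in the tutorial: the function name is hypothetical, all sites are assumed to be on one chromosome, and strand is ignored.

```python
from collections import Counter
from typing import Sequence


def site_spacing_histogram(sites: Sequence[int], max_dist: int) -> Counter:
    """Count pairs of sites by the distance between them, up to max_dist."""
    ordered = sorted(sites)
    hist: Counter = Counter()
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            dist = second - first
            if dist > max_dist:
                break  # later sites are even further away
            hist[dist] += 1
    return hist
```

Preferred spacings between sites would show up as distances with clearly higher counts than their neighbours.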