ChIP-seq data analysis: from quality check to motif discovery and more

Lausanne, 27 April - 1 May 2015

ChIP-Seq server - the nitty-gritty details

Author: Philipp Bucher


This practical session is largely based on existing tutorials. Some of them were written some time ago. All of them should work, but results may not reproduce exactly because of more recent changes in the software environment. Also, some recipes may be sub-optimal today because they do not make use of more recently added features. This concerns primarily the navigation buttons on the result pages, which often make it unnecessary to save an output file to disk in order to upload it to another input form.

All tutorials are posted on the ChIP-Seq server home page under Documentation.

1. Data upload

Go through: If you have your own ChIP-Seq data that you plan to analyse with our tools in the future, try to convert these data into SGA format now. Ask the assistants for help if necessary. You are welcome to do some of the following exercises with your own data instead of the example proposed by us.
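As an illustration of what such a conversion involves, here is a minimal Python sketch that turns BED-style read records into SGA-like lines. It assumes that an SGA line has five tab-separated fields (sequence name, feature, position, strand, count), that the position is the 5' end of the read, and that lines are sorted by sequence, position and strand. The function name `bed_to_sga` and the feature label `ChIP` are made up for this example, and chromosome names are copied unchanged; the server may expect other sequence identifiers, so treat this as a sketch and prefer the conversion tools described in the tutorial.

```python
from collections import Counter


def bed_to_sga(bed_lines, feature="ChIP"):
    """Convert BED lines into SGA-like lines (hypothetical helper).

    Assumed layout of an output line (tab-separated):
    sequence name, feature, position, strand, count.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip empty lines and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "0"
        if strand == "-":
            # BED end is exclusive, so it equals the 1-based 5' end
            # of a read on the minus strand.
            pos = end
        elif strand == "+":
            # BED start is 0-based, so add 1 for a 1-based position.
            pos = start + 1
        else:
            # Unknown strand: mark as unoriented and use the left end.
            strand = "0"
            pos = start + 1
        counts[(chrom, pos, strand)] += 1

    # Reads with identical coordinates are merged into one line with a count.
    return [
        f"{chrom}\t{feature}\t{pos}\t{strand}\t{n}"
        for (chrom, pos, strand), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    example = [
        "chr1\t100\t136\tread1\t0\t+",
        "chr1\t100\t136\tread2\t0\t+",
        "chr1\t200\t236\tread3\t0\t-",
    ]
    for sga_line in bed_to_sga(example):
        print(sga_line)
```

Note that the plain tuple sort used here orders chromosome names alphabetically, which may differ from the order a given tool expects.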

2. Quality control of ChIP-seq data

Go through Exercise 1 of The tutorial proposes a 5'/3' correlation analysis of ChIP-Seq data for 15 transcription factors in mouse embryonic stem cells. The study is presented in a paper by Chen et al. 2008.
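To make the idea behind a 5'/3' correlation analysis concrete, the following Python sketch counts, for each distance d, how many pairs of tags consist of a plus-strand tag and a minus-strand tag located d bases downstream. In ChIP-seq data the distance with the highest count gives a rough estimate of the average fragment length. The function name `strand_correlation` and the parameter `max_shift` are invented for this illustration, the input is assumed to come from a single chromosome, and the code is not the implementation used by the server.

```python
from bisect import bisect_left, bisect_right


def strand_correlation(plus_pos, minus_pos, max_shift=300):
    """Count plus/minus tag pairs for each distance 0..max_shift.

    counts[d] is the number of pairs (p, m) with m - p == d, where p is a
    plus-strand tag position and m a minus-strand tag position.
    """
    minus_sorted = sorted(minus_pos)
    counts = [0] * (max_shift + 1)
    for p in plus_pos:
        # Restrict to minus-strand tags within [p, p + max_shift].
        lo = bisect_left(minus_sorted, p)
        hi = bisect_right(minus_sorted, p + max_shift)
        for m in minus_sorted[lo:hi]:
            counts[m - p] += 1
    return counts


if __name__ == "__main__":
    # Toy data: two binding events with tags about 150 bp apart.
    plus = [100, 500]
    minus = [250, 650, 900]
    counts = strand_correlation(plus, minus)
    best = counts.index(max(counts))
    print("most frequent plus-to-minus distance:", best)
```

The inner loop is quadratic in dense regions; for real data sets a binned or vectorised version would be preferable.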

The exercise will not take much time. You are invited to consider the following extensions:

3. Effects of Repeat Masking

Go through Exercise 7 of In this exercise, you are essentially asked to repeat the MEME exercise proposed on the first day (Tutorial 3, Section 5), but this time with non-repeat-masked sequences as well. The results are quite different, and the tutorial explains these differences.

The tutorial also proposes using another tool, RSAT, for motif discovery. Consider this part optional. We are not going to use RSAT for other exercises in this course.

We strongly recommend repeating the last part of Exercise 7 of

with a non-repeat-masked STAT1 peak list. In this part, the spacing between pairs of STAT1 sites is analyzed. If time permits, try to reproduce the corresponding figure for unmasked in vivo bound STAT1 motifs.
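The spacing analysis mentioned here boils down to a histogram of distances between pairs of sites. The Python sketch below shows one way to compute such a histogram from a list of site positions on a single chromosome. The function name `spacing_histogram` and the parameter `max_dist` are hypothetical, strand orientation of the motifs is ignored, and the tutorial's own procedure should be used to reproduce the figure.

```python
from collections import Counter


def spacing_histogram(sites, max_dist=100):
    """Histogram of distances between pairs of sites up to max_dist apart."""
    ordered = sorted(sites)
    hist = Counter()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            d = b - a
            if d > max_dist:
                # Positions are sorted, so all later sites are farther away.
                break
            hist[d] += 1
    return hist


if __name__ == "__main__":
    # Toy positions: three sites spaced 21 bp apart plus one distant site.
    hist = spacing_histogram([10, 31, 52, 400])
    for distance in sorted(hist):
        print(distance, hist[distance])
```

A peak at a particular distance in such a histogram would point to a preferred spacing between neighbouring sites.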