Lausanne, 27 April - 1 May 2015
The tutorial explains how to discover motifs with the SSA server programs SList and PatOp.
PatOp has the advantage over MEME and MEME-ChIP that it can use much larger sequence sets for optimizing a PWM, which should in principle improve the quality of the resulting matrix.
Try to optimize PWMs from other peak lists with SList and PatOp, and compare the resulting matrices with the corresponding PWMs available from the SSA server menus.
If you know the approximate size of the optimal window (which is usually the case with ChIP-seq data), you can switch off the window size optimization by setting
Window size min = Window size max
Likewise, if you have a good idea of the motif length, you can turn off "Matrix extension". Both restrictions make the optimization process faster and more robust.
The goal of this exercise is to evaluate the performance of PWM models either derived from MSeq or present in public libraries using ChIP-seq data.
A set of two models can be derived from MSeq using the JOLMA SELEX library for NFKB1 (cycle 3) and the following two word seeds:
After selecting the JOLMA SELEX library for NFKB1 (NFKB1_TCTCAA20NGA_AA_3) on the MSeq main page, run the PWM Optimize Tool twice with the default parameters, once with the consensus sequence GGGRRNYYVVV and once with GGGRRNYYCCC.
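The two seeds are written in IUPAC nucleotide codes (R = A or G, Y = C or T, V = A, C or G, N = any base). As a minimal sketch of what each seed covers, the R snippet below expands a consensus string into a regular expression and tests it against an example 11-mer. The helper `consensus_to_regex` and the test sequences are hypothetical and only serve as illustration; they are not part of MSeq.

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Hypothetical helper: translate a consensus string into a regular expression.
consensus_to_regex <- function(consensus) {
  symbols <- strsplit(toupper(consensus), "")[[1]]
  stopifnot(all(symbols %in% names(iupac)))  # reject unknown symbols
  paste(iupac[symbols], collapse = "")
}

seed_general  <- consensus_to_regex("GGGRRNYYVVV")
seed_specific <- consensus_to_regex("GGGRRNYYCCC")

# Hypothetical 11-mers, used only to illustrate the difference between seeds.
grepl(seed_general,  "GGGAATTCCCC")  # matched by both seeds
grepl(seed_specific, "GGGAATTCCCC")
grepl(seed_general,  "GGGAATTCCAG")  # matched by the general seed only
grepl(seed_specific, "GGGAATTCCAG")
```

The second seed is a special case of the first: every sequence matching GGGRRNYYCCC also matches GGGRRNYYVVV, but not the other way round.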
On the result page, click on the PWMEval link to go directly to the PWMEval page, where the matrix is already uploaded. On the left-side menu of PWMEval, select one by one the 10 human NFKB ranked peak lists from UCSC that will be used for the evaluation:
For each PWM model, collect the 10 AUC values, one per ChIP-seq peak data set. Record the values (for example on a sheet of paper).
Another set of models can be directly taken from the on-line PWM libraries on the PWMEval Web page. Select the following models:
For each model, carry out 10 performance measurements using the same set of 10 human NFKB ranked peak lists from UCSC. On the left-side menu of PWMEval, select:
Compute the corresponding AUC and save the value.
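PWMEval reports the AUC for you. For readers who want to see what the number means, here is a minimal sketch of a generic rank-based ROC AUC in R. It assumes you have one PWM score per bound (peak) sequence and one per background sequence. The function `roc_auc` and the scores are hypothetical, and how PWMEval itself builds its positive and negative sets is not described in this tutorial.

```r
# Rank-based ROC AUC, equivalent to the normalized Mann-Whitney U statistic:
# the probability that a randomly chosen positive scores higher than a
# randomly chosen negative (ties count as one half).
roc_auc <- function(pos_scores, neg_scores) {
  n_pos <- length(pos_scores)
  n_neg <- length(neg_scores)
  r <- rank(c(pos_scores, neg_scores))  # ties receive average ranks
  u <- sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

# Hypothetical best-match PWM scores, for illustration only.
peak_scores       <- c(9.1, 7.4, 8.2, 5.0, 6.3)
background_scores <- c(4.2, 6.8, 3.9, 5.5, 2.7)

roc_auc(peak_scores, background_scores)  # 22 of 25 pairs ranked correctly: 0.88
```

An AUC of 0.5 corresponds to a model that cannot distinguish peaks from background, and 1.0 to perfect separation.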
Once this is done for all 5 PWM models, generate an R plot of the results.
The following R template plots 3 PWM models evaluated on the 10 ChIP-seq peak samples from UCSC:
auc_NFKB1_3_p <- c(0.857, 0.830, 0.864, 0.817, 0.784, 0.820, 0.81, 0.86, 0.845, 0.848)
auc_NFKB1_3_a <- c(0.832, 0.825, 0.843, 0.805, 0.774, 0.812, 0.796, 0.84, 0.84, 0.83)
auc_NFKB1_3_pu <- c(0.870, 0.847, 0.884, 0.838, 0.796, 0.839, 0.83, 0.87, 0.867, 0.865)
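plot(auc_NFKB1_3_a, type="o", frame.plot=FALSE, xlab="ChIP-seq NFKB data (Uniform TFBS from UCSC)", ylab="AUC of PWM models", col="red", ylim=c(0.7,1.0), xlim=c(1,10), lwd=2, cex.axis=1.3, cex.lab=1.3, xaxt="n")
# xaxt="n" suppresses the default x axis; draw one tick per peak data set instead
axis(1, at=1:10, cex.axis=1.3)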
lines(auc_NFKB1_3_p, col="blue", type="o", pch=22, lty=2, lwd=2)
lines(auc_NFKB1_3_pu, col="darkgrey", type="o", lty=2, lwd=2)
title(main="Predicting In vivo binding", col.main="Blue", font.main=3, cex.main=1.4)
# Legend line types and symbols match the plot() and lines() calls above
legend(x="bottomright", legend=c("auc_NFKB1_3_a", "auc_NFKB1_3_p", "auc_NFKB1_3_pu"), col=c("red", "blue", "darkgrey"), lty=c(1,2,2), pch=c(1,22,1), lwd=2, pt.cex=1.0)
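Besides the plot, a numeric summary can help when comparing models. The sketch below is one possible way to do this, not part of the original template: it puts the three example vectors from the template into a data frame, computes the mean AUC per model, and counts how often each model scores best across the 10 peak lists. The names `auc`, `mean_auc` and `best` are introduced here for illustration.

```r
# Example AUC values copied from the template (one row per peak list).
auc <- data.frame(
  NFKB1_3_p  = c(0.857, 0.830, 0.864, 0.817, 0.784, 0.820, 0.81, 0.86, 0.845, 0.848),
  NFKB1_3_a  = c(0.832, 0.825, 0.843, 0.805, 0.774, 0.812, 0.796, 0.84, 0.84, 0.83),
  NFKB1_3_pu = c(0.870, 0.847, 0.884, 0.838, 0.796, 0.839, 0.83, 0.87, 0.867, 0.865)
)

# Mean AUC per model over the 10 peak lists.
mean_auc <- colMeans(auc)
print(round(mean_auc, 4))

# For each peak list, the name of the model with the highest AUC.
best <- names(auc)[apply(auc, 1, which.max)]
print(table(best))
```

To include the remaining models, add one column per model with the AUC values you recorded.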