Hello, Could anyone explain a little bit how these were generated? I have been trying to find out by looking at the provided code, but I can't seem to figure this piece out; I mean, does MACS2 take the 5' ends and compare that to the reads, or did you provide a track of 5' counts and the background was a file with the reads? Thank you in advance, Dayanne

Created by Dayanne Castro dayanne.castro
Thanks, that is very helpful!
Hi Dayanne, The same method I linked to above is used to compute fold-enrichment tracks for ChIP-seq and DNase-seq with a few minor differences. For ChIP-seq, you extend reads in the 5' to 3' direction by the predominant fragment length / insert size in the data. Then you count the number of extended reads overlapping each position. This is done because the 5' ends of reads represent the ends of the fragments to which the TF may be bound (the TF binds somewhere inside the fragment). By extending the reads and counting extended-reads, you are able to better estimate the central position within a set of fragments that the TF would be binding. You can read this paper for more details http://www.ncbi.nlm.nih.gov/pubmed/19736561. For DNase-seq, the 5' ends of reads are infact the cut-sites. So here you do want to count the 5'-ends of reads. But we also want some amount of smoothing. So here what is done is the 5' end of each read is extended in both directions by 75 bp (unlike ChIP-seq where the reads are extended in the 5' to 3' direction), effectively smoothing the 5'-end (or cut-site) counts by 150 bp. The other difference is that for ChIP-seq, we have a input-DNA control dataset for each ChIP-seq dataset. So the expected background read counts are obtained from from the control dataset. For DNase-seq there is no explicit control dataset. So the expected background are estimated from the DNase-seq data itself using a simulated control. The simulated control gives an estimate of a global background whereby the same number of DNase-seq reads are scattered uniformly across the genome and the expected number of reads per position (global lambda of a Poisson null model) is computed. A local background model is also constructed at each position but taking a large window around the position (e.g. 10Kb) and computing the expected number of reads per position (local lambda) within the window based on a local Poisson. At each position, the max of the local and global lambda is used as the denominator in the fold-enrichment calculation. For ChIP-seq, these local and global lambdas are estimated from the control. Wrt. the MACS2 commands, this basically only means we use different parameter values for ChIP-seq and DNase-seq. The Github pipelines we have linked to in the documentation perform all the processing. Thanks, Anshul.
Hi Ramil, Yes the null model is a local Poisson. This doesn't mean we have to provide a pvalue. You can get expected counts (the lambda) from the Poisson. This is what is used as the denominator in the fold-enrichment. ChIP extended-read counts / local lambda. If you need more detail you can read the MACS paper and the link I provided above to the MACS2 documentation in more detail if you need more information. And to clarify once again, the signals provided are fold-enrichment values not -log10(pvalues). Thanks, Anshul.
I was actually interested in the DNase-seq FC wiggle files... because it says you used the cut sites, but to call peaks the reads were used. Is that correct?
Dear Anshul Can you clarify the situation more deeply. . "macs2 bdgcmp" has parameter: --method {ppois,qpois,subtract,logFE,FE,logLR,slogLR,max} . Which exactly method from above you used? (I suppose FE) . You replied to Dayanne Castro (dayanne.castro) >>> In step 6 instead of computing the pvalue or qvalue, to get the fold-enrichment you compute the ratio of ChIP >>>extended-read counts to the expected extended-read counts (based on the local Poisson model) from the input >>>control after scaling for read depth differences. . You answered my question in other topic: >>>We don't plan to provide pvalue tracks for any of the data types at this stage. You will need to work with the >>>fold-enrichment tracks provided for ChIP-seq (which are the outputs). . . You mentioned replying to Dayanne "based on the local Poisson model" but "don't plan to provide pvalue" to me. . There is a rough way to make pvalue track from FC track, I did it, but I need to be sure that the original signal is not already -log(p.value) Ramil .
See this https://github.com/taoliu/MACS/wiki/Advanced:-Call-peaks-using-MACS2-subcommands . In step 6 instead of computing the pvalue or qvalue, to get the fold-enrichment you compute the ratio of ChIP extended-read counts to the expected extended-read counts (based on the local Poisson model) from the input control after scaling for read depth differences.

How were the fold-enrichment tracks generated? page is loading…