Dear Anshul
I have found one biased DNASE sample:
>bigWigInfo DNASE.induced_pluripotent_stem_cell.fc.signal.bigwig
>version: 4
>isCompressed: yes
>isSwapped: 0
>primaryDataSize: 579,344,830
>primaryIndexSize: 3,640,876
>zoomLevels: 10
>chromCount: 24
>basesCovered: 3,094,238,405
>mean: 0.702175
>min: 0.000000
>max: 5921.363770 <--------- here
>std: 6.246072
.
For the rest of the DNASE bigwig files, the maximum value ranges from 47.4074 to 66.6683.
.
For the ChIP-seq bigwig files, the maximum signal ranges from 51.47 to 902.32.
.
So, from my point of view, this difference prevents us from comparing tissues. I cannot include the DNASE signal value in my prediction model.
The organizers ask us to build a cross-tissue model, and all the values are incompatible between tissues.
.
From my point of view, this happens because a different number of reads was sequenced for these cells or cell-ChIP pairs.
.
I suppose that all these tracks were made by "macs2 bdgcmp --method qpois".
Probably the command "macs2 bdgcmp --method subtract" will give more comparable results.
Ramil
Created by Ramil Nurtdinov n.ramil
Hi Ramil,
Great observation. However, you are mistaken. This has nothing to do with incompatibility, nor is it a problem that precludes you from using the DNase data as a predictor.
What you are looking at is the maximum value across the whole genome. The genome has certain regions that can show abnormally high signal, and in some cases these are cell-type specific (e.g. due to copy number variants or specific repeat structures). This is typical. We have identified a large number of these regions that are largely common across cell types and refer to them as blacklist regions. These blacklist regions are filtered from the peak files (for DNase and TF ChIP-seq) but not from the bigwig signal files. Dealing with these types of outlier regions is part and parcel of dealing with functional genomic data. It most certainly does not mean the signal is not usable. The rest of the genome is behaving perfectly normally, especially the regions overlapping the peaks. Your prediction method needs to learn to deal with outliers.
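As a rough illustration of dealing with such outlier regions, here is a minimal Python sketch. It assumes the signal for one chromosome has already been binned into a plain list and that blacklist intervals are available as (start, end) pairs in base pairs. The function names, the bin size, and the toy numbers are hypothetical and not part of the challenge pipeline.

```python
def mask_blacklist(signal, bin_size, blacklist):
    """Return a copy of `signal` with bins overlapping a blacklist interval set to None.

    signal    -- per-bin signal values for one chromosome (hypothetical input)
    bin_size  -- width of each bin in base pairs
    blacklist -- list of half-open (start, end) intervals in base pairs
    """
    masked = list(signal)
    for start, end in blacklist:
        first = start // bin_size
        last = (end - 1) // bin_size
        # Clamp to the last available bin so intervals past the end are safe.
        for i in range(first, min(last, len(masked) - 1) + 1):
            masked[i] = None
    return masked


def cap_outliers(values, quantile=0.999):
    """Clip values above the given empirical quantile, leaving masked (None) bins alone."""
    kept = sorted(v for v in values if v is not None)
    cap = kept[min(len(kept) - 1, int(quantile * len(kept)))]
    return [None if v is None else min(v, cap) for v in values]


# Toy example: one extreme bin, similar in spirit to the reported genome-wide maximum.
signal = [0.5, 1.2, 5921.36, 3.0, 0.8]
masked = mask_blacklist(signal, bin_size=100, blacklist=[(200, 300)])
capped = cap_outliers(masked, quantile=0.5)
```

Masking and capping are two separate choices here: masking removes known problem regions outright, while capping limits the influence of any remaining extreme bins. The quantile used for the cap is an arbitrary choice and would need tuning on real data.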
>> So, from my point of view, this difference disallows us to compare tissues. I cannot include the value of DNASE signal into my prediction model. The organizers ask us to make a cross tissues model and all the values are incompatible between tissues.
Sorry but this statement is incorrect. The data are most certainly compatible and high quality. The data provided represent fold-enrichments of signal. Normalizing input data is a big part of getting predictive models to work. You are going to have to figure out how to do this. We have specifically stated in the instructions that users are free to use any transformations to the data provided.
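One possible transformation, shown only as a sketch, is to replace each cell type's values by their fractional rank, so that every track lives on the same [0, 1] scale regardless of its maximum. This is an assumption about what might work, not a method recommended in the challenge instructions. The function name and the toy values are hypothetical.

```python
def rank_normalize(values):
    """Map values to [0, 1] by fractional rank; tied values share their mean rank."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


# Two toy tracks over the same bins with very different maxima.
ipsc = [0.5, 1.2, 5921.36, 3.0, 0.8]
other = [0.4, 1.0, 60.0, 2.5, 0.7]
ipsc_norm = rank_normalize(ipsc)
other_norm = rank_normalize(other)
```

After this transformation both toy tracks are identical, because the bins are ordered the same way in each. Rank-based scaling discards the magnitude of the signal, so a log transform followed by capping may be preferable when the magnitude itself is informative.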
>> I suppose that all these tracks were made by "macs2 bdgcmp --method qpois" Probably the command "macs2 bdgcmp --method subtract" will give more comparable results.
No. These are fold-enrichment tracks, not q-value tracks. The qpois method generates q-value tracks. The subtract method will not give more comparable results. You need to figure out ways to normalize the data across cell types.
Thanks,
Anshul.