Hello Challengers - We have received some questions about very high values in some regions of the training data, for example the DNase (M02) signal in H9 cells (C18) near chr1:630000. This can happen at blacklisted regions or at sample-specific artifact regions. These artifacts have not been filtered from the tracks and are a real part of the data these experiments produce. You should make your methods robust to these sorts of outliers.
Seth
Created by J Seth Strattan (jseth)
@arcanum449, you have the correct blacklist. I think you are right: this region should be included in it. While the blacklist should be nearly complete, there may still be some regions like this one that were not detected by the blacklist development method. So while the blacklist is an easy way to exclude artifact-producing regions, methods should still be robust to outliers like this one that persist in real data.
Seth
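For anyone wondering what "robust to outliers" could look like in practice, here is a minimal sketch of one common option: capping signal values at a high quantile (one-sided winsorizing). This is only an illustration, not a method recommended by the organizers. The function name `clip_outliers` and the default quantile are assumptions made for the example.

```python
def clip_outliers(signal, upper_quantile=0.999):
    """Cap values above the given quantile (one-sided winsorizing).

    `signal` is a list of per-bin signal values. The name and the default
    quantile are hypothetical choices for illustration only.
    """
    if not signal:
        return []
    ordered = sorted(signal)
    # Index of the requested quantile, rounded down and kept in range.
    idx = min(len(ordered) - 1, int(upper_quantile * (len(ordered) - 1)))
    cap = ordered[idx]
    # Values at or below the cap are unchanged; larger ones are set to the cap.
    return [min(value, cap) for value in signal]


# Toy track: mostly low signal with one artifact-like spike.
track = [1.0] * 99 + [5000.0]
clipped = clip_outliers(track, upper_quantile=0.95)
print(max(clipped))  # 1.0
```

Other choices, such as a log or arcsinh transform of the signal or a loss function that is less sensitive to extreme values, would serve the same purpose.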
Thanks, Seth!
In hg38, the only blacklist region for chr1 is
chr1 124450730 124450960
which does not contain the region "near chr1:630000" mentioned in your thread.
We are wondering if the blacklist is outdated - we downloaded it from here:
http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz
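The check described above can be sketched in a few lines of Python. This is a rough illustration under some assumptions: the helper names `parse_bed` and `overlaps_blacklist` are made up for the example, the single BED line is the chr1 entry quoted above, and the 20 kb window around position 630000 is an arbitrary stand-in for "near chr1:630000".

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping header lines."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps_blacklist(regions, chrom, start, end):
    """Return True if [start, end) overlaps any region on the same chromosome.

    BED intervals are 0-based and half-open, so two intervals overlap
    when each one starts before the other ends.
    """
    return any(
        c == chrom and start < r_end and r_start < end
        for c, r_start, r_end in regions
    )


# The only chr1 entry reported above for the hg38 blacklist.
blacklist = parse_bed(["chr1 124450730 124450960"])

# Hypothetical window around chr1:630000 (the window size is an assumption).
print(overlaps_blacklist(blacklist, "chr1", 620000, 640000))  # False
```

To run this on the downloaded file, the lines could be read with `gzip.open(path, "rt")` from the standard library and passed to `parse_bed`.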