These questions are probably addressed somewhere in the Challenge docs, but I have to admit I'm still confused about some details.
I noticed that the training, ladder, and testing coordinate files include files that end with "merged.bed".
What exactly are these coordinates supposed to represent?
What is the purpose of these files?
Created by Lawrence Du (LawrenceDu)
Yes. The merged files are simply provided to indicate the contiguous regions of the genome for which predictions are required. However, your submissions must provide the predictions using the coordinates for each bin provided in the non-merged files.
-Anshul.
The files I found in training_data.annotation.tar are as follows:
ladder_regions.blacklistfiltered.bed.gz
ladder_regions.blacklistfiltered.merged.bed
test_regions.blacklistfiltered.bed.gz
test_regions.blacklistfiltered.merged.bed
train_regions.blacklistfiltered.bed.gz
train_regions.blacklistfiltered.merged.bed
I figured out what the merged.bed files are. They appear to be the output of the bedtools merge command, which collapses overlapping or adjacent intervals into single contiguous regions.
Is this meant to reduce memory consumption when reading contiguous regions of the genome?
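For anyone else puzzling over the relationship between the two kinds of files, here is a minimal Python sketch of an interval merge in the style of bedtools merge with its default settings (overlapping and directly adjacent intervals are combined). The function name merge_intervals and the example coordinates are hypothetical and are not taken from the Challenge files; this only illustrates the idea and is not how the organizers necessarily produced the merged.bed files.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) using BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def merge_intervals(bins: Iterable[Interval]) -> List[Interval]:
    """Collapse overlapping or directly adjacent intervals per chromosome."""
    merged: List[Interval] = []
    # Sorting by (chrom, start, end) puts candidates for merging next to each other.
    for chrom, start, end in sorted(bins):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend that region.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # Different chromosome or a gap: start a new region.
            merged.append((chrom, start, end))
    return merged


# Hypothetical bins for illustration only (not real Challenge coordinates).
example_bins = [
    ("chr1", 600, 800),
    ("chr1", 650, 850),
    ("chr1", 700, 900),
    ("chr1", 5000, 5200),
    ("chr2", 100, 300),
]
merged_regions = merge_intervals(example_bins)
for region in merged_regions:
    print("\t".join(str(field) for field in region))
```

In this sketch the three overlapping chr1 bins collapse into one contiguous region, while the isolated bins are kept as they are. The merged output is a compact summary of where predictions are needed; the individual bins are what a submission would be keyed on.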
Which files are you referring to? Can you point out the exact file names and locations?
Thanks,
Anshul.