I know you are creating a new dataset, so maybe this has been addressed already. The current insert sizes for the simulated data are 150-170nt. With read lengths of 100nt, this means the two mates of each pair overlap when aligned. I don't think that corresponds to normal Illumina experiments, where the insert size is usually longer than twice the read length. Do you intend to create a more diverse training set? Or maybe increase the size of all simulated fragments?
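To make the arithmetic concrete, here is a minimal sketch. It assumes "insert size" means the full fragment length including both reads, which is the reading under which 150-170nt fragments with 100nt reads overlap. The function names and the 300nt comparison value are hypothetical and chosen only for illustration.

```python
def mate_inner_distance(fragment_length, read_length):
    """Gap between the two mates; negative when the mates overlap.

    Assumes fragment_length counts both reads plus anything between them.
    """
    return fragment_length - 2 * read_length


def mate_overlap(fragment_length, read_length):
    """Number of bases covered by both mates (0 if they do not overlap)."""
    overlap = max(0, 2 * read_length - fragment_length)
    # The overlap can never exceed the fragment itself.
    return min(fragment_length, overlap)


if __name__ == "__main__":
    read_length = 100
    # 150 and 170 are the bounds quoted in the post; 300 is a hypothetical
    # longer fragment shown only for comparison.
    for fragment_length in (150, 170, 300):
        print(
            fragment_length,
            mate_inner_distance(fragment_length, read_length),
            mate_overlap(fragment_length, read_length),
        )
```

Under that assumption, the quoted fragments give a negative inner distance, meaning 30 to 50 bases of each pair are sequenced twice, whereas a fragment longer than twice the read length leaves a gap between the mates.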

Created by Jeltje van Baren jeltje
Thanks for the quick response, Kristen! And glad to hear about the larger insert sizes. Is there an ETA on the new dataset? I'm trying to get this up and running before 2017... Best, -Jeltje
Dear Jeltje, Thanks for your comment. The insert size and read length parameters of the current dataset are based on a real dataset that was generated using that facility's normal procedures for cancer sequencing. For several reasons, however, we are planning to include larger insert sizes in the next dataset. Thanks, Kristen

simulated read mate-inner-distance