I count 903,879 empty reads in sim32_mergeSort_1.fq. The first example is read `@1000` on line 17. In context:

```
@100
CAGGCCCAGCGTTGTGATGGAGTAAATGACTGCTTTGATGAAAGTGATGAACTGTTTTGCGTGAGCCCTCAACCTGCCTGCAATACCAGCTCCTTCAGGC
+
CDCCCC@@DDEDDEEGFFFEDDDDDFGIIIJGHJJGGFHHJJJHCB98@88=@BDCEEDDDDDDDDDBB@@C?CDBBBDDDDD31?@FFFFGIIHHHHFA
@1000
+
C
@10000
CACCTCCCTGGACACTCTGTCTCTGGCTTCTTTCTGCCTAGCTCATCTCTAGCCAATCTTACAGTTATATATCTTAAGCCCTCTTCTCTTTGTTCTTTAA
+
CCCACDDDDDDDBDDDDDDBBBFFFHIJIIGEHHBBBCFFFFFDGI
```

Created by Jeltje van Baren (jeltje)
A revised sim32 dataset has been posted to the Google bucket gs://dream-smc-rna/training/.
Dear Jeltje and Jeff, Apologies, this is a known problem that was corrected earlier in the simulation procedure. Unfortunately, sim32 missed the correction. We will post a revised dataset for sim32 very shortly.
I can confirm the problem with sim32 having many blank reads as well. Other FASTQs do not have this problem.
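For anyone who wants to run the same check on the other FASTQs, here is a minimal sketch of how the empty reads could be counted. It is not the script used for the count above. It assumes strict four-line FASTQ records in which an empty read still has a (blank) sequence line, and the names `iter_fastq` and `count_empty_reads` are made up for this illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, quality), even when the sequence line is blank.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate stray blank lines where a header is expected
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator after read {header!r}")
        yield header[1:], seq, qual


def count_empty_reads(lines: Iterable[str]) -> Tuple[int, int, Optional[str]]:
    """Return (total reads, empty reads, ID of the first empty read)."""
    total = 0
    empty = 0
    first_empty = None
    for read_id, seq, _qual in iter_fastq(lines):
        total += 1
        if not seq.strip():  # blank or whitespace-only sequence line
            empty += 1
            if first_empty is None:
                first_empty = read_id
    return total, empty, first_empty
```

To use it on a file, open it in text mode and pass the handle, e.g. `with open("sim32_mergeSort_1.fq") as fh: print(count_empty_reads(fh))`. If the empty records in a file are missing the sequence line entirely rather than leaving it blank, the parser will raise a `ValueError` instead of miscounting silently.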

sim32 dataset contains many bogus reads