Hi,
In the dry-run dataset, is the read position information included in the read name? Reads with "_0_0_0_" in their names are also unmappable, even against the human genome. How were these reads simulated, and where do they actually come from?
Thank you very much
Created by Thao Truong Ginny
After a couple of delays, we should have the new set of training tumors ready to go. We'll be posting them to the Google bucket and sending out info. You'll also need to update the 'dream_runner' script to use them (if that's how you've been testing your workflows).
We also plan on releasing the simulation pipeline in the near future, so that we can collect comments and updates from the community before the next simulated round.
Hi Kyle,
Are we expecting to have the new round of training data before the new year?
Thanks!
Thank you very much
Sorry for the delay (American Thanksgiving week ;-)
If you are referring to the files found in https://console.cloud.google.com/storage/browser/dream-smc-rna/for_dry_run, those are archived simulations from a very old version of the simulator. We don't recommend that anybody use that data for training, and we will probably remove it from the repo to prevent confusion. The reads you mentioned are probably a byproduct of old code. We're almost done with a new round of training data simulation and will hopefully be releasing it within the next few days.
It would be very beneficial to have this information!