Dear @CTDsquaredPancancerDrugActivityDREAMChallengeParticipants,

We'd like to announce that we are making a baseline training dataset available to help participants get started with their submissions. The training data include consensus signatures from the LINCS [drug screening](syn21539387) and [shRNA](syn21539386) datasets. Drug metadata is also [available](syn21539388). You can learn more about this dataset and how to reproduce (or alter) it by following the Python notebook in this repository: https://github.com/bence-szalai/Data-preparation-for-CTD2/.

We encourage you to try incorporating this dataset into your method to see how it affects your score, and to check out all of the other great datasets described on the [Data](https://www.synapse.org/#!Synapse:syn20968331/wiki/599458) page, as this sample dataset is far from the only publicly available data that might help with this Challenge. Also, note that this sample dataset represents gene expression signatures only - using dose-response data may also prove fruitful in improving over the baseline model.

Please feel free to post any questions on the discussion board.

Cheers,
Robert
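For convenience, here is a minimal sketch of pulling the three files above with the `synapseclient` Python package (this assumes you have a Synapse account and have installed the client, e.g. `pip install synapseclient`; the notebook linked above covers the full preparation pipeline):

```python
import synapseclient

# Log in with cached credentials (or pass a personal access token).
syn = synapseclient.Synapse()
syn.login()

# Download the baseline training files referenced above;
# syn.get() returns an entity whose .path is the local file location.
drug_signatures = syn.get("syn21539387")   # LINCS drug screening consensus signatures
shrna_signatures = syn.get("syn21539386")  # LINCS shRNA consensus signatures
drug_metadata = syn.get("syn21539388")     # drug metadata

print(drug_signatures.path)
```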

Created by Robert Allaway (@allawayr)
Thank you for the feedback; I appreciate it. We'll certainly use your comments to improve our approach in the future.
Hi, I really appreciate the organizers' good intentions in posting an example approach. Yet I don't think it is really fair.

Firstly, there are fewer than 10 days left before the deadline of the leaderboard period. Exemplifying an approach, and I believe an "effective" approach, at this moment will only disrupt all participants' plans.

Secondly, it is highly possible that the example approach overlaps with top-performing approaches that came from weeks of work by some participants. After all, this is still a challenge with a competitive nature, and the organizers should respect everyone's work, effective or ineffective. It is highly frustrating to have ingredients thrown in seemingly at random when, presumably, the organizers are aware of what will "work". If some approaches are to be exemplified, they should be posted at the beginning, not toward the end of the competition.

Last but not least, I believe the best way to crowdsource models from different teams is to lay good groundwork/baselines from the beginning, so all participants have enough time to improvise. We all have the best will to help solve some challenging scientific questions here, but I guess the outcome, whether a publication or a top-performing ranking, still matters to most of us. So hopefully the rules can be set straight and we as participants can just work our way through here.
Hi Patrick,

Thank you for detailing your concerns. I understand your perspective on this issue. I also apologize if the title of the thread was misleading or confusing - this isn't a *new* dataset. It is one of the datasets we highlighted in the original Training Data guide here: https://www.synapse.org/#!Synapse:syn20968331/wiki/599458 - specifically, the L1000 dataset. This is the dataset used by the baseline model and by the Szalai et al. paper described under Additional Resources here: https://www.synapse.org/#!Synapse:syn20968331/wiki/599462

We've simply provided one example of how to retrieve and aggregate this dataset that might be useful to participants. However, it doesn't take into account the many other publicly available gene-expression datasets that could be used for this challenge, so we don't consider it a complete dataset; it's simply meant to give participants some introductory guidance and direction. I suspect that if you are working on a specific and custom strategy to use this or other datasets, your team is well poised to perform better than the baseline model.

Originally, we defined the scope of eligible training data to be any *publicly available* gene expression data (mentioned in the Training Data section), so in theory anyone can use any data that is made publicly available before the end of the challenge.

Also, thank you for your feedback regarding the challenge deadline. We do not currently have any plans to shift the deadline from the one initially proposed.

Best,
Robert
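(For concreteness, here is a toy sketch of the kind of aggregation meant above: averaging replicate signatures per perturbation with pandas. The file and column names are hypothetical placeholders; the linked notebook shows the actual procedure.)

```python
import pandas as pd

# Toy illustration only - file and column names are hypothetical.
# Each row is one replicate signature; columns are genes, plus a
# "pert_id" column identifying the perturbation.
signatures = pd.read_csv("l1000_signatures.csv")

# One simple consensus: the mean signature across all replicates
# of each perturbation.
consensus = signatures.groupby("pert_id").mean(numeric_only=True)
consensus.to_csv("consensus_signatures.csv")
```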
Dear Robert (and all),

First of all, thanks for organizing this very interesting DREAM Challenge. This is the first time my group has participated, and we're not sure how this has worked in the past. However, my understanding is that the different DREAM Challenges should have very clear rules, and **I do not think it is fair to incorporate new data, or change the submission deadlines, once the challenge has started.** In particular, we are devoting very significant effort to designing a strategy based on the provided data, and are working against the clock to meet the submission schedule. Hope you understand our perspective.

Best regards,
Patrick.
