We are pleased that so many were interested in attending today's webinar, and our sincerest apologies to those who were unable to attend due to space constraints. We have recorded the webinar and will post it shortly. Below are the questions and answers from today's exchange.

### Questions about the Data

**No pre-exposure for DEE1 RSV? But there is gene expression at -24 hours?**
* **A:** There is pre-exposure data for DEE1 RSV (0 and -24 hours).

**There are 2 subjects with 2 replicates in the Rhinovirus Duke dataset? (HRV10-022, HRV10-020)**
* **A:** Yes. HRV10-020 has two 24-hour samples (2536_117268_hg-u133a_2_31225_du10-04s16618.cel and 2536_125645_HG-U133A_2_DU10-04S16619.CEL) and HRV10-022 has two 12-hour samples (2536_117337_HG-U133A_2_31422_DU10-04S16061.CEL and 2536_125646_HG-U133A_2_DU10-04S16062.CEL).

**Is time 0 just before or just after exposure? Or is the time up to 0 the incubation time?**
* **A:** It is just before exposure.

**Is the RMA file the average gene expression of the samples, or do we need to perform the background correction and normalization on the CEL files? I am assuming these experiments are microarrays and not RNA-seq. (Cody Glickman)**
* **A:** All the challenges were performed using Affymetrix arrays. The RMA data were normalized using the `rma` function in the `affy` Bioconductor package (background correction, quantile normalization, probe summarization, and log2 transformation).

**Do you have human genetics data in addition to gene expression? Do you think human genetics is important in determining onset of infection? (Ankur Dhanik)**
* **A:** No, subjects were not consented for genotyping. That does not mean it isn't important, but it isn't addressed in this challenge.

**There seem to be some discrepancies between the data given and the data from the papers. E.g., subject RSV001 is noted in the clinical annotations table as showing viral shedding (SC2=1), but in the Supplementary Information of the Cell Host & Microbe paper by Zaas et al., the authors state that RSV001 didn't show viral shedding. Can we get some clarity on this? Thanks for putting on the challenge! (Felix Wu)**
* **A:** The algorithm for adjudicating subjects as symptomatic or shedding has been refined since the original Cell publication, so small discrepancies are possible, particularly in borderline cases.

**From the papers, it seems that viral shedding was tracked throughout the post-challenge timeline. How was the binary indicator of viral shedding drawn from this data? Simply presence/absence at any time post-challenge? (Felix Wu)**
* **A:** The binary indicator is an aggregate of several viral shedding (titer) measurements. The rule is at least 2 measurable titers. Subjects were tested for viral shedding at least once a day for at least five days after inoculation.

**Apart from the gene expression data and the three outcomes, will information about clinical outcomes relevant to the infection, such as viral load, be available? (Ziv Shkedy)**
* **A:** No information other than that already listed will be provided. We have provided some granular symptom data for the training data only; these cannot be provided for the test data.

**Did the administration of the anti-viral meds in two studies not interfere with gene expression patterns? (Shefali Sabhara)**
* **A:** It is to be expected that anti-viral medication could alter gene expression patterns.

**Can you share how the training and testing data were split? Was it stratified by class proportions, or purely random? (Sanders Lin)**
* **A:** Test data were randomly selected within the 3 qualifying studies (studies not already in the public domain).

**Regarding the RMA: did you remove the validation set before performing RMA, or separate it out after RMA? Additionally, did you run RMA on all CEL files together or study by study? (Schrodingers cat)**
* **A:** All data were normalized together.
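As an illustration of the shedding aggregation rule described above (the binary indicator is positive when a subject has at least 2 measurable titers across their post-inoculation measurements), here is a minimal Python sketch. The function name, the detection-limit parameter, and the example values are illustrative assumptions, not part of the challenge materials.

```python
# Sketch of the viral-shedding aggregation rule described in the Q&A:
# the binary shedding indicator is 1 when a subject has at least two
# measurable titers across their daily post-inoculation measurements.
# The notion of "measurable" as exceeding a detection limit is an
# illustrative assumption, not a challenge specification.

def shedding_indicator(titers, detection_limit=0.0, min_positive=2):
    """Return 1 if at least `min_positive` titers exceed the detection
    limit, else 0. `titers` is a sequence of daily titer measurements."""
    measurable = sum(1 for t in titers if t > detection_limit)
    return 1 if measurable >= min_positive else 0

# Example: five daily measurements, two of them measurable -> sheds.
print(shedding_indicator([0.0, 1.5, 0.0, 2.0, 0.0]))  # 1
print(shedding_indicator([0.0, 0.0, 0.0, 1.2, 0.0]))  # 0
```

Under this reading, a single transient measurable titer would not qualify a subject as a shedder, which matches the stated "at least 2 measurable titers" rule.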
**Some were given meds throughout the study. (Shefali Sabhara)**
* **A:** Yes, all influenza-exposed patients were given medication at day 5, unless indicated that they were given early treatment at 24 hours (DEE5 only).

### Questions about Test Data and Modeling

**Are virus-specific models allowed?**
* **A:** Yes.

**Starting from phase 2, do we ONLY need to predict for the new timepoints, or for all available timepoints again, using the new data?**
* **A:** You may use all available gene expression data for predictions. Outcomes data do not change from timepoint to timepoint.

**Will the study name (virus name) be available as part of the test data set made available during the final evaluation phase? (Prasad Chodavarapu)**
* **A:** The study name corresponds to the virus and will be provided in the test data.

**Will information be provided about which virus each test patient was exposed to? If so, would it be OK to use different models and predictors for each virus? (Mika Gustafsson)**
* **A:** The study name corresponds to the virus and will be provided for the test data subjects. You may adjust (or not) for virus in any way you see fit, as long as you provide predictions for all subjects in the test set.

### Questions about Submissions, Submission Requirements and Rules

**Could you address the submission requirement of disclosure of source code and related IP issues? (Mario Lauria)**
* **A:** DREAM operates as a community-based evaluation of methodologies. This requires reproducibility and transparency in assessments and, as such, source code must be made available in order to be eligible to win. Participants interested in submitting predictions based on proprietary code may do so and will be scored, but are not eligible to win the incentives.

**Do we need write-ups for each challenge phase or only at the end? (Felix Wu)**
* **A:** Write-ups are required for each challenge phase.
A single write-up may be used, but it must be appended at each phase to reflect differences in methodology and results.

**In the submission, for the list of predictors, do you require the predictors to be genes? Can we submit predictors that are combinations of many genes with weights included? (Qian Li)**
* **A:** If you create predictors that are combinations of genes, we still need the list of genes included, rank-ordered, and/or their importance. If you'd like to include information about aggregate predictors, please provide it as additional columns in your predictor files.

**When a team submits reproducible code and results, why should they also provide a detailed write-up and selected features? It is redundant, and for teams that don't perform well enough it is unnecessary. Based on a previous DREAM (9.5) challenge, it might be best to collect this information only from winning teams (the top 20, for example). (Scrodingers cat)**
* **A:** A detailed write-up is necessary for the organizers and community to understand the methods used, which is not trivial to extract from source code. Additionally, we also hope to do an analysis of the commonalities and differences of predictors used across teams. Historically, it is difficult to obtain information from participants after a challenge is closed, so the only way to ensure teams participate is to ask that it be provided as a challenge requirement. Given that this particular item is a natural consequence of building predictive models, we don't feel it is an undue burden on participants. If you do not want to complete each of the challenge requirements as outlined, you may still submit predictions; however, you will be ineligible to compete for or win any of the prizes.

**How many submissions per round, per challenge will we have? (Olexandr Isayev)**
* **A:** Only one submission per phase per subchallenge will be scored.

**Can a team submit multiple times before the deadline? (Jyoti Shankar)**
* **A:** You may submit as many times as you need prior to the deadline, but only the final submission prior to the deadline will be scored.

**Will the final submissions be the ones which are evaluated? (Jyoti Shankar)**
* **A:** Only the final submission per phase will be evaluated.

**Is it proper to share and discuss candidate submissions on the forum even while the challenge is open? (Prasad Chodavarapu)**
* **A:** There are no rules against discussing your models or candidate submissions if you choose.

**Which accuracy measure(s) will be used to assess the performance of (or rank) the methods? (Zafer Aydin)**
* **A:** Scoring algorithms are detailed on the challenge website: https://www.synapse.org/#!Synapse:syn5647810/wiki/399118

**After submitting to the scoring queue, can we get the scoring result rather than merely whether the submission is scorable or not? (Chengzhe Tian)**
* **A:** No. We are unable to provide feedback that would alter the course of the challenge. After the close of each phase, some minimal information about ranks and statistical significance will be provided.

**How long should we wait between submissions? (Sheng Ting Shen)**
* **A:** There is no limit.

**What is the final score that you are expecting? Is it greater than a certain threshold, e.g., greater than 0.95? (Mohammad Rahman)**
* **A:** Submissions are expected to work better than the internal benchmark model (the baseline).

### Questions about Synapse and Certification

**How does one request profile validation if one is already certified? The user page has a purple notification for this, but it is not entirely clear how one requests validation. Additionally, does the purple notification go away when one is validated? (Jyoti Shankar)**
* **A:** You need to be certified, but not validated, in order to submit to this challenge. If you are interested in becoming validated, you can manage this process from your home page. Please post to the forum if you continue to have questions.
**What is the difference between certification and validation? (Jyoti Shankar)**
* **A:** Validation is required to access some data on Synapse, but is not a requirement for this challenge. Certification is required to upload data to Synapse for any purpose, including this challenge.

**Is becoming a "certified Synapse user" the same as having signed up for the DREAM Challenge and joining a team? (Felix Wu)**
* **A:** No. Becoming a certified user requires you to take a quiz on how to responsibly handle human data. This is required because the Synapse system allows all users to upload content, including human data, and therefore all individuals interested in uploading content must first pass this quiz.

**Is there any preparation needed to take the Synapse certification exam? (Prasad Chodavarapu)**
* **A:** There is a tutorial describing the process: https://www.synapse.org/#!Help:GettingStarted

### General Questions

**Can a person participate in two teams? (Sandeep Kumar)**
* **A:** Participants can only participate on one team at a time, but are welcome to switch teams within the challenge.

**Apart from the publication planned by the challenge, are the data publishable? I.e., can they be used for other publications? (Ziv Shkedy)**
* **A:** Use of the data in secondary publications is encouraged, but embargoed until the challenge publication is accepted.

**How does one apply to join a team?**
* **A:** If you have a group of colleagues you'd like to compete with, you may register as a team for the challenge. If you'd like to join an existing team, or find individuals who would like to form a team, I suggest posting on the discussion board. You may also compete as an individual if you wish.

Created by Solveig Sieberts (sieberts)
To be frank, if no real-name system is enforced in this challenge, then, since the test set is so small, it is possible that you will see (close-to) perfect predictions in the second round. It won't come from me, because I and everyone in my group always use our real names, but it could come from someone who has nothing to lose. All of us are taking federal money; for what reason can we not use real identities, unless someone is up to something fishy? I respect your decision, and you may have other considerations that I cannot understand, but I hope you will give this a second thought. You could also turn off feedback completely; that would solve all the problems.
If done in order to deliberately game the system, yes. In general, it is not against the rules to switch teams.
That makes sense. But would switching or joining teams be considered cheating?
Here is the relevant portion of the DREAM rules: > a. Cheating by, for example, misleading other participants with decoy submissions, or using fake identities to make more submissions than allowed, is tantamount to scientific fraud. Sage Bionetworks and DREAM take incidences of cheating extremely seriously. In instances where Sage Bionetworks and DREAM confirm clear cases of cheating, the individual/team in question will be disqualified, and depending upon the gravity of the particular cheating incident, may be subject to any or all of the following: (1) individual/team banned from participation in future DREAM Challenges; (2) individual/team banned from future access to Synapse, and (3) individual/team's cheating reported to all related and/or relevant parties, including, for example, publishers, funding agencies, University authorities, administrators, and colleagues.
The DREAM policy is to be as inclusive as possible and to encourage collaboration. Thus, we believe the current rules are the best way to support these values. Of course, the practices you describe are deliberate cheating and gaming of the system, and are not condoned under challenge rules. Participants engaging in behavior of this type will be disqualified.
Solly, thanks for posting the Q&A. I would appreciate it if the organizing group could discuss the following point: I think it might be a bad idea to allow team joining and switching between rounds. Of course, it is a huge improvement that the scores are turned off, because on this dataset a prediction file and its score are a one-to-one match in over 99% of cases. Frankly, the first thing I did when I read the description was to print out the matching table between scores and prediction rankings, which wasn't that long, ~8 million entries. However, if I have a group of 5 people, I don't need the feedback score at all, only the reported rank; the dumbest method can figure out the whole 23-set in a little over 2 rounds. I don't think there will be any signal in the first 3 rounds; the ranking will be determined purely by the 4th round. Probably many people are going to use the first 3 rounds to figure out as much of the gold standard as possible, in which case the number of trials must be the same for every team. For the same reason, I think for this particular challenge it is necessary to use a REAL-NAME system (or at least to require the PI's name to enter the challenge), and to forbid joining and switching teams between rounds. Thanks for your patience with me.
