Below are the questions from the webinar held on 5/22/19.
**Q: What would be a good/useful performance level for this model in terms of RMSE?**
**A:** The previous Science paper reported an RMSE of around 4 weeks, but there is no prior expectation of performance. The top performer will be identified using the scoring procedure outlined in the challenge documentation.
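For readers less familiar with the metric, here is a minimal sketch of how RMSE could be computed for gestational age predictions. The values are made up purely for illustration, and the challenge's actual scoring procedure is the one described in the challenge documentation, not this snippet.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical gestational ages in weeks (illustrative values only).
true_ga = [10.0, 20.0, 30.0, 38.0]
pred_ga = [14.0, 16.0, 34.0, 34.0]

# Every prediction is off by exactly 4 weeks, so the RMSE is 4.0.
print(rmse(true_ga, pred_ga))
```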
**Q: Thanks Adi, did I hear you correctly that no information on subject ids will be available?**
**A:** We will not supply patient ids because we would like teams to develop methods to work for one sample at a time.
**Q: It sounds like you are aware of batch effects in your data. If so, why did you provide RMA gene expression data without correction? Is there a reason for that?**
**A:** The data are batch corrected, using the removeBatchEffect() function in limma.
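To give a rough intuition for what a batch correction does, the sketch below removes per-batch mean shifts from a single gene's expression values. This is a simplified illustration written for this page, not limma's implementation: `removeBatchEffect()` fits a linear model and can also account for covariates and a design matrix. The function name `center_batches` and the data are hypothetical.

```python
from collections import defaultdict

def center_batches(values, batches):
    """Subtract each batch's mean from its samples, then restore the grand mean.

    Simplified illustration only; this is NOT what limma does internally.
    """
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]

# Hypothetical log-expression values for one gene measured in two batches.
expr = [5.0, 6.0, 7.0, 8.0]
batch = ["A", "A", "B", "B"]

corrected = center_batches(expr, batch)
print(corrected)  # batch B no longer sits 2 units above batch A
```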
**Q: Regarding the use of third-party data, I assume it is acceptable to use publicly available data, e.g. tissue expression panels, compendia of gene sets, etc.?**
**A:** Yes, any public data you can use is fine, but you are not allowed to use outside expression data.
**Q: Are we allowed to use different frameworks like TensorFlow or Keras for the task?**
**A:** You can use any workflow you want.
Created by James Costello (james.costello)
Hi @kirbat, you are right. Ngo et al. used time from sample to delivery rather than gestational age (GA) at blood draw. However, the average GA at delivery in the term cohort in Ngo et al. was 40 weeks, so time from sample to delivery is roughly 40 - GA. The difficulty of predicting time to delivery in a term cohort is therefore expected to be about the same as that of predicting GA. Again, given the multiple differences between the studies, treating the RMSE = 4.3 weeks reported in Ngo et al. as a reference for this challenge is a wild guess. For instance, they used less noisy RT-qPCR data instead of microarrays, and, importantly, their model used only normal pregnancies, whereas here samples from normal pregnancies are just a fraction of the dataset.
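The argument that the two targets are about equally hard can be checked with a small sketch. It assumes, as a simplification, that every delivery happens at exactly 40 weeks (the cohort average quoted above), so time to delivery is 40 - GA. The GA values are hypothetical.

```python
import math

TERM_GA_AT_DELIVERY = 40.0  # weeks; simplifying assumption, applied to every sample

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true and predicted GA at blood draw, in weeks.
ga_true = [12.0, 24.0, 33.0]
ga_pred = [15.0, 20.0, 35.0]

# Under the assumption above, time to delivery is just 40 - GA.
ttd_true = [TERM_GA_AT_DELIVERY - ga for ga in ga_true]
ttd_pred = [TERM_GA_AT_DELIVERY - ga for ga in ga_pred]

# Each error only flips sign, so both targets give the same RMSE.
rmse_ga = rmse(ga_true, ga_pred)
rmse_ttd = rmse(ttd_true, ttd_pred)
print(rmse_ga, rmse_ttd)
```

In real data GA at delivery varies between pregnancies, so the two RMSE values would be close rather than identical.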
Regarding the first question about RMSE: please correct me if I'm wrong, but in the Science paper they predicted time to delivery rather than GA. From supplementary materials: "Models were trained to predict time to delivery, an objective criterion independent of ultrasound-estimated GA, defined as the difference between the GA at sample collection and GA at delivery".