Dear @PEGSDREAMChallengeParticipants , Thank you all for your continued interest in the PEGS DREAM Challenge! As of Monday, July 1st, all challenge data, both synthetic and original, have been updated to address the issues noted in [this thread](https://www.synapse.org/Synapse:syn52817032/discussion/threadId=11104). These changes include the removal of some data. For optimal model building, please download the latest versions of the synthetic data ([training](syn59063544), [validation](syn59063543)) if you had previously downloaded them before this date, and re-run your models as needed. If you have any questions, please reply to this thread. Happy coding! @PEGSDREAMChallengeOrganizers

Created by Verena Chung (@vchung)
Dear @vchung and @farida , thanks a lot, it now seems to work.
@JoFa , Thank you for pointing this out! We have since corrected the files, and this validation error should no longer occur (assuming your model makes exactly one prediction per sample ID). **One important note**: the expected column names of your prediction files are now `epr_number` and `disease_probability`. The [Submission Tutorial](https://www.synapse.org/Synapse:syn52817032/wiki/627650) has been updated accordingly. Thank you again for your patience as we navigate through the data issues!
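For anyone updating their submission, a minimal sketch of writing a `predictions.csv` with the newly expected header follows. The sample IDs and probabilities here are purely illustrative; only the column names `epr_number` and `disease_probability` come from the announcement above.

```python
import csv

# Illustrative model output: (sample ID, predicted probability) pairs.
# In a real submission these would come from your model and the mounted
# validation data, not hard-coded values.
rows = [
    ("10000001", 0.87),
    ("10000002", 0.12),
]

# Write predictions.csv with the required column names.
with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epr_number", "disease_probability"])
    for epr_number, prob in rows:
        writer.writerow([epr_number, f"{prob:.4f}"])
```

Each sample ID should appear exactly once, per the note above.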
Thanks @JoFa, We are looking into this issue and will update you shortly.
Dear @farida , In fact, we did not compare two files but submitted a model for scoring. Following the submission instructions, our model creates a predictions.csv file. The IDs written to predictions.csv are taken from healthexposure_16jun22_v3.1_nonpii_val.RData, which is mounted in the Docker container under /input/. This should create a valid predictions.csv file that the submission system can score. However, the submission system reports that 50 of the approx. 3000 IDs are unknown. Looking at the implementation of the scoring script (https://github.com/Sage-Bionetworks-Challenges/pegs-evaluation/blob/main/validate.py), we suspect that the gold-standard file is missing some IDs. If anything in our explanation is unclear, we are happy to provide the missing information.
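For context, the "unknown ID" check being discussed amounts to a set difference between the submitted IDs and the gold-standard IDs. The sketch below is a simplified approximation of that logic, not the actual code in validate.py; all IDs shown are illustrative.

```python
# Hypothetical ID sets for illustration: in the real pipeline, pred_ids
# come from the submitted predictions.csv and gold_ids from the
# organizers' gold-standard file.
gold_ids = {"10000001", "10000002", "10000003"}
pred_ids = {"10000001", "10000002", "10380801"}

# Any predicted ID not present in the gold standard is "unknown".
unknown = sorted(pred_ids - gold_ids)
if unknown:
    print(f"Found {len(unknown)} unknown ID(s): {unknown}")
```

If the gold-standard file were missing IDs that legitimately appear in the mounted validation data, this check would flag them exactly as described in the error above.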
Hi @JoFa , Could you specify which 2 files you are comparing the IDs in?
Dear @vchung , thank you very much for taking care of this. We have run a minimal model on the updated data. Unfortunately, the process still throws the same error as with the "old" data: `Evaluation failed for Submission. Reason: 'Found 50 unknown ID(s): ['10380801', '10848312', '11585566', ...]'`. The IDs are taken from the mounted healthexposure_16jun22_v3.1_nonpii_val.RData file. Since almost 3000 IDs match and only 50 are unknown, we suspect the error lies in the evaluation. Is it possible that the gold-standard file has not yet been updated?
