Hello organizers, can I have more details about Task 1 and Task 2, as well as their data? After looking at the data, I am confused about the following:

1. What is the gold standard for the Task 1 and Task 2 data, and what do the expected outputs look like?
2. How will the methods be quantitatively evaluated?

Thank you very much for your time and help.

Created by Kaiwen Deng (@dengkw)
Hi @dengkw, it is true that there are no predictive capabilities in the dummy data; it is meant to help you engineer and test new features in your model. Once these are developed, you can test their effectiveness by submitting your model through our evaluation queue. This will let you test hypotheses on the RARE-X data without compromising the privacy of the data.
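For reference, here is a minimal sketch of submitting a containerized model through a Synapse evaluation queue with the synapseclient Python package; the synID, queue ID, and model name below are placeholders, not the challenge's actual identifiers.

```python
# Sketch only: submit a pushed Docker model to an evaluation queue.
# All IDs below are placeholders.
import synapseclient

syn = synapseclient.Synapse()
syn.login()  # assumes Synapse credentials are configured locally

# The Docker repository entity holding your pushed model image
docker_repo = syn.get("syn00000000")  # placeholder synID

# Submit it to the challenge's evaluation queue
submission = syn.submit(
    evaluation="9610000",   # placeholder evaluation queue ID
    entity=docker_repo,
    name="my-task2-model",
    dockerTag="latest",     # which pushed image tag to evaluate
)
print(f"Submitted: {submission.id}")
```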
Hi @albrecht, I noticed in the Task2_Example notebook you said, "The results are terrible! It's because the data has been randomized". Does that mean all of the feature values in the dummy data are random? If so, I am afraid that evaluations on datasets with entirely random values may be unreliable, and it could be hard to decide what to submit. Thank you very much! Best, Kaiwen
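As an aside, one quick way to check whether the dummy features carry any signal is a sketch like the one below, assuming the baseline TSVs are numeric and complete: a cross-validated score near chance level would be consistent with fully randomized values.

```python
# Sanity check: is cross-validated accuracy above chance on the dummy data?
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = pd.read_csv("baseline/training_features.tsv", sep="\t", index_col=0)
y = pd.read_csv("baseline/training_target.tsv", sep="\t", index_col=0)

scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y.values.ravel(), cv=5
)
print(f"Mean CV accuracy: {scores.mean():.3f}")  # near chance if randomized
```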
Thank you very much! That helps a lot.
Hi @dengkw, to generate the processed features, you can check out this notebook: https://www.synapse.org/#!Synapse:syn51942435, which generates the features from the dummy training data. Not all of the fields from the survey results are available in the data map; I can check with the organizers whether they are able to share the descriptions.
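The linked notebook is the authoritative reference; the snippet below is only a hypothetical illustration of the kind of transformation such feature generation typically involves (pivoting long-format survey responses into one feature row per participant). The column names are illustrative, not the actual RARE-X schema.

```python
# Hypothetical feature-engineering sketch; column names are made up.
import pandas as pd

surveys = pd.read_csv("dummy_training_data.tsv", sep="\t")

# One row per participant, one column per survey question
features = surveys.pivot_table(
    index="participant_id",
    columns="question_id",
    values="response_value",
    aggfunc="first",
)

# Encode categorical responses and keep a flag for missing answers
features = pd.get_dummies(features, dummy_na=True)
features.to_csv("baseline/training_features.tsv", sep="\t")
```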
Thank you very much for your previous help! I have two more questions, about the base model Docker image and the Task 2 data:

1. It seems that base_model.py needs inputs from the processed features and labels (baseline/training_features.tsv, baseline/training_target.tsv, etc.), but the Docker image does not contain them.
2. Some of the Task 2 datasets, such as Diagnosis and Med_Diet, do not appear in the Data Map Excel file. Do you know where I can find their column and value descriptions, like those in the data map?

Thank you
Best, Kaiwen
@dengkw - you should have download access now (thanks to @mdsage1)
Hi @albrecht, thank you very much for your help! For Task 2, does the Docker container "base_model" under the "Docker" tab contain the full training data and submission example that you mentioned in your answer? It seems that I do not have download access to it.
Hi @dengkw, the evaluation for Task 1 will be semi-quantitative, as there is no gold standard for phenotypes in these rare diseases. We will judge the Task 1 writeup for originality and for comparisons with existing references, and we will rerun any shared code on withheld participant survey data to confirm that the findings extend to that cohort as well. For Task 2, both the training data and the gold standard will come from the full RARE-X participant data, augmented with the other data sources. The data (training features, training labels, and testing features) will be provided to your containerized model following the submission example, and the model will be scored for overall accuracy in predicting the testing labels.
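To make the Task 2 contract concrete, here is a minimal sketch of a containerized model that reads the provided training features/labels and testing features, then writes predictions for accuracy scoring. The file paths, output name, and choice of classifier are assumptions based on the baseline names mentioned above, not the official specification.

```python
# Sketch of the Task 2 model contract: read mounted inputs, write predictions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

X_train = pd.read_csv("baseline/training_features.tsv", sep="\t", index_col=0)
y_train = pd.read_csv("baseline/training_target.tsv", sep="\t", index_col=0)
# Testing-features path is an assumed name, following the baseline convention
X_test = pd.read_csv("baseline/testing_features.tsv", sep="\t", index_col=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train.values.ravel())

# Overall accuracy is computed by the organizers against withheld labels
pd.DataFrame({"prediction": model.predict(X_test)}, index=X_test.index).to_csv(
    "predictions.tsv", sep="\t"
)
```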
