Dear Colleagues,
I have a question about the overall design of the Challenge.
I looked for class labels in the training data and found none. The column "he_b008_high_cholesterol" has been removed from the "healthexposure_16jun22_v3.1_nonpii_train_synthetic.RData" file.
How are we supposed to train a model without knowing who has "high_cholesterol" in the training data?
I may have missed a file containing the class labels. If so, could you point me to it and describe it more clearly in the Wiki?
Ramil
Created by Ramil Nurtdinov (n.ramil)

Hi @joshroll,
Thank you for identifying these issues.
We have identified some discrepancies with the epr_numbers in the data files and are working on fixing these issues. We will update this thread once the rectified files have been uploaded.
Thank you,
Farida

Hello @gaia.andreoletti,
None of the epr_number values align between the new "healthexposure_16jun22_v3.1_nonpii_train_synthetic.RData" file and version 5 (the latest version) of "exposomea_29jul22_v3.1_nonpii_train_synthetic.RData". I checked the version differences and found that in version 6 of the healthexposure file every epr_number has been replaced and the "he_b008_high_cholesterol" column has been re-added, while the values of the other columns are unchanged and in the same order. The previous set of epr_numbers shared 480 values with exposomea's epr_numbers; the new set shares none.
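For anyone who wants to repeat this kind of check, here is a minimal sketch in Python. It assumes each file has already been loaded into a list of row dictionaries; loading the .RData files is left out. The helper names `id_overlap` and `non_id_columns_match`, the `he_age` column, and all values are hypothetical toy data, not taken from the Challenge files.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]


def id_overlap(ids_a: Iterable, ids_b: Iterable) -> Set:
    """Return the identifiers that appear in both collections."""
    return set(ids_a) & set(ids_b)


def non_id_columns_match(old_rows: List[Row], new_rows: List[Row],
                         id_key: str = "epr_number") -> bool:
    """Check that two versions agree row by row, in order, on every
    column they share, ignoring the identifier column."""
    if len(old_rows) != len(new_rows):
        return False
    for old, new in zip(old_rows, new_rows):
        shared = (old.keys() & new.keys()) - {id_key}
        if any(old[col] != new[col] for col in shared):
            return False
    return True


# Toy stand-ins for two versions of one table and the identifiers of another.
health_old = [
    {"epr_number": 101, "he_age": 40},
    {"epr_number": 102, "he_age": 55},
]
health_new = [
    {"epr_number": 901, "he_age": 40, "he_b008_high_cholesterol": 0},
    {"epr_number": 902, "he_age": 55, "he_b008_high_cholesterol": 1},
]
exposome_ids = [101, 102, 103]

old_shared = id_overlap((r["epr_number"] for r in health_old), exposome_ids)
new_shared = id_overlap((r["epr_number"] for r in health_new), exposome_ids)

print(len(old_shared), len(new_shared))
print(non_id_columns_match(health_old, health_new))
```

In this toy case the old identifiers overlap with the second table and the new ones do not, while the shared non-identifier columns still match in order, which is the same pattern as described above.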
I saw the start of a synthetic data epr_number issue discussion on another thread ([here](https://www.synapse.org/Synapse:syn52817032/discussion/threadId=11104&replyId=32373)) and both these file versions appear to have been adjusted after that.
Should we expect further changes to fix these epr_numbers, and will we need to redownload every file for this?
Thanks,
Joshua

Hi @n.ramil,
Thank you for your question. You are correct - initially, we removed "he_b008_high_cholesterol" from all the datasets. Our goal is for participants to predict the high cholesterol phenotype based on the available data, rather than solely on the measure of cholesterol levels. However, after an internal discussion, we decided to re-add "he_b008_high_cholesterol" to the healthexposure_16jun22_v3.1_nonpii_train_synthetic.RData file to aid in training.
Hope this clarifies things,