The validation data description is confusing, and I think it may be very difficult to explain clearly what is going on. It would be helpful to release masked sc1, 2, 3_Validation_ClinAnnotations.csv files for each sub-challenge, with the fields filled in with Yes and No, plus cohort names. These tables would be self-explanatory and would answer many questions, such as which cohort (or all of them) is used in which validation and which columns are actually accessible/available. If SC3 is not ready, SC1 and SC2 can be released first. Thank you. Yuanfang Guan

Created by Yuanfang Guan (yuanfang.guan)
Thanks, that is helpful.
We have addressed this issue by providing simulated validation annotation files [here](https://www.synapse.org/#!Synapse:syn7222257). The format of these files, and the file names and extensions they point to, match those found in the true validation clinical annotation files. Please read the file descriptions in the folder linked above.
Dear Professor Guan, We are considering related options for the leaderboard rounds and final validation. However, some of our data providers have tight restrictions on releasing the clinical data of their samples. We will update the wiki and this thread once we determine how we will move forward. Thank you for your patience, Mike
This is my first DREAM Challenge, but I'll go out on a limb and say that the documentation overall could be clearer, regarding both the technical details and the challenge itself!
For example, for the first challenge we are supposed to use DNA-based features like SNVs, deletions, ins/amp, fusions, and translocations. From the description, it seems that some of these features, such as translocations, have been curated and included under blinded names like "CYTO_predicted_feature_05", and that we will get the same in validation, which is great. However, in the training data for MMRF you also provide CNA with feature names like SeqExome_Cp_11p15. Will we get a file with CNA in validation with the same names? What about the SNVs? Will we get VCF files in validation? Regarding the gene expression data, will the validation files have Ensembl IDs or gene symbols? Counts or TPM? Will the expression table have exactly the same format as MMRF_CoMMpass_IA9_E74GTF_Salmon_Gene_TPM.txt? What about the microarray data? Thanks, DA
I agree. It is not clear at this point what the validation data is going to look like. We need some sort of template tables (with dummy data) showing exactly what variables will be available at the validation stage to the algorithm for each of the subchallenges. thanks DA
