I have read the threads, and looked at the updated timeline. I'd like to be sure, though, that I've understood. The difference between SC1 and SC2 is:

**SC1: genomic variation data (data, DNA single nucleotide variants, indels, etc)**

**SC2: gene expression (RNA-seq or microarray based) variables**

SC1 says:

```
Challenge Question 1: Implement an Genomic Variant (DNA) Based Predictor of High Risk in Multiple Myeloma

Identify high risk patients defined by disease progression (or death) within 18 months from time of diagnosis using **genomic variation data (data, DNA single nucleotide variants, indels, etc)** with the option to include **ISS and age** (two known strong prognostic factors) and **available cytogenetics**. Gene expression data may not be used as direct input for challenge question 1 and will be not be accessible in the leaderboard or validation rounds. However, expression information may be used in incorporating prior knowledge (ex. prioritizing or filtering genomic features) when constructing the genomic variant based predictor.
```

I'm not quite sure what the 'etc.' covers, but, as I understand it, the file sc1_Validation_ClinAnnotations.csv defines what is available.
So, with the header:

```
"Study","Patient","D_Age","D_Gender","D_ISS","PatientType","MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId","RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile","CYTO_predicted_feature_01","CYTO_predicted_feature_02","CYTO_predicted_feature_03","CYTO_predicted_feature_04","CYTO_predicted_feature_05","CYTO_predicted_feature_06","CYTO_predicted_feature_07","CYTO_predicted_feature_08","CYTO_predicted_feature_09","CYTO_predicted_feature_10","CYTO_predicted_feature_11","CYTO_predicted_feature_12","CYTO_predicted_feature_13","CYTO_predicted_feature_14","CYTO_predicted_feature_15","CYTO_predicted_feature_16","CYTO_predicted_feature_17","CYTO_predicted_feature_18"
```

We are expected to use:

```
"D_Age","D_Gender","D_ISS",...
and "CYTO_predicted_feature_01","CYTO_predicted_feature_02","CYTO_predicted_feature_03","CYTO_predicted_feature_04","CYTO_predicted_feature_05","CYTO_predicted_feature_06","CYTO_predicted_feature_07","CYTO_predicted_feature_08","CYTO_predicted_feature_09","CYTO_predicted_feature_10","CYTO_predicted_feature_11","CYTO_predicted_feature_12","CYTO_predicted_feature_13","CYTO_predicted_feature_14","CYTO_predicted_feature_15","CYTO_predicted_feature_16","CYTO_predicted_feature_17","CYTO_predicted_feature_18"
```

The external files referenced (which may, of course, not be there) are:

```
"MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId","RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile",
```

From this, I infer, we should be using:

"MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId",

SC2 says:

```
Challenge Question 2: Implement an Gene Expression Based Predictor of High Risk in Multiple Myeloma

Identify high risk patients defined by disease progression (or death) within 18 months from time of diagnosis using **gene expression (RNA-seq or microarray based) variables** with the option to include **ISS and age** (two known strong prognostic factors) and **available cytogenetics**. While genomic features may not be used as direct input for challenge question 2, they may be utilized in deriving and incorporating prior knowledge (ex. prioritizing or filtering expression data) when constructing a team's algorithm. Genomic variant information will not be available in leaderboard or validation datasets
```

From the above, we should also be using:

"RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile",

Is that correct?

Created by Peter Brooks fustbariclation
Hi Peter,

I apologize for the long delay in responding. The above isn't quite right.

- All sub-challenges can use age, gender, ISS, and cytogenetics (i.e., CYTO_predicted_feature_01, CYTO_predicted_feature_02, etc.)
- SC2 may include expression-based features [i.e., microarray expression (MA_probeLevelExpFile, MA_geneLevelExpFile) or RNA-seq (RNASeq_transLevelExpFile, RNASeq_geneLevelExpFile)]
- SC1 may include variant-based features [i.e., those derived from DNA-seq (WES_mutationFileMutect, WES_mutationFileStrelkaIndel, WES_mutationFileStrelkaSNV) or RNA-seq (RNASeq_mutationFileMutect, RNASeq_mutationFileStrelkaIndel, RNASeq_mutationFileStrelkaSNV)]

We will not be providing fusions from RNA-seq data; the column RNASeq_FusionFile will be removed from the annotation files or will otherwise always be NA.

I hope that clarifies.

Brian
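For anyone wiring this into a pipeline, the column rules in the reply above could be sketched roughly as follows. This is only an illustration: the constant names and the `allowed_columns` helper are hypothetical, not part of any challenge tooling, and the grouping is just a literal reading of the lists above. The `...SamplId` columns are left out because the reply does not mention them, although they are presumably still needed to look up a sample inside each file.

```python
import csv
import io

# Column groups copied from the reply above. The grouping and names of
# these constants are illustrative assumptions, not official tooling.
CLINICAL = ["D_Age", "D_Gender", "D_ISS"]
CYTO_PREFIX = "CYTO_predicted_feature_"
EXPRESSION = [
    "MA_probeLevelExpFile", "MA_geneLevelExpFile",
    "RNASeq_transLevelExpFile", "RNASeq_geneLevelExpFile",
]
VARIANT = [
    "WES_mutationFileMutect", "WES_mutationFileStrelkaIndel",
    "WES_mutationFileStrelkaSNV", "RNASeq_mutationFileMutect",
    "RNASeq_mutationFileStrelkaIndel", "RNASeq_mutationFileStrelkaSNV",
]


def allowed_columns(header, sub_challenge):
    """Return the header columns usable as inputs for 'SC1' or 'SC2'.

    Order follows the header. RNASeq_FusionFile is never returned,
    since fusions will not be provided.
    """
    if sub_challenge == "SC1":
        data_cols = VARIANT
    elif sub_challenge == "SC2":
        data_cols = EXPRESSION
    else:
        raise ValueError(f"unknown sub-challenge: {sub_challenge!r}")
    keep = set(CLINICAL) | set(data_cols)
    return [c for c in header if c in keep or c.startswith(CYTO_PREFIX)]


# Shortened stand-in for the first line of an annotation file.
sample = io.StringIO(
    '"Study","Patient","D_Age","D_ISS","MA_geneLevelExpFile",'
    '"WES_mutationFileMutect","RNASeq_FusionFile",'
    '"CYTO_predicted_feature_01"\n'
)
header = next(csv.reader(sample))
print(allowed_columns(header, "SC1"))
print(allowed_columns(header, "SC2"))
```

With the real file, the same call would take the header row read from sc1_Validation_ClinAnnotations.csv in place of the shortened sample.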
