Clarification for sc1 & sc2

I have read the threads, and looked at the updated timeline. I'd like to be sure, though, that I've understood.

The difference between SC1 and SC2 is:

- SC1: genomic variation data (data, DNA single nucleotide variants, indels, etc)
- SC2: gene expression (RNA-seq or microarray based) variables

SC1 says:

```
Challenge Question 1: Implement an Genomic Variant (DNA) Based Predictor of High Risk in Multiple Myeloma

Identify high risk patients defined by disease progression (or death) within 18 months from time of diagnosis using genomic variation data (data, DNA single nucleotide variants, indels, etc) with the option to include ISS and age (two known strong prognostic factors) and available cytogenetics. Gene expression data may not be used as direct input for challenge question 1 and will be not be accessible in the leaderboard or validation rounds. However, expression information may be used in incorporating prior knowledge (ex. prioritizing or filtering genomic features) when constructing the genomic variant based predictor.
```

I'm not quite sure what the 'etc.' covers, but, as I understand it, the file sc1_Validation_ClinAnnotations.csv defines what is available.

So, with the header:

```
"Study","Patient","D_Age","D_Gender","D_ISS","PatientType","MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId","RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile","CYTO_predicted_feature_01","CYTO_predicted_feature_02","CYTO_predicted_feature_03","CYTO_predicted_feature_04","CYTO_predicted_feature_05","CYTO_predicted_feature_06","CYTO_predicted_feature_07","CYTO_predicted_feature_08","CYTO_predicted_feature_09","CYTO_predicted_feature_10","CYTO_predicted_feature_11","CYTO_predicted_feature_12","CYTO_predicted_feature_13","CYTO_predicted_feature_14","CYTO_predicted_feature_15","CYTO_predicted_feature_16","CYTO_predicted_feature_17","CYTO_predicted_feature_18"
```

We are expected to use:

```
"D_Age","D_Gender","D_ISS",...
and
"CYTO_predicted_feature_01","CYTO_predicted_feature_02","CYTO_predicted_feature_03","CYTO_predicted_feature_04","CYTO_predicted_feature_05","CYTO_predicted_feature_06","CYTO_predicted_feature_07","CYTO_predicted_feature_08","CYTO_predicted_feature_09","CYTO_predicted_feature_10","CYTO_predicted_feature_11","CYTO_predicted_feature_12","CYTO_predicted_feature_13","CYTO_predicted_feature_14","CYTO_predicted_feature_15","CYTO_predicted_feature_16","CYTO_predicted_feature_17","CYTO_predicted_feature_18"
```

The external files referenced (which may, of course, not be there) are:

```
"MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId","RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile",
```

From this, I infer, we should be using:

"MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId",

SC2 says:

```
Challenge Question 2: Implement an Gene Expression Based Predictor of High Risk in Multiple Myeloma

Identify high risk patients defined by disease progression (or death) within 18 months from time of diagnosis using gene expression (RNA-seq or microarray based) variables with the option to include ISS and age (two known strong prognostic factors) and available cytogenetics. While genomic features may not be used as direct input for challenge question 2, they may be utilized in deriving and incorporating prior knowledge (ex. prioritizing or filtering expression data) when constructing a team's algorithm. Genomic variant information will not be available in leaderboard or validation datasets
```

From the above, we should also be using:

"RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile",

Is that correct?

Created by Peter Brooks fustbariclation
Hi Peter,

I apologize for the long delay in responding. The above isn't quite right.

- All sub-challenges can use age, gender, ISS, and cytogenetics (i.e., CYTO_predicted_feature_01, CYTO_predicted_feature_02, etc.)
- SC2 may include expression-based features [i.e., microarray expression (MA_probeLevelExpFile, MA_geneLevelExpFile) or RNA-seq (RNASeq_transLevelExpFile, RNASeq_geneLevelExpFile)]
- SC1 may include variant-based features [i.e., those derived from DNA-seq (WES_mutationFileMutect, WES_mutationFileStrelkaIndel, WES_mutationFileStrelkaSNV) or RNA-seq (RNASeq_mutationFileMutect, RNASeq_mutationFileStrelkaIndel, RNASeq_mutationFileStrelkaSNV)]

We will not be providing fusions from RNA-seq data--the column RNASeq_FusionFile will be removed from the annotation files or otherwise always NA.

I hope that clarifies.

Brian
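
For anyone who wants to encode the rules from the reply above, here is a minimal Python sketch that picks the permitted columns out of an annotation-file header for a given sub-challenge. The names `CLINICAL`, `EXPRESSION`, `VARIANT`, `MODALITY` and `allowed_columns` are hypothetical and not part of any challenge tooling. The sketch also assumes the `...SamplId` columns are sample identifiers rather than features, so it leaves them out; the reply does not address them, and they are probably still needed to find a patient's sample inside the referenced files.

```python
# Column groups copied from the reply above; all names here are illustrative only.
CLINICAL = ["D_Age", "D_Gender", "D_ISS"]
CYTO_PREFIX = "CYTO_predicted_feature_"

EXPRESSION = [
    "MA_probeLevelExpFile", "MA_geneLevelExpFile",
    "RNASeq_transLevelExpFile", "RNASeq_geneLevelExpFile",
]
VARIANT = [
    "WES_mutationFileMutect", "WES_mutationFileStrelkaIndel",
    "WES_mutationFileStrelkaSNV", "RNASeq_mutationFileMutect",
    "RNASeq_mutationFileStrelkaIndel", "RNASeq_mutationFileStrelkaSNV",
]

# SC1 is variant-based, SC2 is expression-based.
MODALITY = {"sc1": VARIANT, "sc2": EXPRESSION}


def allowed_columns(header, sub_challenge):
    """Return the columns of `header` usable for the given sub-challenge,
    in their original order."""
    modality = set(MODALITY[sub_challenge.lower()])
    return [
        col for col in header
        if col in CLINICAL or col.startswith(CYTO_PREFIX) or col in modality
    ]


# Shortened example header (a subset of the real annotation columns).
header = [
    "Study", "Patient", "D_Age", "D_ISS",
    "MA_geneLevelExpFile", "WES_mutationFileMutect",
    "RNASeq_FusionFile", "CYTO_predicted_feature_01",
]
sc1_cols = allowed_columns(header, "SC1")
sc2_cols = allowed_columns(header, "SC2")
print(sc1_cols)
print(sc2_cols)
```

`RNASeq_FusionFile` is dropped for both sub-challenges, in line with the note that fusions will not be provided.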
