Hi, I understand that the algorithms should be robust to a few missing values, such as missing demographics, but I assume that for challenge 2 all validation patients will at least have expression data. Is this correct? I also assume that the same set of genes will be available in training and validation. Does the same hold for challenge 1? Thanks, DreamAnon

Created by exquirentibus veritatem exquirentibus
I have confirmed that this is only the case for MMRF. Thank you for bringing this to our attention. Mike
Hi dreamAnon, are you looking in syn9926877?
It should not, but I will look into this to determine what happened and ensure it will not be an issue for the validation.
I noticed that in the MMRF training data, some patients have a gene expression filename in the clinical file, but that gene expression file does not contain expression for that patient. Specifically, "MMRF_1079_1_BM", "MMRF_1805_1_BM", "MMRF_1988_1_BM" and "MMRF_2507_1_BM" are not in MMRF_CoMMpass_IA9_E74GTF_Salmon_Gene_TPM.txt. Am I understanding correctly that this will not happen in validation?
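For anyone who wants to run the same check on their own copy, here is a minimal sketch of the comparison described above. It assumes the expression file is a tab-separated matrix with a gene ID column followed by one column per sample; that layout, the function name `patients_missing_expression`, and the `TOY_...` sample IDs are illustrative assumptions, not details confirmed in this thread.

```python
import csv
import io


def patients_missing_expression(clinical_ids, expression_samples):
    """Return the clinical IDs that have no column in the expression matrix."""
    available = set(expression_samples)
    return [pid for pid in clinical_ids if pid not in available]


# Toy stand-in for the TPM matrix (assumed layout: gene ID, then one column per sample).
toy_expression = (
    "GENE_ID\tTOY_0001_1_BM\tTOY_0002_1_BM\n"
    "ENSG00000000003\t1.2\t3.4\n"
)
header = next(csv.reader(io.StringIO(toy_expression), delimiter="\t"))

# IDs that the clinical file says should have expression (two toy IDs, one real one).
clinical_ids = ["TOY_0001_1_BM", "MMRF_1079_1_BM", "TOY_0002_1_BM"]

# header[0] is the gene ID column, so only header[1:] are sample names.
missing = patients_missing_expression(clinical_ids, header[1:])
print(missing)
```

With the real files you would read the header line of the TPM file and the relevant ID column of the clinical file instead of the toy strings.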
Good. Thanks for the clarification. DA
Dear dreamAnon,

Both challenges will have missing data in these columns. For example:

* In Challenge question 1, DFCI validation data will not have WES-based MuTect or Strelka VCFs but **will** have RNA-seq-based MuTect calls. No validation sample should have NAs in all three data types.
* In Challenge question 2, DFCI samples will not have the microarray data columns filled but **will** have the RNA-seq data columns filled, while the Hose dataset will have RNA-seq columns with NA and microarray columns filled.
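A quick way to guard against the case above (a sample with NA in every data type) is sketched below. The column names `wes_mutect`, `wes_strelka` and `rnaseq_mutect`, the `sample` key, and the set of NA markers are all hypothetical placeholders; substitute whatever the clinical file actually uses.

```python
def samples_missing_all(rows, data_columns, na_values=("NA", "", None)):
    """Return sample IDs whose entries are NA in every one of data_columns."""
    return [
        row["sample"]
        for row in rows
        if all(row.get(col) in na_values for col in data_columns)
    ]


# Hypothetical column names for the three data types in Challenge question 1.
data_columns = ["wes_mutect", "wes_strelka", "rnaseq_mutect"]

# Toy clinical rows: the first has only RNA-seq calls, the second has nothing.
rows = [
    {"sample": "S1", "wes_mutect": "NA", "wes_strelka": "NA", "rnaseq_mutect": "s1.vcf"},
    {"sample": "S2", "wes_mutect": "NA", "wes_strelka": "NA", "rnaseq_mutect": "NA"},
]

unusable = samples_missing_all(rows, data_columns)
print(unusable)
```

A model that falls back to whichever data type is present only needs special handling for the samples this function returns, and per the post above there should be none in validation.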
