While debugging the Docker container, I noticed that some of the genes used in my trained model are missing from several of the validation cohorts! For example, DFCI is missing genes 158055 (ENSG00000196366) and 750 (ENSG00000221819). Can you verify that? I thought you were making sure that all the genes provided in training were also available in testing. At the very least, we should get a list of genes that are guaranteed to be in ALL validation datasets so we can limit our models to those genes only. Perhaps it is an issue with the mapping between Ensembl IDs and Entrez IDs? Thanks, DA
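To make the request concrete, here is a minimal sketch of how a list of genes shared by every cohort could be computed, assuming the gene IDs of each cohort were available as plain lists. The function name `common_genes`, the cohort names other than DFCI, and the toy IDs are all hypothetical; this is not the organizers' procedure.

```python
def common_genes(cohort_genes):
    """Return the sorted gene IDs that appear in every cohort.

    cohort_genes maps a cohort name to an iterable of gene IDs.
    """
    gene_sets = [set(genes) for genes in cohort_genes.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Hypothetical toy data: Entrez IDs stored as strings.
cohorts = {
    "DFCI": ["100", "200", "300"],
    "cohort_B": ["100", "200", "158055"],
    "cohort_C": ["100", "200", "750", "158055"],
}

shared = common_genes(cohorts)
print(shared)
```

A model restricted to `shared` would never ask for a gene that one of these cohorts lacks.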

Created by exquirentibus veritatem exquirentibus
Hi Walt, yes, it is acceptable to do that.
Hi Mike, regarding the gene expression data, is it okay to check the overlap between the genes used for building models and the genes in the validation sets? For example, by printing the overlaps to log files through expression lane submissions. Many thanks, Walt
Apologies for the delay. We have now provided an Entrez ID-based version of the RNA-Seq expression data for MMRF that might help address this. However, note that there will be missing genes in the validation set, and methods should be robust to missing genes.
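One way a method could be made robust to missing genes, and log the overlap at the same time, is sketched below. It assumes each sample is a mapping from gene ID to expression value and that a per-gene fallback (for example, the training mean) is stored with the model. The name `align_to_model_genes`, the fallback strategy, and the toy values are assumptions for illustration, not something prescribed in this thread.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gene_overlap")


def align_to_model_genes(sample, model_genes, fill_values):
    """Build a feature vector in model_genes order.

    Genes absent from the sample are replaced by fill_values[gene]
    (assumed here to be a value saved at training time).
    Returns the feature list and the list of missing genes.
    """
    missing = [gene for gene in model_genes if gene not in sample]
    present = len(model_genes) - len(missing)
    logger.info("%d of %d model genes found; missing: %s",
                present, len(model_genes), missing)
    features = [sample[gene] if gene in sample else fill_values[gene]
                for gene in model_genes]
    return features, missing


# Hypothetical toy data: the sample lacks gene 158055.
model_genes = ["100", "158055", "200"]
training_means = {"100": 2.0, "158055": 0.7, "200": 1.1}
sample = {"100": 2.5, "200": 1.0}

features, missing = align_to_model_genes(sample, model_genes, training_means)
```

Filling with a training-time value keeps the feature vector the same length for every cohort; dropping the gene from the model is the alternative when too many samples lack it.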

There are missing genes in validation!