Hi, it would be very useful for everyone if we had dummy expression files for all studies, so that it is clear what format the expression data in the validation data set will have. For example, in dfci.2009_entrezID_TPM_hg19.csv and m2gen_entrezID_TPM_hg19.csv, are the genes in the columns and the samples in the rows, or the opposite? Is it the same for all validation studies? What about microarray data? Having the dummy files would also let us precompute the appropriate mapping for the gene names, since we won't have internet access inside the Docker container to look them up. Thanks, DA
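Because the container has no internet access, one option is to build a gene-ID mapping table ahead of time and ship it inside the image. The sketch below is a minimal illustration under assumptions of my own, not anything the challenge provides: the mapping is a hypothetical two-column, tab-separated file (source ID, target ID) with a header row, and the expression data have already been loaded into a dict keyed by gene ID. The function names `load_id_map` and `remap_genes` are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_id_map(handle: TextIO, delimiter: str = "\t") -> Dict[str, str]:
    """Read a two-column (source ID, target ID) table with a header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    mapping: Dict[str, str] = {}
    for row in reader:
        if len(row) >= 2 and row[0] and row[1]:
            # For one-to-many source IDs, keep the first target seen.
            mapping.setdefault(row[0], row[1])
    return mapping


def remap_genes(
    expr: Dict[str, List[float]], mapping: Dict[str, str]
) -> Dict[str, List[float]]:
    """Rename gene rows via `mapping`.

    Unmapped genes are dropped; when several source IDs map to the same
    target ID, their per-sample values are summed.
    """
    out: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        target = mapping.get(gene)
        if target is None:
            continue  # no offline mapping available for this ID
        if target in out:
            out[target] = [a + b for a, b in zip(out[target], values)]
        else:
            out[target] = list(values)
    return out


if __name__ == "__main__":
    # Hypothetical IDs, standing in for a mapping file shipped in the image.
    table = io.StringIO("source_id\ttarget_id\nGENE_A\t1\nGENE_B\t2\nGENE_C\t2\n")
    id_map = load_id_map(table)
    expr = {"GENE_A": [1.0, 2.0], "GENE_B": [0.5, 0.5], "GENE_C": [1.5, 0.5]}
    print(remap_genes(expr, id_map))  # {'1': [1.0, 2.0], '2': [2.0, 1.0]}
```

Summing values for many-to-one mappings and dropping unmapped IDs are just one set of choices; averaging, or keeping the highest-expressed ID, are equally reasonable depending on the model.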

Created by exquirentibus veritatem exquirentibus
Yes, that is correct. Please see the description of the training data files here: 3.4 - Organization and Format of Training Data and Clinical Annotations https://www.synapse.org/#!Synapse:syn6187098/wiki/449441 and of the validation data files here: 3.6 - Organization and Format of Validation Data and Clinical Annotations https://www.synapse.org/#!Synapse:syn6187098/wiki/449443 Please note in particular that the tables on each of those pages have been updated to answer your question. Brian
So does that mean that the files dfci.2009_entrezID_TPM_hg19.csv and m2gen_entrezID_TPM_hg19.csv will have the same format as MMRF_CoMMpass_IA9_E74GTF_Salmon_Gene_TPM.txt, except that they are comma-separated and the gene IDs are Entrez rather than Ensembl IDs?
The format of those files matches that of the MMRF training RNA-seq data, the only exception being that the delimiter is switched to a comma (",").
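Since the training and validation RNA-seq files are said to differ only in delimiter, a single reader can handle both by switching the delimiter on the file extension. This is a minimal sketch under stated assumptions: it assumes genes are in rows and samples in columns, with a header row whose first cell labels the gene-ID column, and it assumes the .txt training file is tab-delimited. That orientation and delimiter are assumptions here, so check them against the tables on the 3.4 and 3.6 wiki pages. `delimiter_for` and `read_expression_matrix` are hypothetical helper names.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def delimiter_for(path: str) -> str:
    """Comma for .csv validation files, tab otherwise (assumed for the .txt training file)."""
    return "," if path.lower().endswith(".csv") else "\t"


def read_expression_matrix(
    handle: TextIO, delimiter: str
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a genes-by-samples TPM matrix.

    Assumed layout: the header row holds a gene-ID column label followed by
    sample IDs; each later row is one gene followed by one value per sample.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    expr: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene_id}: {len(values)} values for {len(samples)} samples"
            )
        expr[gene_id] = [float(v) for v in values]
    return samples, expr


if __name__ == "__main__":
    # Dummy content; gene and sample IDs are made up.
    dummy_csv = io.StringIO("GENE_ID,S1,S2\n1,0.5,3.0\n2,10.0,0.0\n")
    samples, expr = read_expression_matrix(dummy_csv, delimiter_for("dummy.csv"))
    print(samples)    # ['S1', 'S2']
    print(expr["2"])  # [10.0, 0.0]
```

The row-length check raises early if a file turns out to be transposed or malformed, which is cheaper to debug than a silent misalignment of samples inside the container.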
