Dear organizers, It was mentioned in the webinar that the vcfs were being processed and would be updated soon. Does this apply to both the training data and the data for the express lane? For the express lane, what would be the path to the vcf files? Thank you!

Created by Junjie Zhu jzhu2illum
Dear Jason, I have contacted the data generator, and it appears our description of how the RNA-seq data was generated is incorrect. Our collaborator used what is considered best practice for WES-based and RNA-seq-based variants respectively, given the tumor-normal tissue pairing of the WES data and the tumor-only samples for the RNA-seq data. The WES data was created using Mutect2, while the RNA-seq data was generated with HaplotypeCaller. The two callers are related but not the same, and our documentation needs to reflect that. [Here is a description of Mutect2 and its relationship to HaplotypeCaller](https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_cancer_m2_MuTect2.php). We will be working to update the wikis over the weekend. I apologize for the incorrect descriptions and the confusion they have caused. Thank you for bringing this to our attention. Mike
Dear Jason, I am sorry for the delay. I have reached out to the people who generated these files so that I can give you a more detailed response.
Hi Mike, I'd really appreciate it if we could get some information on this, because we were assuming that the vcfs for training and validation were processed with the same pipeline. But it seems like things are quite different. Jason
It seems like the Mutect vcfs in the training have both "normal" and "tumor", but I get an error when trying to read samples in the same way from the validation data and it seems like there is only one provided for all Mutect vcfs. Also, it appears that filters differ between the RNA-based and DNA-based features. For instance, this is what the Mutect vcf filters use ``` ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ##FILTER= ``` I wonder if the filters are set differently for the training and validation DNA-based vcf? Jason
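For readers hitting the same sample-name error, one way to compare files is to inspect the header directly. The following is a minimal sketch, not the organizers' tooling: it assumes plain-text or gzip-compressed VCFs with a standard `#CHROM` header line, and the helper names `read_vcf_header` and `open_vcf` are hypothetical. It returns the sample columns (two for a tumor-normal file, one for a tumor-only file) and the declared `##FILTER` IDs.

```python
import gzip
import re


def open_vcf(path):
    """Open a VCF as text, transparently handling .gz files (assumed naming)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_vcf_header(handle):
    """Return (sample_names, filter_ids) from the header of an open VCF stream."""
    samples, filter_ids = [], []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FILTER="):
            # Header lines look like ##FILTER=<ID=...,Description="...">
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                filter_ids.append(match.group(1))
        elif line.startswith("#CHROM"):
            # The first 9 columns are fixed; everything after is a sample name.
            samples = line.split("\t")[9:]
            break
    return samples, filter_ids
```

Calling `read_vcf_header(open_vcf(path))` on one training file and one validation file and comparing the two results should show whether the sample columns or the filter definitions differ.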
Thank you so much for your help!
Unfortunately, we are not allowed to share any of the DFCI data used in the validation cohorts that will have this format. We carried out similar processing on a handful of MMRF RNA-seq samples, which are available [here](syn10517990). These may be useful for understanding the format and size of the data. As discussed in the webinar, participants will have to be careful in how they filter these larger files. The RNA-seq data comes from tumor samples and lacks a paired normal sample for removing germline variants.
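As one illustration of filtering a large file without loading it into memory, here is a minimal sketch that streams records and keeps only those whose FILTER column has an accepted value. It is an assumption-laden example, not a recommended filter for the challenge: the function name `iter_kept_records` is hypothetical, and the default of keeping `PASS` and `.` (no filter applied) is a choice each participant would need to revisit for their own data.

```python
def iter_kept_records(handle, keep_filters=("PASS", ".")):
    """Yield the tab-split fields of VCF records whose FILTER value is accepted.

    `keep_filters` is an illustrative default, not a challenge recommendation.
    """
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and the #CHROM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed or truncated lines
        if fields[6] in keep_filters:  # column 7 of a VCF record is FILTER
            yield fields
```

Because it is a generator, it can be wrapped around any open text handle, including a `gzip.open(path, "rt")` stream, and combined with further per-record checks.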
Sorry for sending multiple messages. I was going through all the data available to us for training and I couldn't find any RNA-based vcf call files. Please correct me if I'm wrong. For example, in the clinical meta file nothing is provided for the field "RNASeq_mutationFileMutect". I understand that methods should be robust enough to handle different samples, but there is almost no information about the DFCI RNA-seq based vcfs and no explanation of how they could differ from the DNA-based vcfs. I tried to run my parser, but it did not even recognize the sample names in the RNA-seq vcfs, while it worked fine on the DNA-seq vcfs. I thought both DNA-based and RNA-based vcfs were provided for the MMRF training set, but I believe that was a wrong assumption?
Thanks a lot! I couldn't find any updates on this, so I wonder if I missed any new insights provided for the RNA-seq based vcfs. For instance, do the RNA-seq based vcfs include both tumor and normal samples for each individual? Can we expect the format for WES_mutationFileMutect and RNASeq_mutationFileMutect to be the same?
Dear Junjie, Apologies for the confusion. The vcfs are not being reprocessed. I was referring to the DFCI RNA-seq based vcfs, which need to be used carefully by participants. These data contain many "hits" that would be filtered out if a paired normal sample were available. We may provide some additional insight in the next few weeks, but the vcfs will remain unchanged.
