Hi, I am looking for BRAF missense mutation and fusion data across all tumor types. I don't see it in the release 9.0 public data. Could you please help me locate it, if it exists? Thank you!
Created by marissacq
Thank you in advance, Tom!
Best,
Marissa
Hi @marissacq,
In the GENIE dataset, it appears that missense mutations have the variant types SNP, DNP, ONP, and SNV. Unfortunately, I don't know the answers to questions 2-4, but I can try to find out.
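In case it helps, this can be checked directly against the mutation file. Below is a minimal sketch, assuming the file is a tab-delimited MAF with the standard `Hugo_Symbol`, `Variant_Classification`, and `Variant_Type` columns. The function name `variant_types_for` and the inline sample rows are made up for illustration and are not real GENIE data.

```python
import csv
import io
from collections import Counter


def variant_types_for(maf_lines, gene, classification="Missense_Mutation"):
    """Count Variant_Type values for one gene and one Variant_Classification.

    `maf_lines` is any iterable of text lines (an open file works).
    Column names are assumed to follow the standard MAF layout.
    """
    # Some MAF files carry '#' comment lines before the header; skip them.
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    counts = Counter()
    for row in rows:
        if (row["Hugo_Symbol"] == gene
                and row["Variant_Classification"] == classification):
            counts[row["Variant_Type"]] += 1
    return counts


# Made-up rows for illustration only.
sample = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tVariant_Type\tTumor_Sample_Barcode\n"
    "BRAF\tMissense_Mutation\tSNP\tSAMPLE-1\n"
    "BRAF\tMissense_Mutation\tDNP\tSAMPLE-2\n"
    "BRAF\tIn_Frame_Del\tDEL\tSAMPLE-3\n"
    "KRAS\tMissense_Mutation\tSNP\tSAMPLE-4\n"
)
counts = variant_types_for(sample, "BRAF")
print(dict(counts))
```

To run it on a real release, pass an open file handle for the mutation file instead of `sample`.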
Best,
Tom
Hi Tom,
Thanks for the clarification. I have a few questions regarding "Variant_classification" in the mutation data:
1. Do missense mutations include only SNVs, DNVs, and TNVs?
2. Do in_frame_indels include insertions and deletions of all sizes, e.g. from 1-nucleotide to 100-nucleotide indels? Or is there a size cut-off for calling in_frame_indels?
3. What about frame_shift_indel: does it use the same size criteria as in_frame_indels?
4. The release 9.0 data guide lists small_indels as an alteration type. What is the definition of small_indels?
Thank you!
Best,
Marissa
Hi @marissacq,
Those are the correct files for the mutation and fusion data. Most mutations, fusions, and samples from previous releases are included, but the following changes could cause differences in the mutation data (these are the main reasons, though there may be others):
- Genomic Artifacts identified and removed by sites
- Improved germline filtering
- Retraction of patients/samples
- Switching annotation pipeline (vcf2maf -> Genome Nexus)
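To see which BRAF calls actually differ between two releases, one option is to compare the two mutation files directly. This is a minimal sketch, assuming both files are tab-delimited MAFs with the standard `Hugo_Symbol`, `Tumor_Sample_Barcode`, `Chromosome`, `Start_Position`, `Reference_Allele`, and `Tumor_Seq_Allele2` columns. The helper names and the inline rows are hypothetical and not real GENIE data.

```python
import csv
import io

# Columns assumed to identify one variant call in one sample.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)


def gene_variant_keys(maf_lines, gene):
    """Return the set of (sample, position, allele) keys for one gene."""
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    return {
        tuple(row[col] for col in KEY_COLUMNS)
        for row in rows
        if row["Hugo_Symbol"] == gene
    }


def dropped_between(old_lines, new_lines, gene):
    """Variant keys present in the older release but absent from the newer."""
    return gene_variant_keys(old_lines, gene) - gene_variant_keys(new_lines, gene)


header = "Hugo_Symbol\t" + "\t".join(KEY_COLUMNS) + "\n"
# Made-up rows for illustration only.
old_release = io.StringIO(
    header
    + "BRAF\tSAMPLE-1\t7\t1000\tA\tT\n"
    + "BRAF\tSAMPLE-2\t7\t2000\tC\tG\n"
)
new_release = io.StringIO(
    header
    + "BRAF\tSAMPLE-1\t7\t1000\tA\tT\n"
)
dropped = dropped_between(old_release, new_release, "BRAF")
print(sorted(dropped))
```

A call that shows up in `dropped` may have been removed for any of the reasons listed above, so the result only points to where to look and does not explain the cause. An annotation change alone would not alter these keys, since they use only genomic coordinates and alleles.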
Best,
Tom
Hi Tom,
Thanks for the reply. I found BRAF mutations in data_mutations_extended.txt and fusions in data_fusions.txt in each release folder. Just to confirm, are these the right two files for BRAF mutation and fusion data? Also, does release 9.0 include all the mutations and fusions from all previous releases? Thank you!
Best,
Marissa
Hi @marissacq,
Thanks for your interest in the GENIE data. BRAF missense mutation and fusion data are not guaranteed to exist for every tumor type. To help me understand better, are you saying that some mutations / fusions that existed in prior releases are no longer present? With every GENIE release, the participating centers get better at removing genomic artifacts, and in this release our germline filter was updated to use gnomAD allele frequencies. Missing BRAF mutations / fusions is not a known issue with the dataset. Please let me know if you have any questions.
Best,
Tom