I am trying to do SNV driver analysis and running into questions. When I look into IDH1 mutations called by Mutect2.0, I see several of them have the "failed" filter regardless of the subtypes. Here are the distributions:
Chr:Pos, Total variants, ssm2 fail, ssm2 true
2:209113112, 1972, 1314, 30
2:209113113, 2949, 1983, 27
I am not sure whether variants with the failed tag are also used in the driver analysis. If they are, IDH1 would also show up in the IDHwt subtype unless gene-wise coverage information is used to normalize across allele frequencies. However, I am not sure where that coverage information can be found. It would be good to know how *exactly* the analysis was done, and to have the complete list of driver genes rather than a few known ones.
Also, it would be good to know how the CNV driver analysis was done, and to have a complete list of the CNV drivers rather than the handful presented on the tables page.
Also, it would be great to have a methods page for reproducing the analysis. In this era, when non-reproducibility is a major problem in biology and is talked about everywhere, we computational scientists should make sure others are able to follow and repeat what we did. After all, ours is the most reproducible research one could imagine. Thanks!
Created by Kasthuri Kannan (KSKannan)

We ran Mutect:
1. Using all samples from a patient (see https://github.com/TheJacksonLaboratory/GLASS/blob/master/snakemake/mutect2.smk#L138-L184)
2. Using all possible pairwise tumor-normal combinations (see https://github.com/TheJacksonLaboratory/GLASS/blob/master/snakemake/mutect2.smk#L192-L235)
The `ssm2_pass_call` was determined based on whether the variant was present in (2). Since we force-called the IDH and TERT hotspots, this variable is not informative for these specific mutations.
Unfortunately, we cannot share the raw unfiltered data on Synapse, since it contains identifiable germline information.

Another question. The paper says, and I quote: "Mutect2 was given matched control samples, the aforementioned panel of normals and the gnomAD germline resource as additional controls."
Does this mean that Mutect2 was also run in paired-sample mode? And in which file is `ssm2_pass_call = {true, false}` present? You said, "How are you determining failed filters? I don't think we shared that information here on Synapse." Can you also share that data on Synapse? Thanks. That would be useful for our own analysis. Again, many thanks for the clarification and for helping with the analysis. This greatly helps.

We did run GISTIC separately on all primaries and all recurrences (see https://github.com/TheJacksonLaboratory/GLASS/blob/master/bin/gistic_run.pbs). This gave a list of focal and broad alterations. We ended up manually picking known drivers, mostly independently of the GISTIC results, although the GISTIC results were somewhat informative. We did not use any other tools for copy number. We called copy number changes using the thresholds here: https://www.synapse.org/#!Synapse:syn21777210/tables/.

Ok, thanks! This makes sense and helps me do the analysis.
Btw, it would be great if you could tell us how you called the driver CNVs, by which I mean the name of the program. Was it GISTIC, as mentioned in the publication, or some other copy number tool? Thanks!

Hi @KSKannan, `f` here means that there is no mutation according to single-sample Mutect2. In most cases it does not mean that the variant failed filters. Variants that did not pass filters were excluded, although IDH hotspots were forcibly included.
We manually called the mutation if it was present in other samples from the same patient and the sample in question had any mutant reads, as I described above. We may also have manually called the mutation if there were mutant reads and other evidence that the tumor was IDH-mutant, e.g. positive IDH immunostaining in the clinical data.

Thanks for the explanation, Barthel. The file 'variants_passgeno_20190327' contains the filter in the column ssm2_pass_call, and the IDH1 coordinates have an "f" for almost all of them. So which file are you talking about when using this filter?

How are you determining failed filters? I don't think we shared that information here on Synapse.
For the driver analysis, the criterion we used was `ssm2_pass_call = true` in either the primary or the recurrence.
See L122 here: https://github.com/TheJacksonLaboratory/GLASS/blob/master/sql/heatmap/heatmap_snv.sql#L122
To determine whether a variant was shared or private, we looked at mutant allele reads: any sample with mutant reads (`alt_count > 0`) was considered altered.
See L115-116: https://github.com/TheJacksonLaboratory/GLASS/blob/master/sql/heatmap/heatmap_snv.sql#L115-L116
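For anyone trying to reproduce this outside the database, the two rules above can be sketched roughly as follows. This is only an illustration in Python, not the SQL in `heatmap_snv.sql`: the `Call` record, the `classify` function and the returned labels are hypothetical names, and only `ssm2_pass_call` and `alt_count` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """One variant in one sample (hypothetical record, not a GLASS table)."""
    ssm2_pass_call: bool  # True if single-sample Mutect2 called the variant
    alt_count: int        # number of reads supporting the mutant allele


def classify(primary: Call, recurrence: Call) -> Optional[str]:
    """Classify a variant across a primary/recurrence pair.

    Assumed rules, taken from the description above:
      1. keep the variant only if ssm2_pass_call is true in either sample;
      2. a sample counts as altered if it has any mutant reads (alt_count > 0).
    The returned labels are illustrative, not the ones used in the paper.
    """
    if not (primary.ssm2_pass_call or recurrence.ssm2_pass_call):
        return None  # not called in either sample: excluded

    in_primary = primary.alt_count > 0
    in_recurrence = recurrence.alt_count > 0

    if in_primary and in_recurrence:
        return "shared"
    if in_primary:
        return "primary_only"
    if in_recurrence:
        return "recurrence_only"
    return None  # passed somewhere but no mutant reads recorded
```

For example, `classify(Call(True, 12), Call(False, 3))` gives `"shared"`: the recurrence has no passing call of its own, but it has mutant reads and the primary has a passing call.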
All the code that was used is available on GitHub and described in the Methods and Supplementary Information; hopefully that is _exact_ enough as an information source. We can help you understand parts of our code here, but we cannot help you beyond the scope of the paper. The general rule is that if what you're asking about is not in the paper, methods or supplement, we did not do the analysis. That is, we focused on only a few driver genes and do not have a full list of driver genes. For that, I would suggest you look at other publications on the topic.