Hi,
I have a question about the number of hypermutators in your dataset. Your 2019 Nature paper showed 35 tumors with more than 10 mutations per Mb. I used the new variants_passgeno data to count mutations in each tumor (e.g., GLSS-DF-0013, TCGA-14-1402, and so on) and found that 72 tumors have more than 10 mutations per Mb (assuming a genome size of 640 Mb). Keep in mind that I counted all mutations listed in the variants_passgeno file.
I wonder if the new variant datasets have more mutations than what was included in the Nature paper. Do you have a list showing mutation burden in each individual tumor?
Thanks so much and really appreciate your data!
Peng
Created by Peng Mao (peng_mao)

Hi Peng,
Below is the SQL query that was used to generate the analysis_mut_freq table. It references other tables available on Synapse and applies depth and coverage filters to compute the coverage-adjusted mutation frequency.
```
-- Per-aliquot mutation count and coverage-adjusted mutation frequency
SELECT
    m2.aliquot_barcode,
    cov.cumulative_coverage,
    m2.ssm2_call_count AS mutation_count,
    -- mutations per 1,000,000 covered bases, rounded to 4 decimals; NULL becomes 0
    COALESCE(
        round(m2.ssm2_call_count::numeric / cov.cumulative_coverage::numeric * '1000000'::numeric, 4),
        0::numeric
    ) AS coverage_adj_mut_freq
FROM variants.ssm2_count m2
JOIN analysis.coverage cov ON cov.aliquot_barcode = m2.aliquot_barcode
WHERE m2.ad_depth = 14 AND cov.coverage = 15
```
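For readers who want to reproduce the last column outside the database, here is a minimal Python sketch of the same arithmetic as the query above. The function name and the example numbers are hypothetical and only for illustration. It also uses float rounding, which can differ from PostgreSQL's numeric rounding in the last digit for exact ties.

```python
from typing import Optional


def coverage_adj_mut_freq(mutation_count: Optional[int],
                          cumulative_coverage: Optional[int]) -> float:
    """Mutations per 1,000,000 covered bases, rounded to 4 decimals.

    Mirrors COALESCE(round(count / coverage * 1000000, 4), 0):
    a missing (None) input yields 0, like a NULL in the SQL expression.
    """
    if mutation_count is None or cumulative_coverage is None:
        return 0.0
    return round(mutation_count / cumulative_coverage * 1_000_000, 4)


# Hypothetical aliquot: 300 mutations over 30,000,000 covered bases.
example = coverage_adj_mut_freq(300, 30_000_000)
print(example)
```

Because the denominator is the covered bases of each aliquot, the result will generally differ from a count divided by one fixed genome size.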
Fred

Hi Fred,
Thanks so much for your help! It is very helpful!
I checked my mutation counts using the file variants_passgeno_20220531.csv.gz (Note: only lines with the flag t in the last column are included), but found my counts are not the same as yours in your analysis_mut_freq table. Here are a few examples:
| aliquot_barcode2 | mutation_count (yours) | mutation_count (mine) |
|---|---|---|
| GLSS-HF-2869-R3-01D-WXS-07LI3A | 41888 | 43749 |
| GLSS-DK-0008-R1-01D-WXS-DDD4B8 | 37092 | 32818 |
| GLSS-HF-3081-R2-01D-WXS-4LBL2G | 19242 | 17514 |
| GLSS-SF-0024-R2-01D-WXS-LJCU73 | 20464 | 14794 |
| GLSS-CU-R007-R1-01D-WXS-Y6X0JQ | 12329 | 13017 |
| GLSS-CU-R010-R1-01D-WXS-LC99UJ | 8941 | 9619 |
The trend looks quite similar, but the absolute numbers are different. I'm confused why my analysis is not the same as yours. If you or anybody else can offer some insights, that'll be super helpful!
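For reference, the counting described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the pass flag is the last field of each row (as noted above), while the position of the aliquot barcode column (`barcode_col`) and both function names are hypothetical and should be checked against the actual file header.

```python
import csv
import gzip
from collections import Counter
from typing import Iterable, List


def count_passing_variants(rows: Iterable[List[str]],
                           barcode_col: int = 0) -> Counter:
    """Count rows per aliquot barcode, keeping only rows whose last field is 't'."""
    counts: Counter = Counter()
    for row in rows:
        # A header row is skipped naturally: its last field is a column name, not 't'.
        if row and row[-1] == "t":
            counts[row[barcode_col]] += 1
    return counts


def count_from_gzip(path: str, barcode_col: int = 0) -> Counter:
    """Stream a gzipped CSV (e.g. variants_passgeno_20220531.csv.gz) and count per aliquot."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_passing_variants(csv.reader(handle), barcode_col)


# Tiny in-memory example with made-up rows (barcode, variant id, pass flag).
demo_rows = [
    ["aliquot_barcode", "variant_id", "flag"],
    ["A", "v1", "t"],
    ["A", "v2", "f"],
    ["B", "v3", "t"],
    ["A", "v4", "t"],
]
demo_counts = count_passing_variants(demo_rows)
print(dict(demo_counts))
```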
Best regards,
Peng
Hi Peng,
Your numbers are more or less correct. However, all analyses in the 2019 and 2022 GLASS papers were performed on a curated set of samples that pass QC thresholds for DNA and RNA. The 35 tumors with more than 10 mutations per Mb in the 2019 paper are part of this curated set, and in the 2022 paper this number increased to 41. The curated set used for DNA analyses can be found in the analysis_gold_set table. To facilitate mutation burden analyses, I have uploaded the mutations per Mb for all GLASS samples. These can be found in the analysis_mut_freq table in the Tables section.
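As a rough sketch of how the two tables could be combined to count hypermutators, the Python below restricts the mutation frequencies to the curated set before applying the 10 mutations per Mb cutoff. It assumes the curated set can be reduced to a flat list of aliquot barcodes, which may not match the actual layout of analysis_gold_set; the function name and the barcodes in the example are made up.

```python
from typing import Dict, Iterable, List


def hypermutators(mut_freq: Dict[str, float],
                  gold_set: Iterable[str],
                  threshold: float = 10.0) -> List[str]:
    """Return curated aliquots whose mutations per Mb exceed the threshold."""
    curated = set(gold_set)
    return sorted(
        barcode
        for barcode, freq in mut_freq.items()
        if barcode in curated and freq > threshold
    )


# Hypothetical values: aliquot barcode -> coverage_adj_mut_freq.
demo_freq = {"S1": 65.2, "S2": 3.1, "S3": 12.4, "S4": 10.0}
demo_gold = ["S1", "S2", "S4"]  # S3 is not in the curated set
demo_hits = hypermutators(demo_freq, demo_gold)
print(demo_hits)
```

Counting over all samples instead of the curated set would include aliquots such as the made-up S3 above, which is one way the totals can come out higher than the published numbers.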
I hope this helps!
Fred