Hello,
Can you please help me finding the files used to calculate the mutation burden for each sample (#mutations/MB)? Also, are these mutations non synonymous or total mutations?
Thank you so much,
Solomon
Created by w033msa Dear Solomon,
Thanks for your interest in the GLASS dataset. These are total mutations. To calculate the mutation burden for each sample you'll need the following tables: "variants_ssm2_count" (newly added) and "analysis_coverage".
There is a SQL query on the project's GitHub that provides the code to generate the mutation burden variable and I've attached that below. It's labelled here as "coverage_adj_mut_freq".
Please let me know you if you have any additional questions.
Best,
Kevin
### Compute mutation frequencies for each aliquot_barcode
- Mutation frequencies is output in mutations per megabase (1e6 basepairs)
- Only mutations with >= 15x are counted
- Mutation counts are divided by the number of basepairs with at least 15x coverage
- COALESCE is used to prevent "divison by zero" problems
- JOIN to blocklist so we don't report any aliquots that were excluded based on fingerprinting or coverage
Note:
- for the ssm2_count table I counted mutation using greater than (>) --> 14 threshold
- for the coverage table I counted coverage using greater than or equal to (>=) --> 15 threshold
```
SELECT
m2.aliquot_barcode,
cumulative_coverage,
ssm2_call_count AS mutation_count,
COALESCE(ROUND(ssm2_call_count::numeric / cumulative_coverage::numeric * 1e6, 4), 0::numeric) AS coverage_adj_mut_freq
FROM variants.ssm2_count m2
INNER JOIN analysis.coverage cov ON cov.aliquot_barcode = m2.aliquot_barcode
WHERE m2.ad_depth = 14 AND cov.coverage = 15
```