Hi,
Thanks for your great work.
In the analysis_coverage table, the "coverage" column increases by one increment per row. If it were actual coverage, I would expect different values there. In addition, how can I tell which loci each coverage value refers to? What do the high_quality_coverage_count and cumulative_coverage columns stand for? And if they are general values, why do they differ between rows with the same aliquot_barcode?
The same question applies to the variants_ssm table.
I think the depth column does not represent the depth but rather an increment within each sample. Also, how can I tell which locus each row refers to?
Thank you in advance for your help,
Kerem
Created by Kerem Wainer Katsir keremw
Thank you very much! Your information really helps a lot!
_NB:_ the `gene_coverage` that was uploaded is already adjusted for gene size, i.e. it is the result of the following transformation:
```
SELECT gc.aliquot_barcode,
gm.gene_symbol,
gc.gene_coverage::double precision / eg.gene_size::double precision AS gene_cov
FROM analysis.gencode_coverage gc
JOIN ref.ensembl_genes eg ON eg.ensembl_gene_id::text = gc.ensembl_gene_id::text
JOIN ref.ensembl_gene_mapping gm ON gm.ensembl_gene_id::text = gc.ensembl_gene_id::text
JOIN biospecimen.aliquots al ON al.aliquot_barcode = gc.aliquot_barcode
JOIN biospecimen.samples sa ON sa.sample_barcode = al.sample_barcode
WHERE eg.gene_size > 0 AND gm.gene_symbol IS NOT NULL;
```
> Does it mean that this table will not be uploaded?
OK so I was able to upload the coverage table despite its size. It can now be found under `Tables`. It may take some more time to finish the upload.
> Is it possible maybe to only upload the coverage of the TERT promoter as it was assessed specifically in the paper and seems to be the most relevant non-coding driver area in the cancer genome?
The TERT promoter is different because it's outside the gene body. The gene coverage table calculates the average coverage across all coding exons of a gene that are covered by all Exome capture kits. For coverage at the TERT promoter sites (C228, C250) one can simply sum the `ad_ref` and `ad_alt` counts. This is also what we did in the figure, see the code here: https://github.com/TheJacksonLaboratory/GLASS/blob/master/sql/heatmap/heatmap_snv.sql#L67-L103. For all TERT and IDH hotspot variants the read counts at these sites are included in `variants_passgeno`.
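As a minimal sketch of the sum described above: coverage at a hotspot site is taken as reference reads plus alternate reads. Only the `ad_ref` and `ad_alt` column names come from this thread; the row layout, the `variant` labels, the aliquot names, and all numbers below are made up for illustration and are not taken from `variants_passgeno`.

```python
# Hypothetical rows; only the ad_ref / ad_alt column names come from the thread.
rows = [
    {"aliquot_barcode": "ALIQUOT-A", "variant": "C228", "ad_ref": 18, "ad_alt": 7},
    {"aliquot_barcode": "ALIQUOT-A", "variant": "C250", "ad_ref": 22, "ad_alt": 0},
    {"aliquot_barcode": "ALIQUOT-B", "variant": "C228", "ad_ref": 3, "ad_alt": 1},
]


def site_coverage(row):
    """Coverage at a site = reference-supporting reads + alternate-supporting reads."""
    return row["ad_ref"] + row["ad_alt"]


# Map (aliquot, site) -> total read depth at that site.
coverage = {(r["aliquot_barcode"], r["variant"]): site_coverage(r) for r in rows}

for key, depth in coverage.items():
    print(key, depth)
```

Real rows may contain missing counts for sites that were not genotyped; those would need to be filtered out before summing.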
Floris
Hi @FPBarthel,
Thank you for your answer.
Does it mean that this table will not be uploaded? Is it possible maybe to only upload the coverage of the TERT promoter as it was assessed specifically in the paper and seems to be the most relevant non-coding driver area in the cancer genome?
Thank you in advance,
Kerem
Hi @keremw,
The data dictionary can be found at https://www.synapse.org/#!Synapse:syn17038081/wiki/585717
The table with coverage per gene is a bit large unfortunately since it contains 40k genes x 800+ samples worth of rows.
Floris
Hi Floris,
Thank you for your answer. It does clarify the values.
Where is this data dictionary you are referring to?
Also, what would be the coverage per locus, or per gene?
Thank you very much,
Kerem
Hi Kerem,
Thanks for your interest!
From the data dictionary:
`aliquot_barcode`: A variable that stitches subject, sample type, portion, sequencing type, and a unique identifier together.
`coverage`: The coverage threshold that the row represents
`high_quality_coverage_count`: The number of bases covered at a given coverage threshold for a given aliquot
`cumulative_coverage`: The cumulative number of bases covered at a given threshold and aliquot
Each aliquot has a value for `high_quality_coverage_count` and `cumulative_coverage` for coverage thresholds from 0 to 250.
E.g.
`coverage`: 10
`high_quality_coverage_count`: 4257217
`cumulative_coverage`: 96144101
for a given sample indicates that 4257217 bases have exactly 10x coverage and that 96144101 bases have at least 10x coverage. These counts are relative to the genome size (~3.3 billion bases).
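To make the difference between "exactly" and "at least" concrete, here is a small sketch with toy numbers. It assumes that the cumulative column is the sum of the per-threshold counts from that threshold upward; this matches the description above but has not been checked against the table itself (for instance, the top threshold of 250 may be handled differently). The last lines relate the example value to the approximate genome size quoted above.

```python
from itertools import accumulate

# Toy per-threshold counts (made up): exact[i] = bases with exactly i-x coverage.
exact = [50, 30, 15, 5]  # coverage thresholds 0, 1, 2, 3

# "At least i-x" = sum of the exact counts from threshold i upward,
# i.e. a running sum taken from the highest threshold down.
at_least = list(accumulate(reversed(exact)))[::-1]

for cov, (n_exact, n_cum) in enumerate(zip(exact, at_least)):
    print(f"coverage={cov} exact={n_exact} at_least={n_cum}")

# Fraction of the genome covered at >= 10x in the example from the thread,
# using the approximate genome size quoted there (~3.3 billion bases).
fraction_10x = 96144101 / 3.3e9
print(f"{fraction_10x:.2%} of the genome has at least 10x coverage")
```

This is also why rows with the same `aliquot_barcode` carry different values: each row is a different threshold for the same aliquot, not a different locus.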
For the `variants_ssm2_count` table, the `ssm2_call_count` column indicates cumulative mutation counts. So, e.g.,
`ad_depth`: 10
`ssm2_call_count`: 25
indicates that 25 called mutations are covered by at least 10x.
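The same cumulative reading can be sketched for mutation counts. The depths below are hypothetical and the helper `calls_with_min_depth` is only illustrative; it is not how the table was actually built.

```python
# Hypothetical read depths for the called mutations of one aliquot.
depths = [4, 9, 10, 10, 12, 31, 57]


def calls_with_min_depth(depths, threshold):
    """Count called mutations whose read depth is at least `threshold`."""
    return sum(1 for d in depths if d >= threshold)


# One cumulative count per depth threshold, analogous to one table row each.
counts = {t: calls_with_min_depth(depths, t) for t in (0, 10, 30)}
print(counts)
```

Because the counts are cumulative, they can only stay flat or decrease as the depth threshold rises.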
Floris
Clarification of values in the analysis_coverage table.