Hi,
I am looking for a gene x sample matrix of copy number variation (CNV) calls from WGS or WXS, so that I can compare the CNV of some genes between primary and recurrent samples. Is that available for download? Thanks!
Created by Jingyi_Wu

Hi @Jingyi_Wu,
Thank you for your interest in the resource and raising this request.
I have uploaded an aliquot x gene table (which can be converted to a gene x aliquot table) under the Files tab (current release, "variants_gene_copy_number.csv.gz") and added a description of this new file to the Data Dictionary.
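For anyone who wants the gene x aliquot orientation, here is a minimal sketch of the conversion in plain Python. It assumes the table is in long format with one row per aliquot and gene; the column names `aliquot_barcode`, `gene_symbol` and `cn_call` and the barcodes in the demo are hypothetical, so check the Data Dictionary for the real ones.

```python
import csv
import io
from collections import defaultdict

def to_gene_by_aliquot(rows, aliquot_col="aliquot_barcode",
                       gene_col="gene_symbol", value_col="cn_call"):
    """Pivot long-format rows into {gene: {aliquot: value}}.

    All three column names are assumptions, not the documented schema.
    """
    matrix = defaultdict(dict)
    for row in rows:
        matrix[row[gene_col]][row[aliquot_col]] = row[value_col]
    return {gene: dict(calls) for gene, calls in matrix.items()}

# Tiny in-memory stand-in for the downloaded file. For the real file,
# something like gzip.open(path, "rt", newline="") can replace io.StringIO.
demo_csv = io.StringIO(
    "aliquot_barcode,gene_symbol,cn_call\n"
    "SAMPLE-A-TP,EGFR,2\n"
    "SAMPLE-A-R1,EGFR,1\n"
    "SAMPLE-A-TP,CDKN2A,-2\n"
)
gene_by_aliquot = to_gene_by_aliquot(csv.DictReader(demo_csv))
```

Values stay as strings here because the sketch makes no assumption about how calls are encoded in the file.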
The gene-level copy number calling approach was detailed in our 2019 paper (PMID:31748746) and I have provided that text below for convenience.
A reminder that you will have clearer signals and more confidence in CNV calls if you restrict the aliquot barcodes to those listed as "allow" in the analysis_blocklist table.
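A rough sketch of that filtering step, assuming the analysis_blocklist table has been exported to rows with one aliquot per row. The column names `aliquot_barcode` and `cnv_exclusion` are hypothetical placeholders, as are the barcodes; only the value "allow" comes from the description above.

```python
def allowed_aliquots(blocklist_rows, aliquot_col="aliquot_barcode",
                     status_col="cnv_exclusion"):
    """Return the set of aliquot barcodes whose status is 'allow'.

    Column names are assumptions; adjust them to the actual table.
    """
    return {row[aliquot_col] for row in blocklist_rows
            if row[status_col] == "allow"}

def restrict_to_allowed(gene_by_aliquot, allowed):
    """Drop aliquots that are not in the allowed set from every gene."""
    return {gene: {a: v for a, v in calls.items() if a in allowed}
            for gene, calls in gene_by_aliquot.items()}

# Made-up example rows.
blocklist = [
    {"aliquot_barcode": "SAMPLE-A-TP", "cnv_exclusion": "allow"},
    {"aliquot_barcode": "SAMPLE-A-R1", "cnv_exclusion": "block"},
]
calls = {"EGFR": {"SAMPLE-A-TP": 2, "SAMPLE-A-R1": 1}}
filtered = restrict_to_allowed(calls, allowed_aliquots(blocklist))
```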
-Kevin
--------------------------
**Copy number calling**
A copy number caller loosely based on GATK "CallCopyRatioSegments"
(which in turn is based off of ReCapSeg) and GISTIC was implemented
to call both arm-level and high-level copy number changes, respectively.
Segments (from "ModelSegments") with a non-log2 copy ratio
between 0.9 and 1.1 were determined to be neutral. These segments
were then weighted by length and a weighted mean and standard deviation
non-log2 copy ratio (once-filtered) were determined again. Outlier
segments are removed and once again a weighted mean and standard
deviation non-log2 copy ratio (twice-filtered) were determined. Segments
with a non-log2 copy ratio between 0.9 and 1.1 and segments
within two standard deviations of the twice-filtered mean were determined
to be neutral, and segments outside of these boundaries were
determined to have a low-level amplification or deletion, depending
on the direction.
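
The paragraph above can be sketched roughly as follows. This is an illustration, not the published implementation: the text does not define "outlier", so the sketch assumes it means more than two standard deviations from the once-filtered mean, and it reads the neutral band as the union of the 0.9 to 1.1 range and the two-standard-deviation range. Function names are made up.

```python
import math

def weighted_mean_sd(segments):
    """Length-weighted mean and SD of (length, copy_ratio) pairs."""
    total = sum(length for length, _ in segments)
    mean = sum(length * ratio for length, ratio in segments) / total
    var = sum(length * (ratio - mean) ** 2 for length, ratio in segments) / total
    return mean, math.sqrt(var)

def low_level_thresholds(segments, lo=0.9, hi=1.1, k=2.0):
    """Return (deletion_threshold, amplification_threshold)."""
    neutral = [(l, r) for l, r in segments if lo <= r <= hi]
    if not neutral:
        return lo, hi
    mean1, sd1 = weighted_mean_sd(neutral)            # once-filtered
    # Assumption: outliers lie beyond k SDs of the once-filtered mean.
    kept = [(l, r) for l, r in neutral if abs(r - mean1) <= k * sd1]
    mean2, sd2 = weighted_mean_sd(kept or neutral)    # twice-filtered
    # Neutral band: union of [lo, hi] and mean2 +/- k * sd2.
    return min(lo, mean2 - k * sd2), max(hi, mean2 + k * sd2)

def call_segment(ratio, del_thr, amp_thr):
    """-1 = low-level deletion, 0 = neutral, 1 = low-level amplification."""
    if ratio < del_thr:
        return -1
    if ratio > amp_thr:
        return 1
    return 0

# Made-up segments as (length in bp, non-log2 copy ratio).
segs = [(100, 1.0), (100, 1.02), (50, 0.98), (200, 1.5), (10, 0.5)]
del_thr, amp_thr = low_level_thresholds(segs)
```
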
The weighted mean and standard deviation of the non-log2 copy
ratio (once-filtered) was then determined individually for each chromosome
arm. Outlier segments were removed and the weighted mean
and standard deviation of the non-log2 copy ratio (twice-filtered) was
determined again. To determine a high-level amplification and deletion
threshold, the most highly amplified and deleted chromosome arms
were selected, respectively. The twice-filtered mean plus (high level
amplification) or minus (high level deletion) two times the standard
deviation of the selected arms were used as high-level thresholds.
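
A hedged sketch of the arm-level step, under the same assumptions about outlier removal (beyond two standard deviations of the once-filtered mean). It also assumes that the "most highly amplified" and "most highly deleted" arms are the ones with the largest and smallest twice-filtered mean; the text does not spell this out. Arm names and values are invented.

```python
import math

def weighted_mean_sd(segments):
    """Length-weighted mean and SD of (length, copy_ratio) pairs."""
    total = sum(length for length, _ in segments)
    mean = sum(length * ratio for length, ratio in segments) / total
    var = sum(length * (ratio - mean) ** 2 for length, ratio in segments) / total
    return mean, math.sqrt(var)

def twice_filtered(segments, k=2.0):
    """Mean and SD after one round of outlier removal (assumed rule)."""
    mean1, sd1 = weighted_mean_sd(segments)
    kept = [(l, r) for l, r in segments if abs(r - mean1) <= k * sd1]
    return weighted_mean_sd(kept or segments)

def high_level_thresholds(arms, k=2.0):
    """arms: {arm_name: [(length, copy_ratio), ...]}.

    Returns (high_level_deletion_threshold, high_level_amplification_threshold).
    """
    stats = {arm: twice_filtered(segs, k) for arm, segs in arms.items() if segs}
    amp_arm = max(stats, key=lambda a: stats[a][0])   # largest mean
    del_arm = min(stats, key=lambda a: stats[a][0])   # smallest mean
    amp_mean, amp_sd = stats[amp_arm]
    del_mean, del_sd = stats[del_arm]
    return del_mean - k * del_sd, amp_mean + k * amp_sd

# Made-up arms.
arms = {
    "7p": [(100, 1.4), (100, 1.6)],
    "10q": [(100, 0.5), (100, 0.7)],
    "1p": [(200, 1.0)],
}
high_del, high_amp = high_level_thresholds(arms)
```
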
Gene level copy numbers were called by intersecting the gene boundaries
with the segment intervals and by calculating the weighted non-log2
copy ratio for that gene. The copy number call for that gene was
then determined by comparing the gene-level non-log2 copy ratio to
the previously determined thresholds.
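
To make the gene-level step concrete, here is a small sketch that weights each overlapping segment by the number of bases it shares with the gene and then compares the result to the four thresholds. The half-open coordinates, the threshold keys and the -2 to 2 encoding of calls are illustrative assumptions, not taken from the paper or the released table.

```python
def gene_copy_ratio(gene_start, gene_end, segments):
    """Overlap-weighted mean copy ratio for one gene.

    segments: (start, end, copy_ratio) tuples on the gene's chromosome,
    using half-open coordinates (an assumption of this sketch).
    Returns None when no segment overlaps the gene.
    """
    covered = 0
    weighted_sum = 0.0
    for start, end, ratio in segments:
        overlap = min(end, gene_end) - max(start, gene_start)
        if overlap > 0:
            covered += overlap
            weighted_sum += overlap * ratio
    return weighted_sum / covered if covered else None

def call_gene(ratio, thr):
    """Map a ratio to -2, -1, 0, 1 or 2 (hypothetical encoding)."""
    if ratio is None:
        return None
    if ratio >= thr["high_amp"]:
        return 2
    if ratio <= thr["high_del"]:
        return -2
    if ratio > thr["low_amp"]:
        return 1
    if ratio < thr["low_del"]:
        return -1
    return 0

# Made-up thresholds and segments.
thresholds = {"high_del": 0.4, "low_del": 0.9, "low_amp": 1.1, "high_amp": 1.7}
segments = [(0, 150, 1.0), (150, 300, 2.0)]
ratio = gene_copy_ratio(100, 200, segments)
call = call_gene(ratio, thresholds)
```
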
I found the table cnv_data in figure1_3_input.RData but it only shows a few genes that are not what I am interested in. Btw, is there a way I can find the annotation for columns in cnv_data? Thanks!