Hi,
Thank you again for providing the GENIE dataset. It is very valuable.
Upon analyzing the dataset, I've noticed that the gene data entries are represented with a spectrum of values including -1, 0, 1, and NA. My query pertains to the interpretation of the 'NA' values. In my current analysis, it seems plausible to consider these 'NA' entries as equivalent to the '0' value. However, before proceeding with this assumption, I am keen to confirm the appropriateness of this approach.
Would it be acceptable to treat the 'NA' values as zeros, or would that potentially distort the dataset's true representation?
Any guidance will be much appreciated.
Thanks!
Best,
Yating
Created by YATING CHENG ycheng87

Dear @ycheng87,
The CNA format for project GENIE is described in the [cBioPortal docs](https://docs.cbioportal.org/file-formats/#discrete-copy-number-data) with more detail [here](https://docs.cbioportal.org/user-guide/faq/#what-do-amplification-gain-deep-deletion-shallow-deletion-and--2--1-0-1-and-2-mean-in-the-copy-number-data).
The "NA" values should not be interpreted as the same as "0" in the CNA dataset. "NA" means not available, while "0" means diploid.
"""
For each gene-sample combination, a copy number level is specified:
"-2" is a deep loss, possibly a homozygous deletion
"-1" is a single-copy loss (heterozygous deletion)
"0" is diploid
"1" indicates a low-level gain
"2" is a high-level amplification.
"""
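If it helps, here is a minimal sketch of one way to keep "NA" separate from "0" when reading the matrix, so that diploid calls and missing calls are counted independently. It uses only the Python standard library. The toy table, the gene and sample names, and the helper names `parse_cna` and `summarize` are all made up for illustration; it also assumes a tab-separated layout with a `Hugo_Symbol` column followed by one column per sample, so please adjust it to the actual file you downloaded.

```python
import csv
import io

# Toy stand-in for a CNA matrix (tab-separated; genes in rows, samples in columns).
# The gene names, sample names, and values are invented for illustration only.
TOY_CNA = (
    "Hugo_Symbol\tSAMPLE_A\tSAMPLE_B\tSAMPLE_C\n"
    "GENE1\t0\tNA\t1\n"
    "GENE2\t-1\t0\tNA\n"
)


def parse_cna(text):
    """Return {gene: {sample: int or None}}; None marks "NA" (not available)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        gene = row.pop("Hugo_Symbol")
        # Keep "NA" as None instead of coercing it to 0 (diploid).
        table[gene] = {s: (None if v == "NA" else int(v)) for s, v in row.items()}
    return table


def summarize(calls):
    """Count assessed, diploid, and missing calls separately for one gene."""
    values = list(calls.values())
    return {
        "n_assessed": sum(v is not None for v in values),
        "n_diploid": sum(v == 0 for v in values),
        "n_missing": sum(v is None for v in values),
    }


cna = parse_cna(TOY_CNA)
gene1 = summarize(cna["GENE1"])
print(gene1)
```

With this approach, any per-gene frequency can use `n_assessed` as the denominator rather than the total number of samples, which is where treating "NA" as "0" would otherwise shift the results.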
Let me know if this answers your question!
Best,
Sage Team