Hello @GENIEDataNotices!
You can find the new 6.2-public release [here](syn20333031). More information about the data can be found in the [data guide](syn20444797). The latest public release is also available on [cBioPortal](http://genie.cbioportal.org/).
Edit: 6.0 was patched:
* UHN-555-* SEQ_ASSAY_ID samples were removed.
Edit: 6.1 was patched:
* Removed PHS-TRISEQ-V1, COLU-CCCP-V1, UCSF-NIMV4, JHU-500STP
* Removed VICC, UHN duplicated variants
Thanks!
GENIE
Created by Thomas Yu thomas.yu
Dear @Emre_Kocakavuk,
Thanks for your comment; we are aware of the value of expanded clinical data for GENIE. For questions about future data additions, please contact AACR, the project's sponsor and organizer: https://www.aacr.org/RESEARCH/RESEARCH/PAGES/AACR-PROJECT-GENIE-FAQ.ASPX.
Dear Tom,
Thanks for the clarification!
Are there any plans for future releases to include further clinical data (overall survival, progression-free survival)? This would massively increase the value of the dataset.
Best,
Emre
Dear Emre,
To clarify, every sample in GENIE was sequenced. Samples that have 0 variants in the mutation file were therefore **sequenced but have no mutations**.
As for the DFCI seg/CNA samples, this is what DFCI provided to us.
Best,
Tom
Hi Tom,
Thanks for the response. In the `data_mutations_extended.txt` file there are still samples with 0 mutations, so samples that were sequenced but have no mutations are technically included.
I assume that there is still an important discrepancy between the two numbers mentioned above.
Another point: It seems that for the DFCI (glioma) samples there are no seg-files available whereas they are included in `data_CNA.txt`.
Could you please share the full copy-number (CN) data?
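To make the comparison above reproducible, here is a minimal sketch of how one might list the samples that appear in `data_CNA.txt` but have no rows in a seg file. It rests on assumptions that are not confirmed in this thread: that the CNA file is a tab-separated matrix whose header holds a gene column followed by one column per sample, and that the seg file is tab-separated with the sample identifier in a column named `ID`. The function names and the sample IDs in the demo are hypothetical.

```python
import csv
import io

# Columns in the CNA header that identify genes rather than samples (assumption).
NON_SAMPLE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def cna_sample_ids(handle):
    """Return sample IDs taken from the header row of a CNA matrix."""
    header = next(csv.reader(handle, delimiter="\t"))
    return {name for name in header if name not in NON_SAMPLE_COLUMNS}


def seg_sample_ids(handle, column="ID"):
    """Return sample IDs that have at least one segment row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def samples_missing_seg(cna_handle, seg_handle):
    """Samples present in the CNA matrix but absent from the seg file."""
    return cna_sample_ids(cna_handle) - seg_sample_ids(seg_handle)


# Tiny in-memory demo with made-up sample IDs.
cna_text = "Hugo_Symbol\tS1\tS2\tS3\nTP53\t0\t-1\t2\n"
seg_text = "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\nS1\t1\t100\t200\t10\t0.1\n"
missing = samples_missing_seg(io.StringIO(cna_text), io.StringIO(seg_text))
print(sorted(missing))
```

With real files, the two `io.StringIO` objects would be replaced by open file handles; the column names should be checked against the actual release first.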
Best,
Emre
Dear Emre,
Thanks for your interest in the GENIE data. By GENIE convention, a sample with no variants in the mutation file was sequenced but has no mutations.
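As a rough illustration of how this convention affects sample counts, the sketch below splits samples into those with at least one row in the mutation file and those with none. It is only a sketch under assumptions: the list of all sequenced samples is taken from a clinical sample file with a `SAMPLE_ID` column (the file and its column name are assumptions, not something stated in this thread), the mutation file is assumed to carry the standard MAF `Tumor_Sample_Barcode` column, and the function names are hypothetical.

```python
import csv
import io


def read_ids(handle, column):
    """Collect the distinct non-empty values of one column in a tab-separated file."""
    # Skip leading comment lines (starting with '#') that some files carry.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def split_by_mutation_status(clinical_handle, maf_handle):
    """Return (samples with >= 1 mutation row, samples with no mutation rows)."""
    sequenced = read_ids(clinical_handle, "SAMPLE_ID")
    mutated = read_ids(maf_handle, "Tumor_Sample_Barcode")
    # Under the convention above, sequenced samples without any
    # mutation rows are read as "sequenced, no mutations found".
    return sequenced & mutated, sequenced - mutated


# Tiny in-memory demo with made-up sample IDs.
clinical_text = "SAMPLE_ID\tSEQ_ASSAY_ID\nS1\tA\nS2\tA\nS3\tB\n"
maf_text = "Hugo_Symbol\tTumor_Sample_Barcode\nTP53\tS1\nKRAS\tS1\nEGFR\tS3\n"
with_mut, without_mut = split_by_mutation_status(
    io.StringIO(clinical_text), io.StringIO(maf_text)
)
print(len(with_mut), len(without_mut))
```

Counting only the distinct barcodes in the mutation file gives the first number; adding the second gives the total number of sequenced samples, which may be the figure a portal reports.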
Best,
Tom
Hi Thomas,
It seems that there is a mismatch between the datasets on Synapse and cBioPortal.
On cBioPortal, 70,679 samples are listed as having mutation data, whereas using `data_mutations_extended.txt` from Synapse I count 64,217 samples with mutation data.
Approximately 10% of the data seem to be missing on Synapse. Could you please give some information on this?
Thanks,
Emre