Hello GENIE team, I am trying to reproduce the summary statistics shown on cBioPortal and found some discrepancies. For example, the gene panel with SEQ_ASSAY_ID 'DUKE-F1-T5A' contains over 550 genes when accessed via cBioPortal but only 245 when downloaded from Sage. Also, samples with SAMPLE_ID ending in 'triseq-v2' use a gene panel with the SEQ_ASSAY_ID 'PROV-TRISEQ-V2'. I cannot find that gene panel at all in the 'gene_panels' folder. Any advice would be greatly appreciated! Best, Stefan

Created by Stefan Semrau semrau
Hi @henrik.seidel, We will be patching this information in future public releases. The panel will be annotated as a targeted sequencing panel. NOTE: it technically still uses a WXS target capture kit, but only what is clinically significant is reported to us.
Hi @henrik.seidel, Thanks for all the work you've done looking at the GENIE dataset. I'll be taking this information back to the site and will let you know when we hear back.
Okay, but in that case you should add a file for this panel in gene_panels, listing the roughly 300 genes to which reports from this assay were reduced. Here is what I created by searching for all mutations that were reported in `data_mutations_extended.txt` for this assay:

```
stable_id: PROV-TRISEQ-V2
description: PROV-TRISEQ-V2, Number of Genes - 310
gene_list: MEN1 SMAD2 PRKDC NRAS FOXL2 SOX9 MYD88 KMT2D RET XRCC2 POLD2 CDH1 ERCC2 HDAC2 MYCN ETV1 FBXO11 MED12 FGF23 AURKB ASXL1 SETBP1 PDGFRA KEAP1 CASP8 MLH3 U2AF1 FGF7 CYLD CSF3R CREBBP FANCL FGFR3 RPTOR JAK2 MPL CDK4 CBFB BTK MAP3K1 RB1 NF1 MAP3K9 MST1R ATRX NBN IDH2 ATM IL7R GATA3 AXIN1 RBM15 PIK3CD UGT1A1 GLI3 SMC1A JAK3 MECOM NTRK3 IKBKE HGF UGT1A6 PAX5 XPO1 RPA1 DICER1 FGFR4 CHEK2 SPOP FLI1 VEGFA TGFBR2 PIK3CB FLT4 SMO WT1 MAP2K1 TOP1 TERT MTOR WHSC1 ARID5B PIK3CG TPMT ALK MSH6 TP53 FGF6 CDKN1B CD79A CEBPA GEN1 GATA1 POLD1 SRC FGFR2 MYC MUTYH NOTCH3 MSH2 FANCI EIF4A2 KMT2C AKT3 KIT PRDM1 FANCD2 TBX3 KMT2B AKT1 LCK HRAS PDCD1 CTNNB1 PPP2R1A RPA2 ARID1B MLLT3 TSHZ3 NSD1 ID3 RICTOR CRKL FANCA GLI2 BCL6 CD3EAP ACVR1B FGF4 FLCN SLX4 PICALM INPP4B SETD2 STAT3 PDCD11 IDH1 TFRC PIK3CA RUNX1 RAD21 MDM4 EGFR KDM5C TSC1 EPHA5 EPHA2 FANCG GNAQ FH SDHA BCR GLI1 SMC3 KAT6A DNMT3A CIC CCNE1 MAPK1 TRAF2 EZH2 TET2 NFE2L2 KRAS ATR ESR1 ERG MAP2K2 CCND1 TRAF3 KDR CDK12 TNFAIP3 CDKN1A NOTCH2 FLT1 CCND2 DPYD FUBP1 IRS2 KDM6A NPM1 STK11 PIK3R2 BIRC3 AR LAMP1 EXO1 PCBP1 TAL1 B2M PCNA MAP2K4 RAD54L TCF3 FBXW7 PAX7 NTRK2 NAV3 STAG2 TRAF7 BRCA1 FGF5 JAK1 RPS6KB1 MET AXIN2 SDHC PDGFRB RNF43 PTEN RFC2 NCOR1 FGF8 ROS1 NF2 MRE11A MSH3 EPHB6 FGF3 FGF10 RFC3 PBRM1 NRG1 CTNNA1 PTPN11 HDAC1 TOP2A ERBB3 TSHR TLR4 CTCF PMS1 PRSS1 ETV6 SIN3A TSHZ2 PMS2 FANCE GATA2 RAD51 EP300 CSF1R C1orf147 NUP214 FANCF SF3B1 CDKN2A AXL MDM2 SDHD CUX1 ABL1 LYN PTCH1 SOX2 LRP1B RAF1 CBL ZMYM3 MLH1 ARID1A SOCS1 BAP1 BRIP1 IGF1R TBL1XR1 AKT2 BRAF FGF9 ERCC1 DDR2 JUN FANCC CDC73 PALB2 ERBB2 MCL1 FGFR1 NTRK1 APC ZRSR2 CDKN2C CALR CCND3 FGF14 AURKA SMAD4 DDX3X TSC2 POLE VHL NOTCH1 FLT3 BCL2 GNA11 EPHA3 KMT2A ERBB4 SRSF2 FGF19 BRCA2 RHOA SUFU BCOR GNAS-AS1 PRKCI FYN PIK3R1 CDK6
```
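For anyone who wants to regenerate such a file for a later release, here is a minimal sketch of how the three-line layout above could be produced from a collection of gene symbols. The function name `format_gene_panel` is hypothetical, and the tab separator in `gene_list` is an assumption about what the `gene_panels` files use; adjust `sep` if the real files differ.

```python
def format_gene_panel(stable_id, genes, sep="\t"):
    """Build gene panel file text in the stable_id / description / gene_list layout.

    `sep` (tab by default) is an assumption about the separator used in gene_list.
    """
    # De-duplicate and sort so the output is deterministic
    unique = sorted(set(genes))
    return (
        f"stable_id: {stable_id}\n"
        f"description: {stable_id}, Number of Genes - {len(unique)}\n"
        f"gene_list: {sep.join(unique)}\n"
    )


if __name__ == "__main__":
    # Toy input; in practice the symbols would come from data_mutations_extended.txt
    print(format_gene_panel("PROV-TRISEQ-V2", ["TP53", "KRAS", "TP53"]), end="")
```

Note that a list built from reported mutations only contains genes with at least one mutation call, so it may be smaller than the set of genes actually reviewed by the site.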
@henrik.seidel, We received an answer from the site: "The sequencing assay is indeed exome-level, but that is reduced to ~300 genes based on what our pathologists and oncologists believed at the time to be important for clinical review and reporting, which is also the subset that we submit to GENIE." I will add this clarification to the release notes in the next release(s). Let us know if there is anything else. Best, Chelsea
@henrik.seidel, The site is still working on investigating this issue. We will update you as soon as it has been resolved. Best, Chelsea
Release 15.0 still reports mutations for only 310 genes for this assay. Have you looked into this issue in the meantime?
@henrik.seidel, you are correct: the assay_information.txt was incorrect for these assays. The PROV-TRISEQ-V2 assay used a hybrid capture whole exome sequencing panel, the IDT xGen Exome Hybrid Panel, which covers 19,433 genes. PROV-TRISEQ-V1 used the first iteration of this panel, which covered only 19,396 genes. I will look into why only 310 genes are reported in the data_mutations_extended.txt and attempt to rectify that. I have updated the assay information file with the correct information. Sorry for the confusion and the incorrect metadata!
The panel PROV-TRISEQ-V2 seems to be the only whole exome sequencing panel, according to the `library_strategy` field in the `assay_information.txt` file. However, that file also says that the `number_of_genes` is **323**. This doesn't really make sense. Either all genes are reported, in which case the `number_of_genes` is wrong, or this is in fact a targeted panel, in which case the `library_strategy` field is wrong and a corresponding file in `gene_panels` is required. To be more precise, the `data_mutations_extended.txt` file contains reports for 310 genes for the `PROV-TRISEQ-V2` panel, which is close to the 323 genes from the `assay_information.txt` file, while across all assays the `data_mutations_extended.txt` file has reports for 1696 genes. Therefore, most likely the `PROV-TRISEQ-V2` assay in fact only tests for 323 genes, and a corresponding file in the `gene_panels` directory is missing.
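The per-assay gene counts mentioned above can be reproduced roughly as follows. This is only a sketch under assumptions: it presumes the mutation file has the MAF columns `Hugo_Symbol` and `Tumor_Sample_Barcode`, that the clinical sample file maps `SAMPLE_ID` to `SEQ_ASSAY_ID`, and that both are tab-separated with optional `#` comment lines. The function name `genes_per_assay` is made up for illustration.

```python
import csv
from collections import defaultdict


def genes_per_assay(mutation_lines, sample_lines):
    """Map each SEQ_ASSAY_ID to the set of genes with at least one reported mutation."""

    def rows(lines):
        # Skip '#'-prefixed comment lines, then parse as tab-separated with a header row
        data = (ln for ln in lines if not ln.startswith("#"))
        return csv.DictReader(data, delimiter="\t")

    # Assumed column names: SAMPLE_ID and SEQ_ASSAY_ID in the clinical sample file
    assay_of = {r["SAMPLE_ID"]: r["SEQ_ASSAY_ID"] for r in rows(sample_lines)}

    genes = defaultdict(set)
    for r in rows(mutation_lines):
        # Assumed MAF columns: Tumor_Sample_Barcode and Hugo_Symbol
        assay = assay_of.get(r["Tumor_Sample_Barcode"])
        if assay is not None:  # ignore mutations from samples without an assay entry
            genes[assay].add(r["Hugo_Symbol"])
    return dict(genes)


if __name__ == "__main__":
    # The file names below are assumptions about the release layout
    with open("data_mutations_extended.txt") as mut, open("data_clinical_sample.txt") as smp:
        for assay, symbols in sorted(genes_per_assay(mut, smp).items()):
            print(assay, len(symbols))
```

A count obtained this way is a lower bound on the panel size, since a gene that was tested but never mutated in any sample will not show up. That would be consistent with seeing 310 genes against a stated `number_of_genes` of 323.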
Dear @semrau, Thank you for your patience. We have looked into your questions. 1. The mismatch in the number of genes per gene panel between cBioPortal and Sage is a known issue, and we are looking to resolve it in the next release. Note that the files downloaded from Sage contain the correct number of genes for each gene panel. 2. PROV-TRISEQ-V2 is not a gene panel, but whole exome sequencing. This is why it is not in the gene_panels folder. Let me know if there is anything else you wish to discuss! Thank you for participating in project GENIE. Best, Chelsea
Hi @semrau, I want to let you know we are working on your question. We will try to get back to you as soon as we can after our investigation. Best, Chelsea

PROV-TRISEQ-V2 not in gene panels and discrepancies with cBioPortal