Editing to say that we got an answer for this - thank you!

Hi,

We are mapping diagnosis information from the GENIE BPC cohorts using the ICD-O3 topography (ca_d_site), morphology (naaccr_histology_cd), adenocarcinoma vs squamous cell carcinoma (ca_hist_adeno_squamous) and histology (ca_type) fields from the cancer_level_dataset_index table. After mapping the ICD-O3 values to OncoTree codes, we compare these to the sample table codes for the corresponding patient_id. However, we have found discrepancies where the sample table codes differ from our diagnosis mapping.

For example, the diagnosis table maps to Rectal Adenocarcinoma (READ), but the sample table lists Colon Adenocarcinoma (COAD) for the same patient_id (e.g., GENIE-DFCI-009934, GENIE-DFCI-009585, GENIE-DFCI-009426, GENIE-DFCI-008737). Similarly, in GENIE-MSK-P-0005197, the diagnosis maps to SRCCR (Signet Ring Cell Adenocarcinoma) based on ICD-O3 morphology, but the sample table lists COAD. In the NSCLC cohort, GENIE-MSK-P-0025284's histology should map to LUAD but the sample table lists NSCLCPD, and GENIE-MSK-P-0023572's histology maps to LUCA but the sample table lists LUSC.

Could you please:
- explain how the sample table OncoTree codes are assigned
- explain how the diagnosis codes/histologies are assigned
- suggest ways to resolve these conflicts?

Thanks very much in advance!

Created by Felicia Kuperwaser fkuperwaser
Sure thing. Here's the reply another team member received by email:

The Tier1A data elements (those elements captured for all patients in the Main GENIE registry) are collected and submitted by a different team at each site from the team that curates the additional data elements found in the BPC dataset. We have observed that in a very small subset of cases, the Tier 1 data elements are discordant from the curated elements (e.g. Oncotree [Tier 1] differing from ca_type [curated]). As you will see in the BPC Analytic data guides, the variables found in the BPC datasets are color coded to represent where the values were obtained:
- Tier 1 Data
- Tumor Registry
- Curated
- Derived

We're currently implementing QA processes to alleviate these discrepancies in future releases. However, due to the extensive QA/QC processes underwriting the BPC dataset (consisting of four rounds of query resolution in addition to source document verification), we would recommend utilizing the curated variables or derived variables when discrepancies are noted between them and Tier 1. Additionally, please note that certain derived variables consider multiple sources and are often the preferred variable. These variables are noted in grey in the data guide.

I also reached out to a QA Manager at MSK to provide additional context regarding your question, "How the diagnosis codes/histologies are assigned". She provided these additional details regarding GENIE-MSK-P-0025284:

"The field naaccr_histology_code is a tumor registry field and is assigned by the tumor registry based on their multiple primary and histology coding rules (MPH and Solid Tumor manuals). For this patient, the tumor registry abstracted the histology as Adenocarcinoma NOS, which makes sense to me based on the pathological diagnosis for the specimens. The field ca_type is a curated field that is determined based on the primary site, histology, or some combination of the two depending on the type of cancer; these are very broad groups. For example, lung cancer is categorized primarily based on histology (NSCLC vs Small Cell) while colorectal cancers are grouped by primary site (colon vs rectal vs colorectal if the primary tumor overlaps sites or the exact origin can't be determined). So, these fields should generally agree with each other but there may be small discrepancies because they are coming from different sources."
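For anyone wanting to apply this in practice, here is a minimal sketch of the comparison from the original question combined with the recommendation in the email (prefer the curated/derived value when it disagrees with Tier 1). It is only an illustration under stated assumptions: the `CodePair` class, the field names `mapped_code` and `sample_code`, and both helper functions are hypothetical, not part of any GENIE tooling, and it assumes one row per patient/sample pair with the diagnosis fields already mapped to OncoTree codes. The example rows use the patient IDs and codes quoted in the question, plus one made-up concordant row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePair:
    """One patient/sample comparison (hypothetical structure for illustration)."""
    patient_id: str
    mapped_code: str  # OncoTree code mapped from the curated/registry diagnosis fields
    sample_code: str  # Tier 1 OncoTree code listed in the sample table


def find_discordant(pairs):
    """Return the pairs whose mapped diagnosis code differs from the sample table code."""
    return [p for p in pairs if p.mapped_code != p.sample_code]


def preferred_code(pair):
    """Pick the code to analyze with.

    Assumption: following the recommendation quoted above, the value from the
    curated/derived fields is used; when the two codes agree this is the same
    as the sample table code.
    """
    return pair.mapped_code


# Rows taken from the examples in the question, plus one hypothetical concordant row.
pairs = [
    CodePair("GENIE-DFCI-009934", "READ", "COAD"),
    CodePair("GENIE-MSK-P-0005197", "SRCCR", "COAD"),
    CodePair("GENIE-MSK-P-0025284", "LUAD", "NSCLCPD"),
    CodePair("GENIE-MSK-P-0023572", "LUCA", "LUSC"),
    CodePair("HYPOTHETICAL-0001", "COAD", "COAD"),  # made-up ID, not a real patient
]

discordant = find_discordant(pairs)
for p in discordant:
    print(f"{p.patient_id}: sample table={p.sample_code}, "
          f"diagnosis mapping={p.mapped_code}, using {preferred_code(p)}")
```

Keeping the discordant rows in a separate list, rather than silently overwriting the sample table code, makes it easy to report how many cases were affected and to review them against the data guide.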
Thank you, @fkuperwaser, for your interest in the GENIE BPC dataset! It seems that your questions have already been addressed. However, if you have the time, could you please share the answers here in this thread? This would be helpful for others who might have similar questions. Thank you once again!
