Hi,

First of all, I am not very familiar with this type of clinical data, so this might be a naive question. My question is about the pathology_report_level_dataset file for the CBP CRC project. I saw that a single "Surgical pathology" event can have more than one specimen in the pathology report (in the column "path_num_spec"). Moreover, these specimens might come from different anatomic sites (according to the path_site_x columns). However, for the cases in which sample acquisition took place, only one sample went through sequencing (if I understood correctly). So, is there a way to know which anatomic site the sequenced sample came from?

A similar question came up when looking at the pathological tests in the same file (e.g. MMR and MSI tests). Is there a way to know which anatomic site the sample used for these tests came from? Basically, I would like to know whether the sequencing data comes from the exact same sample that went through the MSI or MMR test.

Thank you very much in advance.

Best, Diego.

Created by Diego García López diegogl
Dear Alex, Thank you very much for such a detailed response! In the end I found a workaround that seemed to work well, but I will give this a try. Thank you very much for your help. Diego.
Hello,

Thanks for your question, and sorry for the long delay getting back to you. I'm Alex Paynter, I'm a statistician with Sage who works with the GENIE data.

First let me deliver the official answer: There is no general way to link a pathology location (within a report) to a specific cancer panel test (CPT). We can only get as specific as which report goes with which cancer panel test.

However, depending on the research question you're interested in, I have a suggestion. About half (~3500/7000) of the pathology reports only have one site in the report (`path_num_spec`). In these cases, we **can** say which site the CPT came from. Of those, roughly 472 can be linked to a CPT report. If your research question could be answered with this subset (pathology reports that only discussed one site) then I think you could do this linkage. Once you have the CPT id, the link to MAF/CNA/SV data is easy, and 472 samples isn't nothing. I personally do not know if reports with one sample are a biased subset, and maybe there's a complication I'm missing, but that's possibly a solution to the problem.

Below is an R script that does the linkage and filtering I described, just keeping the MMR/MSI and site information from the path reports. Keep us posted - we're always excited to hear about research findings or limitations that prevent them.

```
library(tidyverse)
library(here)
library(magrittr)

path <- readr::read_csv(
  here('data-raw', 'CRC', 'pathology_report_level_dataset.csv')
)
cpt <- readr::read_csv(
  here('data-raw', 'CRC', 'cancer_panel_test_level_dataset.csv')
)
ca_ind <- readr::read_csv(
  here('data-raw', 'CRC', 'cancer_level_dataset_index.csv')
)

# Find the pathology reports that only have one sample associated.
path_one_samp <- path %>%
  filter(path_num_spec %in% 1)

# Quoting from the data guide:
# The Cancer Panel Test dataset can be linked to ...
# PRISSMM Pathology dataset using [cohort], [record_id],
# [path_proc_number] and [path_report_number]
shared_key <- c('cohort', 'record_id', 'path_proc_number', 'path_rep_number')

path_one_samp %<>%
  # limit to columns user specified an interest in:
  select(
    all_of(shared_key),
    path_site1, # should be filled for all solo sample reports
    matches("^msi_.*"),
    matches("^mmr_.*")
  )

linked_data <- inner_join(
  select(cpt, all_of(shared_key), cpt_genie_sample_id),
  path_one_samp,
  by = shared_key
)

# At this point the cpt_genie_sample_id could be used to link with MAF/CNA/SV
# data if further analysis on the genomics is of interest.
```
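As a rough illustration of that last step, here is a minimal sketch of attaching the site information to mutation calls. It runs on toy data frames rather than the real files, and it assumes the mutation table identifies samples in the standard MAF column `Tumor_Sample_Barcode` with values that match `cpt_genie_sample_id`; please check that against the data guide for your release. The sample IDs, sites and genes below are made up. It uses base R `merge` so it runs without extra packages; `inner_join` with a named `by` should do the same thing in the tidyverse style of the script above.

```r
# Toy stand-in for `linked_data` from the script above (IDs and sites are made up).
linked_data <- data.frame(
  cpt_genie_sample_id = c("SAMPLE-A", "SAMPLE-B"),
  path_site1 = c("Sigmoid colon", "Liver"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a MAF table. ASSUMPTION: the real mutation file uses
# Tumor_Sample_Barcode for the sample ID, matching cpt_genie_sample_id.
maf <- data.frame(
  Tumor_Sample_Barcode = c("SAMPLE-A", "SAMPLE-A", "SAMPLE-C"),
  Hugo_Symbol = c("KRAS", "TP53", "APC"),
  stringsAsFactors = FALSE
)

# Inner join: keep only mutations from samples that have a
# single-site pathology report linked to them.
maf_with_site <- merge(
  maf, linked_data,
  by.x = "Tumor_Sample_Barcode", by.y = "cpt_genie_sample_id"
)

print(maf_with_site)
```

In this toy example SAMPLE-C is dropped because it has no single-site report, and SAMPLE-B is dropped because it has no rows in the toy mutation table. If you would rather keep samples without any mutation calls, a left join from `linked_data` (`merge(linked_data, maf, ..., all.x = TRUE)`) may be the better choice.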

How can I know which was the pathological specimen sequenced/tested?