I have downloaded the MSBB covariate data (https://www.synapse.org/#!Synapse:syn6100548), opened it in Excel, and filtered the Action column to show only 'Okay' values, the fileType column to show only 'bam' values, and the BroadmannArea column to show only 'BM10' values. I then sorted the individualIdentifier values in ascending order and found that a few of them repeat, such as AMPAD_MSSM_0000003284 and AMPAD_MSSM_0000007155. My understanding is that each individualIdentifier is meant to correspond to a single patient, and indeed in the clinical data (https://www.synapse.org/#!Synapse:syn6101474) the individualIdentifiers appear to be unique to each patient. Am I overlooking something about these repeated values? Thanks
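
For anyone who wants to reproduce the filtering steps above outside Excel, here is a minimal sketch in Python using only the standard library. It assumes the covariate table has been exported as a CSV file and that the column headers are spelled exactly as in the post (Action, fileType, BroadmannArea, individualIdentifier); the function names, the `load_covariates` helper, and the demo rows are made up for illustration and are not part of the MSBB data.

```python
import csv
from collections import Counter


def load_covariates(path):
    """Read a CSV export of the covariate table into a list of dicts.

    Assumption: the file is comma-separated with a header row.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def filter_rows(rows):
    """Keep rows matching the three filters described in the post."""
    return [
        r for r in rows
        if r["Action"] == "Okay"
        and r["fileType"] == "bam"
        and r["BroadmannArea"] == "BM10"
    ]


def repeated_individuals(rows):
    """Return the sorted individualIdentifier values that occur more than once."""
    counts = Counter(r["individualIdentifier"] for r in rows)
    return sorted(ident for ident, n in counts.items() if n > 1)


# Made-up rows for illustration only; identifiers and the "Exclude"
# value are placeholders, not values taken from the real table.
demo_rows = [
    {"individualIdentifier": "ID_A", "Action": "Okay", "fileType": "bam", "BroadmannArea": "BM10"},
    {"individualIdentifier": "ID_A", "Action": "Okay", "fileType": "bam", "BroadmannArea": "BM10"},
    {"individualIdentifier": "ID_B", "Action": "Okay", "fileType": "bam", "BroadmannArea": "BM10"},
    {"individualIdentifier": "ID_B", "Action": "Okay", "fileType": "fastq", "BroadmannArea": "BM10"},
    {"individualIdentifier": "ID_C", "Action": "Okay", "fileType": "bam", "BroadmannArea": "BM22"},
    {"individualIdentifier": "ID_C", "Action": "Exclude", "fileType": "bam", "BroadmannArea": "BM10"},
]

print(repeated_individuals(filter_rows(demo_rows)))  # prints ['ID_A']
```

With a real export, `repeated_individuals(filter_rows(load_covariates(path)))` would list the identifiers that still appear more than once after filtering.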

Created by Roberto Avelar ravelarv
A follow-up question regarding this statement: "sampleIDs are the same with the second being 'xxx_resequenced' the files are from the same sample where the library was sequenced again". In that case, is it OK to select one of them at random? Thank you, Ting
Hi @ravelarv - there are individuals that have more than one set of sequencing data from the same region. Where there is a different sampleID for each file, the sequencing was done on different samples. Where the sampleIDs are the same, with the second being 'xxx_resequenced', the files are from the same sample, and the library was sequenced again.
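
The distinction described in this reply can be checked programmatically. The following is a rough sketch, not an official procedure: it groups rows by individualIdentifier and reports, for each individual with more than one file, whether the sample IDs differ only by a trailing `_resequenced` suffix. The column name `sampleID`, the exact suffix format, and the demo rows are assumptions based on the wording in this thread; the header in the actual covariate file may be different.

```python
from collections import defaultdict

# Assumption: resequenced libraries are marked by this exact suffix.
RESEQ_SUFFIX = "_resequenced"


def base_sample_id(sample_id):
    """Strip the resequencing suffix, if present."""
    if sample_id.endswith(RESEQ_SUFFIX):
        return sample_id[: -len(RESEQ_SUFFIX)]
    return sample_id


def classify_repeats(rows, sample_column="sampleID"):
    """Label each individual that has more than one row.

    Returns a dict mapping individualIdentifier to either
    "same sample, resequenced" (all sample IDs share one base ID) or
    "different samples" (at least two distinct base IDs are present).
    """
    by_individual = defaultdict(list)
    for r in rows:
        by_individual[r["individualIdentifier"]].append(r[sample_column])

    labels = {}
    for ident, sample_ids in by_individual.items():
        if len(sample_ids) < 2:
            continue  # not a repeated individual
        bases = {base_sample_id(s) for s in sample_ids}
        if len(bases) == 1:
            labels[ident] = "same sample, resequenced"
        else:
            labels[ident] = "different samples"
    return labels


# Made-up rows for illustration only.
demo_rows = [
    {"individualIdentifier": "ID_A", "sampleID": "S1"},
    {"individualIdentifier": "ID_A", "sampleID": "S1_resequenced"},
    {"individualIdentifier": "ID_B", "sampleID": "S2"},
    {"individualIdentifier": "ID_B", "sampleID": "S3"},
    {"individualIdentifier": "ID_C", "sampleID": "S4"},
]

print(classify_repeats(demo_rows))
```

Running this on the already filtered rows (Action, fileType, and region) keeps the grouping limited to files from the same brain region.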

Same individualIdentifier for different columns in the MSBB covariate data?