Hi, I am exploring your single-nucleus RNA-seq dataset of 465 ROSMAP samples (syn31512863). I tried to match barcodes to individuals using the CellBarcode and individualID columns in the ROSMAP_biospecimen_metadata.csv file (syn21323366). However, in the first batch I tried, I find that for each sample there are barcodes that I cannot link to the correct individual. When I load both A and B of the same library batch and check all their barcodes, I get far more barcodes than remain in the matrix, yet still not every barcode in the matrix occurs in the barcode list extracted from the ROSMAP_biospecimen data. Could you clarify why this is, or help me fix my mistake? Thanks in advance for your help and for the great resource you created! Best wishes, Tijs
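A minimal sketch of the per-batch check described above, using plain Python sets. It assumes the barcodes of one count matrix and the barcodes that the metadata lists for the same library batch have already been read into lists; `compare_barcodes` is a hypothetical helper and the example barcodes are made up for illustration.

```python
def compare_barcodes(matrix_barcodes, mapping_barcodes):
    """Hypothetical helper: compare the barcodes of one count matrix with the
    barcodes the metadata lists for the same library batch."""
    matrix_set = set(matrix_barcodes)
    mapping_set = set(mapping_barcodes)
    return {
        "n_matrix": len(matrix_set),
        "n_mapping": len(mapping_set),
        # Listed in the metadata but absent from the matrix
        # (for example, cells filtered out during downstream analysis).
        "only_in_mapping": sorted(mapping_set - matrix_set),
        # Present in the matrix but with no individual assigned in the metadata.
        "only_in_matrix": sorted(matrix_set - mapping_set),
    }


# Made-up example barcodes, for illustration only.
matrix = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "TTTGTCATCTTTACGT-1"]
mapping = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "GCATCTCGTCAACCTA-1"]

result = compare_barcodes(matrix, mapping)
print(result["only_in_matrix"])   # ['TTTGTCATCTTTACGT-1']
print(result["only_in_mapping"])  # ['GCATCTCGTCAACCTA-1']
```

A non-empty `only_in_matrix` list is the symptom reported here; it is only meaningful if `mapping` was restricted to exactly the batch the matrix belongs to.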

Created by Tijs Watzeels (@TijsWatzeels)
Hi @masashi, have you had a chance to update the data files? I'm having a similar problem. When I try to join the metadata file (ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv) with AnnData.obs using the cell barcodes, I only get 423,825 overlapping cells between the two dataframes. I tried matching on both barcode and batch, but I got zero matches. Please see my code below:

```
import pandas as pd

meta = adata.obs  # adata: AnnData object whose .obs has a 'bc' (cell barcode) column
# IDmapping is ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv, with cellBarcode renamed to bc
IDmapping = pd.read_csv("ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv").rename(columns={"cellBarcode": "bc"})
temp_map = pd.merge(meta, IDmapping, on='bc', how='inner')
```

temp_map only has ~400,000 rows. I would appreciate any insights you might have. Thank you so much!
Hi @TijsWatzeels, in the metadata file (ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv), the combination of "libraryBatch" and "cellBarcode" should be used to retrieve information. Suppose that you find barcode "GCATCTCGTCAACCTA-1" of batch "190403-B4-A" in the metadata file but do not find the same barcode in the count matrix of that batch. This means that the barcode was filtered out at some point during the downstream analysis of the batch. The barcode "GCATCTCGTCAACCTA-1" in "190403-B4-A" has nothing to do with "GCATCTCGTCAACCTA-1" in "200316-B24-A" or "201007-B58-B". Sorry for the confusion. I will update the data files to avoid it.
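A minimal pandas sketch of the lookup described above, joining on the combination of libraryBatch and cellBarcode rather than on the barcode alone. It assumes the matrix-side table (for example AnnData.obs) has already been given a libraryBatch column naming the batch each matrix came from; the toy rows and the individualID values are hypothetical. `indicator=True` flags cells that found no partner, and `validate="one_to_one"` raises an error if a (batch, barcode) pair is duplicated on either side, which assumes each pair should be unique.

```python
import pandas as pd

# Toy stand-in for ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv
# (individualID values are hypothetical).
mapping = pd.DataFrame({
    "libraryBatch": ["190403-B4-A", "190403-B4-A", "200316-B24-A"],
    "cellBarcode": ["GCATCTCGTCAACCTA-1", "AAACCTGAGAAACCAT-1", "GCATCTCGTCAACCTA-1"],
    "individualID": ["IND_1", "IND_2", "IND_3"],
})

# Toy stand-in for AnnData.obs after adding a libraryBatch column
# for the matrix each cell came from (assumption).
obs = pd.DataFrame({
    "libraryBatch": ["190403-B4-A", "190403-B4-A", "200316-B24-A"],
    "cellBarcode": ["GCATCTCGTCAACCTA-1", "TTTGTCATCTTTACGT-1", "GCATCTCGTCAACCTA-1"],
})

# Join on BOTH columns: the same barcode in two batches is two different cells.
merged = obs.merge(
    mapping,
    on=["libraryBatch", "cellBarcode"],
    how="left",              # keep every cell from the matrix
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
    validate="one_to_one",   # each (batch, barcode) pair should be unique
)

unmatched = merged[merged["_merge"] == "left_only"]
print(len(unmatched))  # 1
```

With a real AnnData object the barcodes are often stored as the obs index rather than as a column; in that case something like `adata.obs.rename_axis("cellBarcode").reset_index()` would produce the column before merging.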
Hi Tijs, Yes, the discussion will be continued here. I am tagging Masashi Fujita to help answer this, as a similar question was asked here: https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=10225 @masashi Hi Masashi, would you mind helping this user with their question regarding the snRNAseq dataset? Thank you, Victor Baham
Hi Victor, Thanks a lot! Will this be continued in this thread once your colleague has more information? Have a nice day! Tijs
Hi Tijs, Thank you for clarifying. I have notified one of my fellow data curators about this issue. Thank you for your patience and explanations. Best, Victor
Hi Victor, The idea is to know which cells within a batch originate from which individual. I extract all barcodes from the demultiplexed_ID_mapping file that come from the same library batch and then merge that data with the barcodes in the snRNAseq matrix. However, this way I lose quite a few cells that apparently do not match any sample ID. I hope this clarifies things. Thanks, Tijs
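One way to quantify how many cells are lost per batch in the workflow described above is to tag every matrix barcode with its library batch and test each (batch, barcode) pair against the mapping file. This is a minimal sketch, assuming the barcodes of each matrix have already been read into a dict keyed by libraryBatch; the batch names follow the format used in this thread, but the barcodes and mapping rows are made up.

```python
import pandas as pd

# Barcodes read from each batch's barcodes file, keyed by library batch
# (batch names follow the thread's format; barcodes are made up).
matrix_barcodes = {
    "190403-B4-A": ["GCATCTCGTCAACCTA-1", "TTTGTCATCTTTACGT-1"],
    "190403-B4-B": ["GCATCTCGTCAACCTA-1"],
}

# Toy stand-in for the rows of the ID-mapping file for the same batches.
mapping = pd.DataFrame({
    "libraryBatch": ["190403-B4-A", "190403-B4-B"],
    "cellBarcode": ["GCATCTCGTCAACCTA-1", "GCATCTCGTCAACCTA-1"],
})

# One row per cell, tagged with the batch its matrix came from.
cells = pd.concat(
    [pd.DataFrame({"libraryBatch": batch, "cellBarcode": bcs})
     for batch, bcs in matrix_barcodes.items()],
    ignore_index=True,
)

# A cell counts as matched only if its (batch, barcode) pair is in the mapping.
known = pd.MultiIndex.from_frame(mapping[["libraryBatch", "cellBarcode"]])
cells["matched"] = pd.MultiIndex.from_frame(
    cells[["libraryBatch", "cellBarcode"]]
).isin(known)

# Per-batch totals: cells in the matrix, cells with an individual, cells lost.
summary = cells.groupby("libraryBatch")["matched"].agg(n_cells="size", n_matched="sum")
summary["n_unmatched"] = summary["n_cells"] - summary["n_matched"]
print(summary)
```

Keying on the pair keeps the A and B libraries separate, so the same barcode string appearing in both is counted as two distinct cells.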
Hi Tijs, Thank you. To clarify, are you saying there are barcodes occurring within the library batches that are not found within the ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv file?
Hi Victor, My mistake, sorry. I am using ROSMAP_snRNAseq_demultiplexed_ID_mapping.csv syn34572333 to match individuals to barcodes within each batch. Best, Tijs
Hi Tijs, Thank you for providing the synIDs of the relevant batches you tried. However, I don't seem to see a 'CellBarcode' column in the ROSMAP_biospecimen_metadata.csv file. Where do you see this column? Thank you, Victor
Hi Victor, Here are the Synapse IDs for the features, barcodes and matrix of the two batches I tried: first batch: syn51121931, syn51121946 and syn51121966; second batch: syn51121944, syn51121943 and syn51121961. Thanks! Tijs
Hi Tijs, Thank you for using the AD Knowledge Portal. Please provide the synIDs of the relevant 465 ROSMAP samples so that I may investigate this issue further. Thank you, Victor

Discrepancy ROSMAP_biospecimen_metadata.csv and matrix barcodes