Hi, one single-cell sample (at syn21438358), Microglia_MO_AD2 (syn21389507), seems problematic. When processing this sample with Cell Ranger (v6.0.0), I got the error "An extremely low rate of correct barcodes was observed for all the candidate chemistry choices for the input". No valid chemistry was detected:

- 0.2% for chemistry SC3Pv3
- 0.0% for chemistry SC5P-PE
- 0.0% for chemistry SC3Pv2
- 0.0% for chemistry SC3Pv3LT

Does anyone have advice on this sample? Thanks, Minghui

Created by Minghui Wang minghui.wang
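The "rate of correct barcodes" in that error is, roughly, the share of reads whose cell-barcode position matches a known 10x barcode. The sketch below illustrates the idea only; it is not Cell Ranger's actual chemistry-detection logic. It assumes a 16 bp cell barcode at the start of R1 (the 10x 3' v2/v3 layout) and a whitelist you have loaded yourself; `valid_barcode_fraction` is a hypothetical helper.

```python
from typing import Iterable, Set


def valid_barcode_fraction(
    r1_sequences: Iterable[str], whitelist: Set[str], barcode_len: int = 16
) -> float:
    """Fraction of R1 reads whose first `barcode_len` bases exactly match the whitelist.

    ASSUMPTION: the cell barcode sits at the start of R1 (10x 3' v2/v3 layout).
    Only exact matches are counted; this is not Cell Ranger's detection logic.
    """
    total = 0
    valid = 0
    for seq in r1_sequences:
        total += 1
        if seq[:barcode_len] in whitelist:
            valid += 1
    return valid / total if total else 0.0
```

If the R1 and R2 files were swapped, "R1" would actually hold cDNA sequence, so its first 16 bases would almost never hit the whitelist. That is consistent with a near-zero rate like the 0.2% reported above.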
Dear professor, can you provide the control sample data (TLE CTX)?
We cannot find the TLE CTX data.
Hi @abby.vanderlinden, I am an MTech Computer Science student. Our team wants to do a project on AD and aging, and we will try to find the features that most affect AD and aging patients. In our project we will use machine learning and deep learning to build a model of how these features affect AD and aging patients, so we need a dataset with a large number of samples. Can you please suggest a suitable dataset?
@kiliankleemann The fastq files are indeed annotated with individualID! We were informed by our collaborators at Rush many years ago that the 8-digit 'projids' in the ROSMAP clinical metadata were not completely de-identified and should not be shared publicly. Because our file annotations are public, we created a randomized value beginning with "R" for the individualID key in our ROSMAP metadata. These individualIDs annotated on each fastq file can be mapped to the original projids in the ROSMAP clinical metadata file syn3191087. You can view the AD Portal user documentation [here](https://help.adknowledgeportal.org/apd/About-Metadata.2241626149.html#AboutMetadata-Whatisthestructureofmetadata?) for more information on how metadata is structured in the AD Portal.
@kiliankleemann The individual IDs are available from the publication https://www.nature.com/articles/s41467-020-19737-2, supplementary data 1: https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-020-19737-2/MediaObjects/41467_2020_19737_MOESM3_ESM.xls However, the sample IDs there are slightly different, and one of the AD samples is missing fastq files here.
@abby.vanderlinden I could not find where the fastq files are annotated with individualID. It would be helpful to provide metadata linking the file names (AD1, AD2, etc.) to individual IDs, as was done for other single-nucleus RNA-seq projects from ROSMAP.
@abby.vanderlinden Thanks for your reply and thanks for the useful information. I got it, I'll do the cross-checking as you suggested.
@xinxingwu We have not gotten any more information about the problems with estimated cell counts for Microglia_MO_AD2. I'll reach out again and see if I can get an answer. For the fastq files, I believe that the "AD" and "MCI" strings in the file name represent the diagnosis, but I would strongly recommend cross-checking this against the ROSMAP clinical metadata file syn3191087. The fastq files are annotated with individualID, which can be mapped to the clinical file. I believe the diagnosis classification referred to in the filenames is based on the field "cogdx" in the clinical file, rather than braaksc or ceradsc. You can see the codebook for the clinical file here for more information on these variables: syn3191090.
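The cross-check described above can be sketched as a join on individualID followed by a comparison of the filename tag with a tag derived from cogdx. This is a minimal sketch under several assumptions: the file-to-individualID mapping is supplied as a plain dict (for example, copied from the Synapse file annotations), the clinical file has `individualID` and `cogdx` columns, the filename pattern `_AD2`/`_MCI1` is hypothetical, and the cogdx-to-tag table reflects my reading of the codebook and should be verified against syn3191090.

```python
import csv
import io
import re

# ASSUMPTION: cogdx codes collapsed to the diagnosis tags used in the filenames.
# Verify these against the clinical codebook (syn3191090) before relying on them.
COGDX_TO_TAG = {
    1: "NCI",  # no cognitive impairment
    2: "MCI",  # mild cognitive impairment, no other cause
    3: "MCI",  # MCI plus another cause of cognitive impairment
    4: "AD",   # Alzheimer's dementia, no other cause
    5: "AD",   # AD plus another cause of cognitive impairment
    6: "Other dementia",
}

# HYPOTHETICAL filename pattern: "..._AD2", "..._MCI1", etc.
TAG_PATTERN = re.compile(r"_(AD|MCI)\d+")


def filename_tag(filename):
    """Return 'AD' or 'MCI' parsed from the filename, or None if no tag is found."""
    match = TAG_PATTERN.search(filename)
    return match.group(1) if match else None


def cogdx_tag(raw_value):
    """Map a raw cogdx value such as '4' or '4.0' to a tag; None if missing or unknown."""
    try:
        return COGDX_TO_TAG.get(int(float(raw_value)))
    except (TypeError, ValueError):
        return None


def find_mismatches(file_to_individual, clinical_csv_text):
    """List (filename, individualID, filename tag, cogdx tag) where the two tags disagree.

    file_to_individual: dict of fastq filename -> individualID.
    clinical_csv_text: contents of the clinical CSV as a string.
    """
    clinical = {
        row["individualID"]: row
        for row in csv.DictReader(io.StringIO(clinical_csv_text))
    }
    mismatches = []
    for filename, individual_id in sorted(file_to_individual.items()):
        row = clinical.get(individual_id)
        from_cogdx = cogdx_tag(row["cogdx"]) if row else None
        from_name = filename_tag(filename)
        if from_name != from_cogdx:
            mismatches.append((filename, individual_id, from_name, from_cogdx))
    return mismatches
```

An empty result means every filename tag agrees with cogdx; any row returned is a sample worth checking by hand. Running the same comparison against braaksc or ceradsc would not be expected to agree, since those are neuropathology scores rather than clinical diagnoses.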
Hi, does Microglia_MO_AD2 still have this issue? When I used Cell Ranger v6.1.2, the same issue happened. By the way, for the fastq files, do the AD and MCI fields in the file names already represent the patient classification? Can we use these two fields to directly distinguish AD from MCI samples, or do we need to map the individualID to a metadata file, for example syn3157322, to label and classify samples? I tried the braaksc and ceradsc fields in syn3191087 separately, but each gave a different classification from the AD and MCI fields in the file names. Thanks.
@Mette Do you have any ideas on who could be contacted with questions about the ROSMAP microglia scRNAseq data generation? It sounds like we figured out the issue with the reversed read files, but I'm not sure who would know more about the estimated cell counts for this sample.
Hi Abby, Microglia_MO_AD2 is the only sample that has this issue. I am pretty sure this is the problem, because the R1 file is larger than R2 for AD2, which is the reverse of typical scRNA-seq data. I would really appreciate any additional information about the estimated cell count for AD2! I can't figure it out. Despite the discrepancy in estimated cell numbers, the analysis results for AD2 seem OK and comparable to those for other AD samples in the original publication. Thanks, Bowen
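File size is a reasonable first signal; a slightly more direct check compares read lengths, since in 10x 3' libraries R1 normally holds only the barcode and UMI (26 to 28 bp for v2/v3) while R2 holds the longer cDNA read. A minimal sketch, assuming standard four-line FASTQ records and the 10x 3' read layout; `first_read_lengths` and `looks_swapped` are hypothetical helpers.

```python
import gzip


def first_read_lengths(path, n=1000):
    """Return sequence lengths of up to `n` reads from a FASTQ file (gzipped if it ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = []
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each 4-line record is the sequence
                lengths.append(len(line.rstrip("\n")))
                if len(lengths) >= n:
                    break
    return lengths


def looks_swapped(r1_path, r2_path, n=1000):
    """Heuristic: R1 should be the short barcode+UMI read, so a longer R1 suggests a swap."""
    r1 = first_read_lengths(r1_path, n)
    r2 = first_read_lengths(r2_path, n)
    if not r1 or not r2:
        raise ValueError("one of the FASTQ files contains no reads")
    return sum(r1) / len(r1) > sum(r2) / len(r2)
```

Sampling only the first reads keeps the check fast on large files; it assumes reads of both files are representative near the top, which holds for typical demultiplexed output.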
@bowenjin That's really helpful, thank you! Have you encountered this problem for any other samples in this dataset? If it's just this one sample, I'll add a note that the read files should be reversed in the folder wiki and on the AD Portal. I'm not sure about the low cell number estimation, though -- I'll ask around and see if I can find someone who was involved in the original data upload to ask about that. Thanks, Abby
Hi, I had the same issue when processing Microglia_MO_AD2 with Cell Ranger. I figured out that the R1 and R2 files are reversed. After I manually renamed R1 to R2 and R2 to R1, the valid barcode rate was 98%. But even after the correction, the single-cell data for AD2 is still somewhat problematic: only 440 cells are estimated for AD2, far fewer than for other samples (e.g., 2,720 for AD4). I hope that helps! Bowen
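Rather than renaming the downloaded files in place, one option is to create symlinks with R1 and R2 exchanged in a separate directory and point Cell Ranger's `--fastqs` at that directory. A minimal sketch, assuming files follow the `<Sample>_S1_L001_R1_001.fastq.gz` naming convention Cell Ranger expects; `swapped_name` and `link_swapped` are hypothetical helpers, and index reads (I1/I2) are skipped here.

```python
import os
import re

# Matches e.g. "AD2_S1_L001_R1_001.fastq.gz"; index reads (I1/I2) do not match.
READ_PATTERN = re.compile(
    r"^(?P<prefix>.+_L\d{3}_)R(?P<read>[12])(?P<suffix>_\d{3}\.fastq(?:\.gz)?)$"
)


def swapped_name(filename):
    """Return the filename with R1 and R2 exchanged, or None if it is not an R1/R2 file."""
    m = READ_PATTERN.match(filename)
    if not m:
        return None
    other = "2" if m.group("read") == "1" else "1"
    return f"{m.group('prefix')}R{other}{m.group('suffix')}"


def link_swapped(src_dir, dst_dir):
    """Create symlinks in dst_dir with R1/R2 swapped, leaving the original files untouched."""
    os.makedirs(dst_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(src_dir)):
        new_name = swapped_name(name)
        if new_name is None:
            continue  # skip index reads and unrelated files
        target = os.path.abspath(os.path.join(src_dir, name))
        os.symlink(target, os.path.join(dst_dir, new_name))
        created.append(new_name)
    return created
```

Using symlinks keeps the original download intact, so the swap is easy to undo if the sample is later re-uploaded with corrected file names.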
Hm, that is strange. In the [scRNAseq assay metadata file](https://www.synapse.org/#!Synapse:syn21073536), this specimen is listed as having 97.90% valid barcode reads. @jgockley Do you have any thoughts on this?
