Hi, [this new Nature paper](https://www.nature.com/articles/s41586-019-1195-2) cites a Synapse link: https://www.synapse.org/#!Synapse:syn18485175 How can I get access to the data? Thank you very much.

Created by Guoqiang Zhang (@guoqiangzhang)
Hi @Mette, I have a question about "filtered_count_matrix.mtx". Is this matrix the final pre-processed data and ready to use (for example, has it been log-normalized), or does it contain only the raw counts and still need to be pre-processed? Thank you
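One quick way to tell raw counts from normalized values is to check whether every stored entry is a non-negative whole number. A minimal R sketch, assuming the file is in Matrix Market format and readable with `Matrix::readMM`; the toy matrix and the `looks_like_raw_counts` helper are hypothetical stand-ins, not part of the dataset or the authors' pipeline:

```r
library(Matrix)

# Hypothetical helper: TRUE when every stored value is a non-negative whole
# number, which is what raw counts look like (log-normalized values are fractional).
looks_like_raw_counts <- function(m) {
  vals <- m@x
  length(vals) > 0 && all(vals >= 0) && all(vals == round(vals))
}

# Toy stand-in for filtered_count_matrix.mtx, written to a temporary file.
toy <- sparseMatrix(i = c(1, 2, 3, 3), j = c(1, 1, 2, 3),
                    x = c(5, 1, 12, 3), dims = c(3, 3))
path <- tempfile(fileext = ".mtx")
writeMM(toy, path)

counts <- readMM(path)
raw_check <- looks_like_raw_counts(counts)

# The same values after log1p are fractional, so the check fails.
normalized <- sparseMatrix(i = c(1, 2, 3, 3), j = c(1, 1, 2, 3),
                           x = log1p(c(5, 1, 12, 3)), dims = c(3, 3))
norm_check <- looks_like_raw_counts(normalized)
```

A passing check only suggests the values are counts; it does not say which filtering steps were already applied, which is still a question for the data providers.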
Thank you very much for your clarification
BA10 is correct. We will make the correction to the tissue designation
Hi @Mette, I have a small query about the tissue type used in this study. snRNAseqPFC_BA10_biospecimen_metadata.csv lists the tissue as dorsolateral prefrontal cortex and the Brodmann area (BA) as 10, and the Nature paper also states that Brodmann area 10 was used. However, according to the Brodmann area list, dorsolateral prefrontal cortex is BA9 and anterior prefrontal cortex is BA10. I would be much obliged if you could clarify the discrepancy between the tissue type and the Brodmann area.
With the exception of aggregate genomic data (or genomic summary results), Rush requires the data to be controlled-use (i.e., users must agree to the terms in the Rush DUC). This is a requirement based on consent and the Rush IRB that we need to adhere to.
Hi @Mette, I seem to be confused about the definition of "public" in relation to these datasets. I've tried to look at several processed datasets and they all have restricted access, i.e. one needs to apply for data access to download them. While this makes perfect sense for fastq files, which could potentially be used to identify a patient, there is no such risk for processed data (read count tables). So I wonder what the rationale is for these datasets not actually being "public", i.e. open access. I would appreciate it if you could clarify. Many thanks, Aliaksei.
Hi All, The processed data is now public
Hello, Two of the fastq files available for download for this project fail with an unexpected EOF: D17-8790_S2_L004_R2_001.fastq.gz and D17-8766_S2_L004_R2_001.fastq.gz. Has anybody else run into the same issue? Best, Shameek
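A truncated gzip file can be detected without extracting it by running `gzip -t` on the download. A minimal R sketch, assuming the `gzip` command-line tool is on the PATH; `gz_is_intact` is a hypothetical helper and the files are toy stand-ins for the fastq files named above:

```r
# Hypothetical helper: TRUE when `gzip -t` reports the archive as complete.
gz_is_intact <- function(path) {
  status <- system2("gzip", c("-t", shQuote(path)), stdout = FALSE, stderr = FALSE)
  status == 0
}

# Toy stand-in for a fastq.gz file.
good <- tempfile(fileext = ".fastq.gz")
con <- gzfile(good, "w")
writeLines(rep(c("@read1", "ACGTACGT", "+", "IIIIIIII"), 500), con)
close(con)

# Simulate an interrupted transfer by keeping only the first half of the bytes.
bytes <- readBin(good, "raw", n = file.size(good))
bad <- tempfile(fileext = ".fastq.gz")
writeBin(bytes[seq_len(length(bytes) %/% 2)], bad)

good_ok <- gz_is_intact(good)
bad_ok <- gz_is_intact(bad)
```

If a re-download of the same file still fails the check, the copy on the server is likely the truncated one.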
@tkamath - the processed data upload is in progress. I recommend reaching out to the corresponding authors with questions related to the paper
Also looking forward to the processed data! I was wondering if there is an estimated timeline for when it will be made publicly available. I personally find it somewhat prohibitive to realign all the sequencing data, and I would greatly appreciate a processed dataset that lets peers in the field judge the quality of the data. In addition, under the "cell clustering" section in the Methods, the authors note that the final cell count for subcluster analysis is 70,643. However, in the Excel file for Supplementary Table 6 (tab 2), cells J3 to J43 sum to 70,346 across all the subclusters used for downstream analysis. I assume this is a typo, given that the two numbers contain the same digits, and I hope the authors will correct it.
Thanks a lot @karawoo! Looking forward to cell type annotation file @xujishu
That is information that will need to come from the data provider: @xujishu
Thanks a lot @karawoo. It would be great if you can also provide another file with cell id and the cell type information (such as neuron, microglia... ). Thanks a lot!
Thanks everyone for your interest. This mapping file is now available here: https://www.synapse.org/#!Synapse:syn18694015
Hi all, Thanks for all the interest in the data. This is Jishu from RUSH/RADC. We are working with @Mette to upload a mapping file shortly, which will link fastq file names, projIDs (ROSMAP project IDs) and subject IDs (used in the supplementary material). Thanks again for your interest. Best, -Jishu
It would be great if one of the authors could jump into this thread. I have been trying to line up the authors' metadata provided in [supplementary table 3](https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1195-2/MediaObjects/41586_2019_1195_MOESM5_ESM.xlsx) with the clinical data from the [ROSMAP_clinical.csv](https://www.synapse.org/#!Synapse:syn3191087) file on Synapse. I thought I would put my experience here; maybe it will help one of you going through the same thing.

The `projid` field in `ROSMAP_clinical.csv` can be used to join to the sample IDs from the `individualID` field of [biospecimen_metadata.csv](https://www.synapse.org/#!Synapse:syn18642936) for these snuc-Seq samples. So I have a data frame I call `supp` from the supplementary table, and another I call `ph`, which I built by taking the rows out of `ROSMAP_clinical.csv`.

The first thing is that there are 2 subjects in the snuc-seq dataset that do not appear in `ROSMAP_clinical.csv`:

```
> bioMD <- read.csv("biospecimen_metadata.csv", as.is = TRUE)
> clin <- read.csv("ROSMAP_clinical.csv", as.is = TRUE)
> setdiff(bioMD$individualID, clin$projid)
[1] 20963866 11630705
```

I got an updated clinical data file from Greg Klein at Rush and it has the last 2 individuals in it, so I don't think it is a typo. I don't think I'm allowed to share the file from Greg, but hopefully the authors or ROSMAP can supply this updated information.

Anyway, I made `ph` from the updated file:

```
> newclin <- read.csv("new_ROSMAP_clinical_from_greg.csv", as.is = TRUE)
> setdiff(bioMD$individualID, newclin$projid)
integer(0)
> ph <- merge(bioMD, newclin, by.x = "individualID", by.y = "projid")
```

Next I loaded the supplementary table from the manuscript:

```
> supp <- read.csv("41586_2019_1195_MOESM5_ESM.csv", as.is = TRUE)
```

Then I compared the distribution of [`ceradsc`](https://www.radc.rush.edu/docs/var/detail.htm?category=Pathology&subcategory=Alzheimer%27s%20disease&variable=ceradsc) and [`braaksc`](https://www.radc.rush.edu/docs/var/detail.htm?category=Pathology&subcategory=Alzheimer%27s%20disease&variable=braaksc) in the two metadata tables:

```
> table(supp$ceradsc, supp$braaksc)

     1  2  3  4  5  6
  1  0  0  1  2 11  3
  2  0  0  2  2  3  0
  3  1  1  0  0  0  0
  4  3  5 11  3  0  0
> table(ph$ceradsc, ph$braaksc)

     1  2  3  4  5  6
  1  0  0  1  2 11  3
  2  0  0  2  2  3  0
  3  1  1  0  0  0  0
  4  3  5 12  2  0  0
```

The bottom row is different. Since they use different keys for the data (ROSMAP official subject IDs and their own ROS1/ROS2 system), I can't figure out why I have one less Braak 4 and one more Braak 3. The difference is in 2 female samples with `ceradsc=4` (no pathology), which can be seen if I break down by the [`msex`](https://www.radc.rush.edu/docs/var/detail.htm?category=Demographics&variable=msex) field:

```
> with(subset(supp, msex == "female"), table(ceradsc, braaksc))
       braaksc
ceradsc 1 2 3 4 5 6
      1 0 0 1 1 7 1
      2 0 0 0 0 2 0
      3 0 1 0 0 0 0
      4 2 0 6 3 0 0
> with(subset(ph, msex == 0), table(ceradsc, braaksc))
       braaksc
ceradsc 1 2 3 4 5 6
      1 0 0 1 1 7 1
      2 0 0 0 0 2 0
      3 0 1 0 0 0 0
      4 2 0 7 2 0 0
```

Since the `supp` table has `ROS1`, `ROS2`... identifiers and `ph` has `projid` identifiers (like `10101291`), as @guoqiangzhang pointed out, there is no way to figure out the problematic samples. This is a pretty minor discrepancy (Braak 3 versus Braak 4), but it would be nice to resolve so everyone in the world can have their analysis of these data aligned.

So all in all I think the authors should

1. provide a mapping from ROS1 to projid.
2. resolve the conflict in `braaksc` between the supplementary tables and the ROSMAP clinical data.

ROSMAP could also help by uploading an updated `ROSMAP_clinical.csv` table that includes `projid=20963866` and `projid=11630705`, and that also includes `niareagansc`, which is what Mathys et al. used to define no, low and high pathology. Big thank you to ROSMAP and Li Huei Tsai's group for moving this field forward!!!
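To see exactly which cells of the two cross-tabulations disagree, the tables can be built on shared factor levels and subtracted. A minimal sketch on toy data; `xtab`, `supp_toy` and `ph_toy` are hypothetical names, and the values are made up rather than taken from either metadata file:

```r
# Build the ceradsc x braaksc table on fixed levels so both tables share a shape.
xtab <- function(df) {
  table(ceradsc = factor(df$ceradsc, levels = 1:4),
        braaksc = factor(df$braaksc, levels = 1:6))
}

# Toy stand-ins for the supplementary table and the merged clinical table.
supp_toy <- data.frame(ceradsc = c(4, 4, 1), braaksc = c(3, 4, 5))
ph_toy   <- data.frame(ceradsc = c(4, 4, 1), braaksc = c(3, 3, 5))

delta <- xtab(supp_toy) - xtab(ph_toy)

# Rows are ceradsc, columns are braaksc; nonzero cells mark the disagreement.
mismatch <- which(delta != 0, arr.ind = TRUE)
```

This only locates the differing cells; identifying the individual samples still needs the ROS-to-projid mapping.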
Yes, I also need the ID mappings. I would appreciate it a lot if you could also provide a cell type annotation (neuron, microglia, ...) for each cell (barcode ID). I am looking forward to the processed data. Thanks a lot!
Thanks @Mette. Yes, I understand I can use syn3191087. However, I want to use the same criteria as the authors to call a subject AD or control (the column "pathologic diagnosis of AD" of Supplementary Table 1). And thank you so much for following up with the authors.
In the meantime, you may be able to infer the projid based on the clinical data: syn3191087
We do not have the mapping between the supplementary table and the projid (or individualID). I will follow up with the authors to see if they can provide that as controlled access
Hi @Mette, Thanks for making the data available to the community. The authors published some clinical metadata for the 48 ROSMAP subjects along with the paper: [Supplementary Table 1](https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1195-2/MediaObjects/41586_2019_1195_MOESM3_ESM.xlsx). However, the IDs they used in this file are of the form `ROS**`, which is different from the fastq prefix `D17-8***`. Is there a way to link these two IDs? I understand that there is an individualID column in syn18642936 which is the same as the ROSMAP projid, and I can use this column to link to the ROSMAP clinical data column (syn3191087) and get the clinical measurements. But Supplementary Table 1 has a column called "pathologic diagnosis of AD", and I want to use the same definition for AD vs. control as the authors did. Thanks for your help.
Thanks!
Thanks a lot! I have uploaded the DUC form. I would appreciate it if you could approve my request. Thanks a lot again.
Hi All, this data is now public: syn18485175. Note that processed data will be added and that additional clinical variables are available through [RADC](https://www.radc.rush.edu/requests.htm)
Thanks a lot!
Thanks for all the interest in the data. We are waiting for some additional information and plan to have this ready for you next week
+1
Hi, do you have any estimate when the data will be available?
Thanks! Looking forward to the data.
Thank you for your interest in the data. The data is in the process of being uploaded and will be made available as soon as all content is in place. We will post a message in this forum and tag the [AMP-AD data updates team](https://www.synapse.org/#!Team:3372003). If you request to join that team you will receive a message.
