Hi, I hope all is well, and thank you for providing these datasets. I have a specific list of ROSMAP donors with WGS data who should have paired RNA-seq data, which I would like to access. However, it seems only a small portion of this list is in the ROSMAP fastq and bam file repositories at syn21589959 and syn21188662. Is there another repository with ROSMAP's raw RNA-seq files, or is there someone I'd be able to contact regarding finding these samples' data? I was hoping there might be someone I can share the list with privately since there is a rule against posting specimenIDs and individualIDs on the forum. Thank you!

Created by J_Grundman
Hi @JessB I hope all is well. Thanks for your help earlier on this issue. I am looking for RIN values for some of the ROSMAP brain samples, but it appears that some are missing RIN information. I was looking here: syn21088596 but there are missing values for half of the samples I'm interested in. Is there another repository where this information could be located for these samples?
Hi Cory, [This discussion thread](https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=10177) includes information about how to match sample IDs across WGS genotypes and RNAseq fastq files. If that info isn't what you are looking for, it would be helpful if you could supply synIds for some examples of files that you are trying to match up. Best, Jessica
@JessB Hi Jessica, My issue isn't so much how to search the ADKP as it is sample matching. I need to be able to match the sample IDs for the RNAseq within AMP-AD to the VCF samples in AMP-AD. There's nothing in the interface that I see that can do this, and I don't know of a file or spreadsheet where these are matched up. If you are aware of one, I would greatly appreciate you pointing me to it. Thank you!
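For anyone attempting this kind of matching by hand, here is a minimal base R sketch. It assumes the biospecimen metadata carries `individualID`, `specimenID`, and `assay` columns and that the donor-level `individualID` is shared across assays; the toy table and every ID in it are invented for illustration and are not real ROSMAP identifiers.

```r
# Toy biospecimen metadata; all IDs are made up for illustration.
# Assumption: columns named individualID, specimenID, and assay exist.
biospecimen <- data.frame(
  individualID = c("D1", "D1", "D2", "D3", "D3"),
  specimenID   = c("rna_01", "wgs_01", "rna_02", "rna_03", "wgs_03"),
  assay        = c("rnaSeq", "wholeGenomeSeq", "rnaSeq", "rnaSeq", "wholeGenomeSeq"),
  stringsAsFactors = FALSE
)

# Split the table by assay, keeping only the ID columns.
rna <- biospecimen[biospecimen$assay == "rnaSeq", c("individualID", "specimenID")]
wgs <- biospecimen[biospecimen$assay == "wholeGenomeSeq", c("individualID", "specimenID")]

# Inner join on the donor ID: one row per RNAseq/WGS specimen pair.
matched <- merge(rna, wgs, by = "individualID", suffixes = c("_rna", "_wgs"))

# Donors that have an RNAseq specimen but no WGS specimen in the metadata.
rna_only <- setdiff(rna$individualID, wgs$individualID)

print(matched)
print(rna_only)
```

The `specimenID_wgs` column would then be the name to look for in the VCF sample header, assuming the VCFs are keyed by specimen ID; that should be checked against the actual files.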
Hi Cory, Apologies for the delay in responding to this; we didn't see that a new question had been posted in this thread. Please create a new post the next time you have a question, as that will ensure that we get a notification. You can use the AD Knowledge Portal's Explore>Data interface to filter for specific sets of files of interest. For example, [this view](https://adknowledgeportal.synapse.org/Explore/Data?QueryWrapper0=%7B%22sql%22%3A%22SELECT%20*%20FROM%20syn11346063.39%22%2C%22limit%22%3A25%2C%22offset%22%3A0%2C%22selectedFacets%22%3A%5B%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22fileFormat%22%2C%22facetValues%22%3A%5B%22vcf%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22consortium%22%2C%22facetValues%22%3A%5B%22AMP-AD%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22organ%22%2C%22facetValues%22%3A%5B%22brain%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22assay%22%2C%22facetValues%22%3A%5B%22wholeGenomeSeq%22%5D%7D%5D%7D) is filtered to show all of the released files with fileFormat = vcf, assay = wholeGenomeSeq, consortium = AMP-AD, and organ = brain. You can add filters to narrow the scope of the query by using the filter interface on the left. Note that not all of the available filters are displayed by default; you can enable additional filters by clicking on them in the "Available Facets" section at the bottom of the filter interface. We do not recommend browsing for files directly in Synapse, as related files may be stored in different locations based on when they were submitted, who submitted them, and other factors. Synapse also contains unreleased files, which may still be missing data or supporting metadata.
Please let me know if this doesn't answer your question. I'm not exactly sure what was done for the eQTL analysis that made it a defined thing that is easier to work with, but if you can provide any additional info I can look into the possibility of applying that process to other datasets. Best, Jessica
Hello, Similarly, I'd like to find the matching VCF files for all of the AMP-AD generated postmortem brain samples. Part of the point of AMP-AD was to be able to do this, and I think it was done for the eQTL analysis. It would be nice to have this as a defined resource, as opposed to having to sift through all the WGS VCF files in the database and match up sample names to the RNAseq.
Hi Jessica, Thanks! I am taking a look and it currently looks promising. I will reach out if there are still some missing though. I appreciate your help!
Hi Jennifer, The ROSMAP RNAseq files were contributed at different times, and therefore the files are not all stored together in our backend system, Synapse. If you take a look at [this filtered view](https://adknowledgeportal.synapse.org/Explore/Data?QueryWrapper0=%7B%22sql%22%3A%22SELECT%20*%20FROM%20syn11346063.39%22%2C%22limit%22%3A%22500%22%2C%22offset%22%3A0%2C%22selectedFacets%22%3A%5B%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22study%22%2C%22facetValues%22%3A%5B%22ROSMAP%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22assay%22%2C%22facetValues%22%3A%5B%22rnaSeq%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22tissue%22%2C%22facetValues%22%3A%5B%22dorsolateral%20prefrontal%20cortex%22%2C%22head%20of%20caudate%20nucleus%22%5D%7D%2C%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22dataSubtype%22%2C%22facetValues%22%3A%5B%22raw%22%5D%7D%5D%7D) in the AD Knowledge Portal, are the files you are looking for there? That filtered view contains 1640 bam files and 542 fastq files; 1433 are DLPFC and 749 are caudate. These numbers seem closer to what you are expecting to see, but if you feel that there are still missing files please let us know and we can continue to investigate. Best, Jessica
Sorry, at this point you would need to contact the research team and ask whether any additional data files are available. If there are files they are willing to contribute, we are happy to help with the process. Rich
Hi Rich, I appreciate your reply. Unfortunately, I've already sifted through there, and the samples are indeed missing from that repository as well. In particular, it appears that over a hundred samples with caudate nucleus data are missing corresponding prefrontal cortex data. I am interested in locating these specifically. Is there another recourse? Thanks!
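In case it helps to show how a list like this can be derived, here is a small base R sketch of the check. It assumes a metadata table with `individualID` and `tissue` columns; the rows and IDs are invented. Tissue labels are compared case-insensitively because the capitalization of "head of caudate nucleus" may differ between sources.

```r
# Toy metadata; IDs are invented for illustration.
# Assumption: columns named individualID and tissue exist.
samples <- data.frame(
  individualID = c("D1", "D1", "D2", "D3", "D4"),
  tissue = c("dorsolateral prefrontal cortex", "Head of caudate nucleus",
             "Head of caudate nucleus", "dorsolateral prefrontal cortex",
             "Head of caudate nucleus"),
  stringsAsFactors = FALSE
)

# Compare tissue labels case-insensitively, since capitalization may vary.
tissue <- tolower(samples$tissue)
caudate_donors <- unique(samples$individualID[tissue == "head of caudate nucleus"])
dlpfc_donors   <- unique(samples$individualID[tissue == "dorsolateral prefrontal cortex"])

# Donors with a caudate record but no DLPFC record.
caudate_without_dlpfc <- setdiff(caudate_donors, dlpfc_donors)
print(caudate_without_dlpfc)
```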
Thanks for your patience. It appears all of the ROSMAP RNAseq data were contributed as BAM files here: https://www.synapse.org/#!Synapse:syn22333035 This includes data from DLPFC, AC, and PCC. Would you explore those data and let me know if any samples appear to be missing? Rich
Hi, I hope all is well. I just wanted to follow up on this, since I am still having trouble locating these files. Would you be able to help me further? Thanks!
Hi Rich, Thanks for your reply! I'm specifically interested only in DLPFC and caudate samples. Here is my filtering:

    biospecimen %>%
      filter(str_detect(assay, "rnaSeq")) %>%
      filter(str_detect(nucleicAcidSource, "bulk cell")) %>%
      filter(tissue %in% c("dorsolateral prefrontal cortex", "Head of caudate nucleus")) %>%
      filter(is.na(exclude))

This yields 918 unique individual IDs and 1605 rows. In addition, according to this filtering, there should be 728 caudate nucleus samples and 877 DLPFC samples. So it seems to me there is a significant number of missing samples from the fastq files at syn21589959. If we assume each sample would have an R1 and R2 fastq file, then I would expect 3210 files in that repository instead of 698.
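To spell out the arithmetic behind those expected counts, here is a short base R sketch. The numbers are the ones quoted in this post; the only assumption is that sequencing was paired-end, giving two fastq files (R1 and R2) per sample.

```r
# Counts quoted in the post above.
n_caudate     <- 728
n_dlpfc       <- 877
n_files_found <- 698

# Assumption: paired-end sequencing, so two fastq files (R1 and R2) per sample.
files_per_sample <- 2

n_samples      <- n_caudate + n_dlpfc
expected_files <- n_samples * files_per_sample
missing_files  <- expected_files - n_files_found

cat("samples:", n_samples, "\n")
cat("expected fastq files:", expected_files, "\n")
cat("files unaccounted for:", missing_files, "\n")
```

If some samples were delivered only as BAM files rather than fastq, the gap in the fastq folder alone would overstate what is actually missing.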
Thanks for including those details! That provides helpful context. I will continue to look into this. Would you please describe in more detail how you are querying the biospecimen metadata and the number of records it yields? I want to make sure I am looking at the right subset of records. For example,
```
# Assumes you are already logged in to Synapse (e.g. via synLogin())
library(synapser)
library(tidyverse)

# Biospecimen: https://www.synapse.org/#!Synapse:syn21323366
bio_meta <- synGet("syn21323366")$path %>% read_csv()

# Count bulk brain RNAseq records that are not flagged for exclusion
bio_meta %>%
  filter(str_detect(assay, "rnaSeq")) %>%
  filter(str_detect(nucleicAcidSource, "bulk cell")) %>%
  filter(str_detect(organ, "brain")) %>%
  filter(is.na(exclude)) %>%
  dim()
```
Hi Rich, I appreciate your reply. Yes, I am only interested in bulk brain data and have explored the data at the location you mentioned. According to syn21589959, there are 698 files. Given each sample has 2 fastq files, this is a total of 349 individuals. However, the RNAseq harmonization page at https://www.synapse.org/#!Synapse:syn21241740 says there should be 911 donors with RNAseq and WGS data. It seems there are 562 donors missing then, unless they only provided blood RNAseq or microglia RNAseq (which I don't think is the case since we found that the biospecimens metadata sheet says there should be more than 349 individuals with brain RNAseq). Would you be able to help me locate the other samples? Thank you for your help!
Hi @J_Grundman, Thanks for your question about the ROSMAP data. Are you specifically interested in bulk brain data? Have you explored all the available RNAseq data here: [syn23650893](https://www.synapse.org/#!Synapse:syn23650893)? Rich

Help requested in locating specific ROSMAP bulk brain RNA-seq files