For our project, we want to genotype ROSMAP samples based on a single base pair polymorphism on chromosome 17. This seems straightforward with the provided WGS genotyping files, in our case syn10998125. However, the sample IDs we find in this file do not correspond to the names of the files in the bulk brain RNAseq data, folder syn21589959. Is there a way to connect the genotype of a sample with the RNAseq data of that sample? Thank you very much for your help in advance.

Created by Luca Wagner luca.wagner
Thanks a lot Will, have done so now
Hi @cmoses, Glad to hear that things are lining up! I believe you are on the right track for using the knowledge portal to download the annotations. Unfortunately I don't have time to look deeper into this issue or try to reproduce it before my vacation next week. Could you make a separate post on the forum so that someone else from our team can take a look? Thanks, Will
Hi Will, thanks a lot for the response. The annotations are indeed what we want to be able to cross-reference our samples with the metadata file. We're able to view the annotations for each individual file by clicking the 'Annotations' button in the top right of a file on the Synapse web page like you mentioned. However, we want to download the annotations for a whole collection of samples simultaneously. To do this, we tried to follow step 7 in this information on how to use annotations: https://help.adknowledgeportal.org/apd/Use-Case-%231:-Find-and-Download-Data-Associated-With-a-Selected-Study.2426535991.html , but for us clicking "Download options" in our ROSMAP dataset (e.g. syn21589959) only gives two options: "Add to download cart" and "Programmatic options", it doesn't have the "Export table" option that is described in step 7. Is the only option to download the annotations programmatically? Thanks Colette
Hi @luca.wagner, The biospecimen metadata can be used to link the samples in the VCF file to the individualIDs and specimenIDs of the bulk RNASeq reads. To find the individualID and specimenID annotations, you can click on the 'Annotations' button in the top right of a file on the Synapse web page, or alternatively use one of the available command line clients to extract those values ([Python client](https://python-docs.synapse.org/build/html/index.html#synapseclient.Synapse.get_annotations), for example). I took a quick look at some of the samples in that VCF file, and I'm finding raw bulk RNASeq reads in BAM format that match the same individualIDs here: syn21188662 (ROSMAP datasets often store raw reads in BAM format). Let me know if this helps or if I'm off-base. Best, Will
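For anyone who wants to pull these annotations for a whole folder rather than file by file, here is a minimal sketch of the programmatic route. It assumes the Python client's `getChildren` and `get_annotations` calls behave as described in the client documentation, and that the files carry `individualID` and `specimenID` annotations. The helper name `collect_annotations` is made up for illustration, and this has not been run against the ROSMAP folders.

```python
def collect_annotations(syn, folder_id, keys=("individualID", "specimenID")):
    """Return one dict per file in a Synapse folder with the selected annotations.

    `syn` is assumed to be a logged-in synapseclient.Synapse object;
    `collect_annotations` itself is a hypothetical helper, not part of the client.
    """
    rows = []
    # Assumption: getChildren yields dicts with at least 'id' and 'name'.
    for child in syn.getChildren(folder_id, includeTypes=["file"]):
        annotations = syn.get_annotations(child["id"])
        row = {"id": child["id"], "name": child["name"]}
        for key in keys:
            value = annotations.get(key, [])
            # Annotation values usually come back as lists; flatten them to text.
            if isinstance(value, (list, tuple)):
                value = ";".join(str(v) for v in value)
            row[key] = value
        rows.append(row)
    return rows


# Example usage (requires Synapse credentials, so left as a comment):
# import synapseclient
# syn = synapseclient.Synapse()
# syn.login()
# rows = collect_annotations(syn, "syn21589959")
```

The function only looks at the direct children of the folder, so nested subfolders would need a recursive walk. Files missing an annotation get an empty string for that key.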
Hi Will, Thank you for your answer. It is helpful for the VCF files, but we would also like to know whether the bulk RNAseq fastq files can be cross-referenced with the metadata file, as we're unsure how to obtain the individualID or specimenID from the RNAseq fastq files. How would that work? Thank you in advance for your help, Best, Luca
Hi @luca.wagner, I looked into this, and it seems that the sample IDs in the VCF file correspond to 'specimenID' annotations in the Synapse files, which might not map directly between the different data modalities. I would recommend using the biospecimen metadata file here: syn21323366. The sample IDs in the VCF file can be found in this biospecimen metadata, which will also give you the individualID for each sample. You can then see what other samples/specimens are available for each individual, which will include bulk RNASeq samples. Let me know if this makes sense or if I can provide further guidance. Best, Will
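The lookup described above could be sketched roughly as follows. This assumes the biospecimen metadata has been downloaded as a CSV with `specimenID` and `individualID` columns. The `assay` column and the `"rnaSeq"` value used to pick out RNASeq specimens are assumptions, so please check them against the actual file. The function name is hypothetical.

```python
from collections import defaultdict


def link_wgs_to_rnaseq(vcf_sample_ids, biospecimen_rows, rnaseq_assay="rnaSeq"):
    """Map each VCF sample ID to its individualID and that individual's RNASeq specimens.

    `biospecimen_rows` is an iterable of dicts, e.g. from csv.DictReader.
    The 'assay' column and its 'rnaSeq' value are assumptions about the metadata.
    """
    wanted = set(vcf_sample_ids)
    individual_of = {}                   # VCF sample (specimenID) -> individualID
    rnaseq_specimens = defaultdict(list)  # individualID -> RNASeq specimenIDs

    for row in biospecimen_rows:
        if row["specimenID"] in wanted:
            individual_of[row["specimenID"]] = row["individualID"]
        if row.get("assay") == rnaseq_assay:
            rnaseq_specimens[row["individualID"]].append(row["specimenID"])

    return {
        sample: {
            "individualID": individual,
            "rnaseq_specimenIDs": rnaseq_specimens.get(individual, []),
        }
        for sample, individual in individual_of.items()
    }


# Example usage (the file name is a placeholder for the downloaded metadata):
# import csv
# with open("biospecimen_metadata.csv", newline="") as handle:
#     links = link_wgs_to_rnaseq(["SM-XXXX"], list(csv.DictReader(handle)))
```

VCF sample IDs that do not appear in the metadata are simply left out of the result, so comparing the returned keys with the input list shows which samples could not be matched.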
