Hi all, I just started using the ROSMAP WGS data. The joint VCF files contain 1196 subjects, and according to the metadata they all come from either the 'MAP' or the 'ROS' cohort. However, according to the cohort descriptions here https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyDetails?Study=syn22264775, the sample counts should be: ROSMAP: 1200 samples, MSBB: 349 samples, Mayo: 349 samples, which sum to 1898. This is the same number that appears in the VCF file names: NIA_JG_1898_samples_GRM_WGS_b37_JointAnalysisXXXX. So what I would like to know is: 1) do the 1196 subjects come from all three datasets? 2) where are the missing ~700? Thank you very much
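For anyone wanting to check the subject count in a VCF themselves: in the VCF format, sample IDs are the columns after the nine fixed columns (CHROM through FORMAT) on the `#CHROM` header line. The sketch below reads only the header, so it stays fast even on multi-GB files. It assumes a plain or gzip/bgzip-compressed VCF; the function names are hypothetical, not part of any Synapse or AD Knowledge Portal tooling.

```python
import gzip
import sys

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT precede the sample columns.
VCF_FIXED_COLUMNS = 9


def sample_ids_from_lines(lines):
    """Return the sample IDs from the #CHROM header line of VCF text lines."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[VCF_FIXED_COLUMNS:]
        if not line.startswith("#"):
            # Reached the data records without seeing a #CHROM line.
            break
    raise ValueError("no #CHROM header line found")


def vcf_sample_ids(path):
    """Read sample IDs from a VCF file; .gz/.bgz files are opened with gzip."""
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    with opener(path, "rt") as fh:
        return sample_ids_from_lines(fh)


if __name__ == "__main__":
    # Usage: python count_samples.py file1.vcf.gz [file2.vcf.gz ...]
    for p in sys.argv[1:]:
        print(p, len(vcf_sample_ids(p)))
```

Because BGZF is gzip-compatible, Python's `gzip` module can read bgzipped VCFs directly. `bcftools query -l file.vcf.gz | wc -l` gives the same count from the command line.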

Created by Marianna Sanna
Hi Jared, thank you for your reply. I will use those files, then. Best wishes, Marianna
Hi Marianna, The VCF files for the three cohorts are split by cohort and chromosome. ROSMAP: https://www.synapse.org/#!Synapse:syn11707419 Mayo: https://www.synapse.org/#!Synapse:syn11707308 MSBB: https://www.synapse.org/#!Synapse:syn11707204 A single VCF file covering everything would be very large (you saw 25 GB for ROSMAP alone), which is why the data are split. Let me know if you have further questions. Regards, Jared
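Because the data are split by cohort, reconciling the totals means combining the sample lists from one file per cohort. Since the split is also by chromosome, any single chromosome's file should list the same samples as the others, but that is an assumption worth checking. The sketch below assumes you have already written each cohort's sample IDs to a text file, one per line (e.g. `bcftools query -l <file>.vcf.gz > rosmap_samples.txt`). The file names and helper functions are hypothetical. It reports per-cohort counts, the combined total, and any IDs shared between cohorts, then compares the total against the 1898 in the joint file name.

```python
from collections import Counter


def read_id_list(path):
    """Read one sample ID per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def summarize_cohorts(samples_by_cohort, expected_total=None):
    """Count unique samples per cohort and flag IDs present in more than one cohort."""
    per_cohort = {name: len(set(ids)) for name, ids in samples_by_cohort.items()}
    # Count, for each sample ID, how many cohorts list it.
    occurrences = Counter(s for ids in samples_by_cohort.values() for s in set(ids))
    shared = sorted(s for s, n in occurrences.items() if n > 1)
    total = sum(per_cohort.values())
    return {
        "per_cohort": per_cohort,
        "total": total,
        "unique": len(occurrences),
        "shared": shared,
        "matches_expected": expected_total is None or total == expected_total,
    }


if __name__ == "__main__":
    # Hypothetical file names: one sample list per cohort.
    files = {
        "ROSMAP": "rosmap_samples.txt",
        "Mayo": "mayo_samples.txt",
        "MSBB": "msbb_samples.txt",
    }
    summary = summarize_cohorts(
        {name: read_id_list(path) for name, path in files.items()},
        expected_total=1898,  # 1200 + 349 + 349, per the study description
    )
    print(summary)
```

Counting per cohort makes it easy to see whether a shortfall comes from one cohort or is spread across all three. Reporting shared IDs guards against double-counting if a subject appears in more than one file.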
Thanks, Marianna. I will look into the VCF file shortly. Regards, Jared
Hi Jared Hendrickson, thank you for getting back to me. One of the files is syn11714389. Thanks for looking into this. Best wishes, Marianna
Hi Marianna Sanna, Can you please direct me to the exact VCF file you are looking at, preferably by Synapse ID? From there, I can do some data exploration and contact other members of my team if needed. Regards, Jared
