Hi, I am trying to process fastq files from the ROSMAP microglia RNA-seq and single-nucleus RNA-seq data (syn11578941, syn17055069). Have the fastq files already been trimmed, or do I still need to use fastx to trim the poor-quality reads?

Created by Junming Hu hjunming
@hjunming If you access the data through the AMP-AD Knowledge Portal (a site built on top of Synapse), it may be easier to see how the data are connected. [See here](https://adknowledgeportal.synapse.org/#/Explore/Studies?Study=syn3219045). Note the study summary, the methods descriptions for all the assays, and the links to the metadata and data files. At the very end are the related studies, including 'snRNAseqPFC_BA10'. That study is the data as published here: https://www.nature.com/articles/s41586-019-1195-2. Regarding "may I asked syn16780177 is coming from which region?": under Study Data (for the main ROSMAP study), filter 'data type' by gene expression, then click the link to the scrnaSeq assay files and note the tissue annotation (it is all from the DLPFC). Brain region information is also available in the biospecimen metadata file.
Hi @meagan, thanks for your help! By the way, I noticed that there is another set of ROSMAP single-nucleus RNA-seq files stored at https://www.synapse.org/#!Synapse:syn18485175. May I ask what the difference between them is? Also, which brain region does syn16780177 come from? Many thanks.
Hi @hjunming -- I've updated this so that these files will now have the same display name and download-as name, both in a Cellranger-friendly format.
Hi @hjunming -- Thanks for bringing this up. We'll look into this and get back to you.
Hi @Mette @vilasmenon, could you please check the downloaded files?
Hi Mette, thanks for the reply. I saw the table before, but when I download the files using syn17055069, the file names are completely different.

These are the names I see after downloading:

- **MFC-B1-2-Cog1-Path0-M_S13_L007_R2_001.fastq.gz**
- MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R2_001.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-M_S6_L007_R1_001.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-M_S7_L007_R2_001.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-F_S1_L006_R1_001.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-M_S5_L007_R1_001.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-M_S6_L007_R2_001.fastq.gz
- SYNAPSE_METADATA_MANIFEST.tsv
- MFC-B1-S1-Cdx1-pAD0-F_S1_L006_R2_001.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-M_S5_L007_R2_001.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-F_S3_L006_R1_001.fastq.gz
- batch2
- MFC-B1-S2-Cdx1-pAD0-M_S4_L007_R1_001.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-F_S2_L006_R1_001.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-F_S3_L006_R2_001.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R1_001.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-F_S2_L006_R2_001.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-M_S7_L007_R1_001.fastq.gz

These are the names shown on the website:

- MFC-B1-S1-Cdx1-pAD0-I1.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-R1.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-R2.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-I1.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-R1.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-R2.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-I1.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-R1.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-R2.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-I1.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-R1.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-R2.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-I1.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-R1.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-R2.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-I1.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-R1.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-R2.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-I1.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-R1.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-R2.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-I1.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-R1.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-R2.fastq.gz

If you look for the first file in my list (in bold), you will not find it on the website.
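For anyone who wants to check this kind of mismatch programmatically, here is a minimal sketch rather than an official tool. It assumes the downloaded names follow the Illumina-style pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` and the website names follow `<sample>-<read>.fastq.gz`, as in the listings above. It also assumes that the trailing `-F` or `-M` seen on the downloaded sample names is absent from the website names and can be stripped before comparing. The function names `parse_fastq_name` and `downloads_missing_from_display` are hypothetical.

```python
import re

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)
# Assumed website display name: <sample>-<read>.fastq.gz
DISPLAY_NAME = re.compile(r"^(?P<sample>.+)-(?P<read>[RI][12])\.fastq\.gz$")


def parse_fastq_name(filename, strip_sex_suffix=True):
    """Return (sample, read) for either naming style, or None if neither matches."""
    for pattern in (ILLUMINA_NAME, DISPLAY_NAME):
        match = pattern.match(filename)
        if match:
            sample = match.group("sample")
            if strip_sex_suffix:
                # Assumption: a trailing -F / -M is not part of the website name.
                sample = re.sub(r"-[FM]$", "", sample)
            return sample, match.group("read")
    return None


def downloads_missing_from_display(downloaded, displayed):
    """List downloaded FASTQ names whose (sample, read) is not shown on the website."""
    shown = {parse_fastq_name(name) for name in displayed}
    missing = []
    for name in downloaded:
        parsed = parse_fastq_name(name)
        # Non-FASTQ entries (manifest, folders) parse to None and are skipped.
        if parsed is not None and parsed not in shown:
            missing.append(name)
    return missing
```

Calling `downloads_missing_from_display(downloaded, displayed)` with the two listings as Python lists returns the downloaded files that have no counterpart on the website. Entries such as `SYNAPSE_METADATA_MANIFEST.tsv` and `batch2` are ignored because they do not match either pattern.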
See the methods section for 'gene expression (single nucleus RNAseq)': https://adknowledgeportal.synapse.org/#/Explore/Studies?Study=syn3219045. See the note about filenames. If you follow the link to the single-cell RNAseq data files, you will see that the files are annotated with a 'specimenID'. That specimen ID maps to the single-cell RNAseq assay metadata and the biospecimen metadata file under 'metadata files'.
@vilasmenon Hi, I found something odd when trying to download ROSMAP Gene Expression (single nucleus RNA seq) (syn17055069). The file names I downloaded are completely different from what I saw on the website, and some of them do not even match the instruction files. I am wondering if something is wrong here. Could you please check? Many thanks.
@Mette @vilasmenon Absolutely Yes!
@vilasmenon thank you for your input. I second the encouragement to let us know about interesting findings. @hjunming we are always interested in how the data is being used
@hjunming you're very welcome! Please let us know if you find anything interesting with the single-nuc RNA-seq data with and without trimming the reads.
Great! Thanks!
@hjunming For the single-nucleus RNA-seq data here, a given sample was not run across multiple lanes, so there was no need to combine them after alignment. For the bulk RNA-seq, yes, samples run across different lanes can be combined either before or after alignment. If using RSEM, it is better to combine them before alignment.
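As an illustration of combining lanes before alignment, here is a minimal sketch, not the pipeline used for ROSMAP. It assumes Illumina-style file names (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`) and relies on the fact that gzip files can be concatenated byte for byte into a valid multi-member gzip file. The function names and the output naming (`<sample>_<read>.fastq.gz`) are hypothetical.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
LANE_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_by_sample_and_read(paths):
    """Map (sample, read) to its lane files, sorted by lane number."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        match = LANE_NAME.match(path.name)
        if match is None:
            continue  # skip manifests and anything not named like a lane file
        key = (match.group("sample"), match.group("read"))
        groups[key].append((match.group("lane"), path))
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}


def merge_lanes(paths, out_dir):
    """Concatenate the lane files of each (sample, read) into one gzip file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), lane_files in group_by_sample_and_read(paths).items():
        target = out_dir / f"{sample}_{read}.fastq.gz"  # hypothetical output name
        with open(target, "wb") as out:
            for lane_file in lane_files:
                with open(lane_file, "rb") as src:
                    shutil.copyfileobj(src, out)  # raw byte concatenation
        merged[(sample, read)] = target
    return merged
```

Because R1 and R2 are both concatenated in the same lane order, read pairs stay in sync between the two merged files.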
@vilasmenon One more question: for the samples run on different lanes, did you combine them after alignment?
Thanks a lot, @vilasmenon
Hi @hjunming - yes, for the bulk RNA-seq microglia data (syn11578941), the fastq files were trimmed, and they were not processed using cellranger. It's the single-cell/single-nucleus RNA-seq data from the 10x platform that does not use any explicit trimming, apart from what is in the cellranger pipeline.
Hi @vilasmenon, how about the microglia RNA-seq? The methods say you used fastx to trim the data. Have those fastq files already been trimmed?
@hjunming In general, we use the cellranger pipeline (mkfastq and count) to generate and align the fastqs. It does minimal trimming, as far as I'm aware, but we do not do further trimming offline. I would be interested to know if there's a difference in the alignment percentage with an intermediate trimming step before running cellranger count on this data.
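For anyone curious to try an intermediate trimming step, here is a minimal sketch of 3'-end quality trimming. It is not the fastx implementation and not part of cellranger. It assumes Phred+33 quality encoding and 4-line FASTQ records, and it keeps every read (trimmed, never dropped) so that paired files stay in sync. In 10x data, R1 normally carries the cell barcode and UMI, so this sketch assumes only R2 would be trimmed. The function names and default thresholds are hypothetical.

```python
import gzip


def trim_3prime(seq, qual, min_quality=20, phred_offset=33, min_keep=1):
    """Cut trailing bases whose Phred score is below min_quality.

    At least min_keep bases are always kept so no read becomes empty.
    """
    end = len(seq)
    while end > min_keep and ord(qual[end - 1]) - phred_offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq(in_path, out_path, **trim_options):
    """Trim every record of a gzipped FASTQ file and write a new gzipped file."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            plus = src.readline().rstrip("\n")
            qual = src.readline().rstrip("\n")
            seq, qual = trim_3prime(seq, qual, **trim_options)
            dst.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```

Running cellranger count on the original and on the trimmed R2 files would then allow a comparison of the alignment percentages.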
Many Thanks!
@vilasmenon - please see the question above regarding the microglia RNAseq.

Question about the ROSMAP single cell fastq files and RNA seq data for Microglia