Hi,

Thank you for providing the MSBB dataset. I have a couple of questions regarding normalization and covariate correction of the RNA-seq data. The files related to my question have been deposited in the NormalizedExpression folder (Synapse ID: syn7391833). In general, I am eager to know what approach was taken and what the results look like after covariate correction.

1) Is the code used for data correction available, so that I can look into the details? The syn7391869 folder contains code for alignment and quantitation, but I could not find a script for normalization and covariate correction.

2) Have you used any dimension reduction approach (e.g. PCA, t-SNE, or UMAP) to see how samples cluster after normalization and covariate correction? After removing the confounding effects of batch, sex, race, age, RIN, PMI, exonic rate, and rRNA rate, I would expect samples to cluster by one of the AD-related indices such as CERAD, CDR, or Braak, or by Brodmann area. However, in my current analysis I cannot see such clustering. Have you (or any other users of this dataset) been able to cluster samples using the normalized data?

Thanks in advance,
Rasool

Created by Rasool Saghaleyni rasools
Dear @minghui.wang, would you please share here the link to the study that you mentioned in our conversation? Thank you very much!
Dear @minghui.wang, thank you for sharing your thoughts on this. It is very interesting that you have identified different subtypes of AD with drastic differences and opposite trends in their transcriptomic profiles. I will go through the results of that study to better understand the data provided here. Best, Rasool
The raw data were first corrected for batch with a mixed model, and the residuals were then further corrected for PMI, sex, age of death, exonicRate, rRNA rate, and RIN by linear regression.

Regarding sample clustering, it is not totally unexpected that the samples do not show clear separation by disease traits. Just a few thoughts on this. Given that a relatively small fraction of the transcriptome is dysregulated in AD, dimension reduction such as PCA based on full transcriptome data may not be able to pick up the difference. Unlike in a case-control design, the current samples show a continuous spectrum of disease trait severity from normal to severe. There is a fair amount of heterogeneity among the samples, so the boundary in expression change may not be that clear. Moreover, our recent analysis identified several potential AD subtypes, each presenting different, even opposite, trends of transcriptomic change.
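For readers who want to try something similar on their own matrix, here is a minimal sketch of a two-step correction followed by a PCA check. It is not the code used to produce the files in syn7391833. The batch step uses simple per-batch mean centering as a fixed-effects stand-in for the mixed model described above, and the function names, simulated data, and the two stand-in covariates are all hypothetical.

```python
import numpy as np

def remove_batch_means(expr, batch):
    """Center each gene within each batch, then restore the gene's overall mean.

    Assumption: this is a fixed-effects stand-in for the mixed-model batch
    correction, not the mixed model itself.
    expr: (n_samples, n_genes) array; batch: length n_samples labels.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    out = expr.copy()
    for b in np.unique(batch):
        rows = batch == b
        out[rows] -= expr[rows].mean(axis=0)
    return out + expr.mean(axis=0)

def regress_out(expr, covariates):
    """Residuals of an ordinary least squares fit of each gene on the
    covariates (with intercept), with the gene mean added back."""
    expr = np.asarray(expr, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    X = np.column_stack([np.ones(len(expr)), covariates])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta + expr.mean(axis=0)

def pca_scores(expr, n_components=2):
    """Sample scores on the leading principal components via SVD."""
    centered = expr - expr.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Simulated data (hypothetical): 60 samples, 200 genes, 3 batches, and two
# numeric covariates standing in for e.g. RIN and PMI. A categorical
# covariate such as sex would need 0/1 coding first.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
batch = rng.integers(0, 3, size=n_samples)
covariates = rng.normal(size=(n_samples, 2))
expr = (rng.normal(size=(n_samples, n_genes))
        + 2.0 * batch[:, None]
        + covariates @ rng.normal(size=(2, n_genes)))

step1 = remove_batch_means(expr, batch)       # step 1: batch
corrected = regress_out(step1, covariates)    # step 2: other covariates
scores = pca_scores(corrected)                # (n_samples, 2) PC scores
```

Plotting `scores` colored by a trait such as CDR or Braak stage is one way to check for separation. Restricting the matrix to genes already associated with the trait before running PCA may be more informative than using the full transcriptome, for the reasons given above.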
Dear @minghui.wang, can you help answer these questions?
