Hi there,
I am looking to access the raw data (not TPM or normalized) from The Glioma Longitudinal Analysis Consortium. I see there are links to the raw data accessions. Is this the only way to obtain the raw files?
Additionally, I see there is a file labeled "transcript_count_matrix_all_samples.tsv". Is this raw transcript counts, or have these been normalized?
Thank you!
Created by Hanna Minns hminns
Thank you for checking. Would you have the kallisto abundance.tsv files available for all of the samples?
I'm sorry, we do not have that table available.
Thanks so much! Would it also be possible to get the gene-level summarized count matrix for all samples?
We have uploaded a table containing the effective lengths for each sample's transcripts, entitled `transcript_eff_length_matrix_all_samples.tsv`, to the "Files" section of the Synapse page. I hope this helps!
Hello,
Would it be possible to get the kallisto abundance.tsv files for each sample, or a table of the effective lengths for each sample's transcripts? The effective lengths are needed to read the data in for differential expression analysis, using either the tximport function (for edgeR/DESeq2) or catchKallisto in edgeR's transcript-level analysis workflow. The file you've made available, "transcript_count_matrix_all_samples.tsv", contains only the estimated counts; there are no effective lengths for each sample's transcripts.
Thanks
Thanks for your interest! The file "transcript_count_matrix_all_samples.tsv" contains the estimated counts for each GLASS sample as calculated using kallisto. For the transcripts per million (TPM) values, please see either the "transcript_tpm_matrix_all_samples.tsv" (Ensembl transcript ID) or "gene_tpm_matrix_all_samples.tsv" (gene symbol) files. For Q1, please see the response in the following thread: https://www.synapse.org/#!Synapse:syn17038081/discussion/threadId=8955.
Tagging @fsvarn for Q2.
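For anyone combining the estimated-counts table with the effective-length table mentioned in this thread, here is a minimal Python sketch of how the two relate to TPM. It assumes both files are tab-separated with the transcript ID in the first column and one column per sample; that layout, the helper names `read_matrix` and `tpm_from_counts`, and the toy values are all illustrative assumptions, not something confirmed by the GLASS team. It is not a replacement for tximport or catchKallisto, only a sanity check on the arithmetic.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-separated matrix (assumed layout: first column is the
    transcript ID, remaining columns are samples).
    Returns {sample: {transcript_id: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {s: {} for s in samples}
    for row in reader:
        transcript_id = row[0]
        for sample, value in zip(samples, row[1:]):
            matrix[sample][transcript_id] = float(value)
    return matrix


def tpm_from_counts(counts, eff_lengths):
    """TPM for one sample: count / effective length, rescaled to sum to 1e6."""
    rates = {
        t: (counts[t] / eff_lengths[t] if eff_lengths[t] > 0 else 0.0)
        for t in counts
    }
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    return {t: r / total * 1e6 for t, r in rates.items()}


# Made-up toy data standing in for the two Synapse tables.
counts_tsv = "target_id\tS1\tS2\nT1\t100\t0\nT2\t300\t50\n"
length_tsv = "target_id\tS1\tS2\nT1\t1000\t1000\nT2\t1500\t1500\n"

counts = read_matrix(io.StringIO(counts_tsv))
lengths = read_matrix(io.StringIO(length_tsv))
tpm = {s: tpm_from_counts(counts[s], lengths[s]) for s in counts}
```

With the real files, you would pass open file handles for the two tables instead of the `io.StringIO` objects. The result should be comparable to the values in the TPM matrix, up to rounding.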