Hello,
I wanted to thank you for making this great resource accessible.
We are interested in the expression of several lncRNAs. When I checked their expression in the gene_tpm_matrix_all_samples.tsv, they are shown as having an expression of zero. However, we have decided to process the data starting with the transcripts. When looking at the transcripts, I cannot find the same genes.
I provide my example code in R below. The main thing I'm trying to understand is: were these genes measured and their expression is zero, or were they not measured?
Thank you very much in advance for your answer!
Best,
Anja Hartewig
```
GLASS_RNA_seq <- read.delim(paste0(input_dir, "gene_tpm_matrix_all_samples.tsv"))
# one of our genes of interest is NEAT1
grep("NEAT1", GLASS_RNA_seq$Gene_symbol)
# returns 29477, rowSums is 0
```
```
# the ENSG identifiers and transcripts were extracted from gencode v30
# I double-checked them on ensembl.org on 11.04. (GRCh38.p14)
GLASS_RNA_seq_tx <- read.delim(paste0(input_dir, "transcript_count_matrix_all_samples.tsv"))
ENSEMBL_transcripts_NEAT1 <- c("ENST00000501122", "ENST00000601801",
"ENST00000499732", "ENST00000670617",
"ENST00000645023", "ENST00000646243",
"ENST00000642367", "ENST00000612303",
"ENST00000616315")
sum(GLASS_RNA_seq_tx$target_id %in% ENSEMBL_transcripts_NEAT1)
# returns 0
```
```
# I also tried using EnsDb.Hsapiens.v75
library(EnsDb.Hsapiens.v75)
GLASS_ensembl <- EnsDb.Hsapiens.v75
tx_to_gene_anno <- transcripts(GLASS_ensembl, return.type = "data.frame")
NEAT1_anno <- tx_to_gene_anno[grep("ENSG00000245532", tx_to_gene_anno$gene_id), ]
sum(GLASS_RNA_seq_tx$target_id %in% NEAT1_anno$tx_id)
# also returns 0
```
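A possible confound worth ruling out with lookups like the ones above: if `target_id` in the transcript matrix carried Ensembl version suffixes (e.g. `ENST00000501122.2`), an exact `%in%` match against unversioned IDs would return 0 even for transcripts that are present. Whether the GLASS matrix uses versioned IDs is an assumption here, not something confirmed in this thread. A minimal sketch on a toy data frame (`tx_demo`, `query_ids`, and the version numbers are all hypothetical):

```r
# Toy stand-in for the transcript matrix. The versioned IDs and counts
# are illustrative assumptions, not values from the GLASS data.
tx_demo <- data.frame(
  target_id = c("ENST00000501122.2", "ENST00000499732.3", "ENST99999999999.1"),
  sample_A  = c(12, 0, 5),
  stringsAsFactors = FALSE
)
query_ids <- c("ENST00000501122", "ENST00000499732")

# Exact matching misses every ID when the matrix IDs carry a version suffix
n_exact <- sum(tx_demo$target_id %in% query_ids)

# Strip a trailing ".<digits>" version before matching
target_id_noversion <- sub("\\.[0-9]+$", "", tx_demo$target_id)
n_matched <- sum(target_id_noversion %in% query_ids)

n_exact
n_matched
```

If the stripped match on the real matrix still returns 0, ID formatting can be excluded as the cause and the transcripts are simply absent from the matrix.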
Hi Anja,
Thanks for your interest in our data resource. To answer your question: the non-coding transcripts were not measured in our analysis, which is why they do not appear in the transcript matrix. We agree that this was not apparent from the gene expression matrix. To resolve the confusion, we have uploaded a new version of the gene expression matrix in which the gene IDs for these unmeasured transcripts have been removed, and we have noted the change in the Wiki for the gene expression matrix. Thank you for bringing this to our attention. Once again, we appreciate your interest in the data resource.
Fred
Different genes in transcript_count_matrix_all_samples.tsv and gene_tpm_matrix_all_samples.tsv?