Hi, what is the best way to get the length of each gene in ROSMAP_all_counts_matrix.txt.gz (syn8691134)? Thanks, Kevin

Created by Kevin Hu kevin.hu
Hi Kevin,

It looks like Gencode v24 was used to process the BAM files into the count matrix (see syn9757881). You can download the gzipped GTF file here: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/gencode.v24.annotation.gtf.gz

The R code below extracts the gene lengths from the unzipped GTF file; just replace the path in the import.gff call with the path to your GTF. Each gene's length is computed as the total width of its exons after overlapping exons are merged.

```
library(GenomicRanges)
library(rtracklayer)
library(genoset)

# Import only the exon records from the GTF
GTF <- import.gff("/gencode.v24.annotation.gtf", format="gtf", genome="GRCh38.p5", feature.type="exon")

# Group exons by gene and merge overlapping exons within each gene
grl <- reduce(split(GTF, elementMetadata(GTF)$gene_id))
reducedGTF <- unlist(grl, use.names=TRUE)
elementMetadata(reducedGTF)$gene_id <- rep(names(grl), elementNROWS(grl))
elementMetadata(reducedGTF)$widths <- width(reducedGTF)

# Sum the merged exon widths for each gene
calc_length <- function(x) {
  sum(elementMetadata(x)$widths)
}

# One row per gene (row names are gene IDs), one column holding the length
output <- as.matrix(sapply(split(reducedGTF, elementMetadata(reducedGTF)$gene_id), calc_length))
colnames(output) <- c("Length")
```

Best,
Jake
