@jaeddy Synapse ID: syn10507739 Hello! I wonder why the "counts-matrix" file in syn10507739 contains only floating-point numbers instead of integers. Is there any way to get the raw transcript-level counts for the genes in syn8691134?

Created by Jiaxin Zhou Jiaxin
@jgockley Thank you, that is again a clear and helpful answer. I'll use the biomaRt package to convert the gene IDs.
Hi @fanc232, if I follow you correctly, the ENSG IDs aren't translating to ENTREZ IDs because of version issues? This should be an easy fix. Unfortunately the aggregate count matrix doesn't have provenance on it, but the individual sample-level count matrices do (e.g. [syn11252914](https://www.synapse.org/#!Synapse:syn11252914)). These files were created with the script [run sailfish.sh](https://www.synapse.org/#!Synapse:syn11248988), which used GRCh38 GENCODE v24 as the reference for indexing. I'm unfamiliar with clusterProfiler, but in some cases with biomaRt you can trim the decimal suffix, since it only represents the gene version (e.g. ENSG00000223972.5 is the 5th version of ENSG00000223972). This may or may not be okay for ENTREZ ID translation, so I would check on that. Please let me know if I misinterpreted the issue!
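A minimal sketch of the version trimming described above, written in Python rather than R. The function name `strip_ensembl_version` is hypothetical, and the sketch assumes the version is always a numeric suffix after the last dot; whether the trimmed IDs then map cleanly to ENTREZ IDs still needs to be checked separately.

```python
def strip_ensembl_version(ensembl_id: str) -> str:
    """Return the ID without a trailing '.<digits>' version suffix, if present."""
    base, sep, version = ensembl_id.rpartition(".")
    # Strip only when there is a dot and everything after it is numeric;
    # otherwise return the ID unchanged.
    if sep and version.isdigit():
        return base
    return ensembl_id


# Example using the ID quoted in the thread; an unversioned ID passes through.
ids = ["ENSG00000223972.5", "ENSG00000223972"]
trimmed = [strip_ensembl_version(i) for i in ids]
print(trimmed)
```

Both inputs should come out as the same unversioned ID, which is the form most ID-conversion tools expect.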
Hello @jgockley, sorry for the confusion. I'm not using biomaRt; I'm using clusterProfiler to convert gene IDs. I asked that question because I used to think that if I split the strings in the first column, say "ENST00000456328.2\|ENSG00000223972.5\|OTTHUMG00000000961.2\|OTTHUMT00000362751.1\|DDX11L1-002\|DDX11L1\|1657\|processed_transcript", on "\|" in RStudio, then the 7th field, "1657", being an integer, would be the ENTREZ ID. From your response, that seems not to be the case. However, I still need to convert the gene IDs, so I need to know which Ensembl version corresponds to the gene IDs in this file. Thank you for your kind help!
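For reference, a small Python sketch of splitting such a row ID into its fields. As far as I know, these strings follow the GENCODE transcript FASTA header layout, in which the seventh field is the transcript length in nucleotides rather than an ENTREZ ID. The class name `TranscriptRowId`, the function name `parse_row_id`, and the field labels are my own assumptions for illustration and are not defined anywhere in this thread.

```python
from typing import NamedTuple


class TranscriptRowId(NamedTuple):
    # Field labels are illustrative assumptions based on the GENCODE header layout.
    transcript_id: str
    gene_id: str
    havana_gene: str
    havana_transcript: str
    transcript_name: str
    gene_name: str
    length: int
    biotype: str


def parse_row_id(row_id: str) -> TranscriptRowId:
    """Split a pipe-delimited row ID into eight named fields."""
    # Some files keep a trailing '|' from the FASTA header; drop it first.
    fields = row_id.rstrip("|").split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {row_id!r}")
    return TranscriptRowId(*fields[:6], int(fields[6]), fields[7])


example = (
    "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|"
    "OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript"
)
parsed = parse_row_id(example)
print(parsed.gene_id, parsed.gene_name, parsed.length)
```

The second field (`gene_id`) is the versioned Ensembl gene ID, so that is the one to carry into an ID-conversion step.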
Hi @fanc232, are you using biomaRt? If you post code that lets me reproduce the problem, I might be able to offer some help.
@jgockley Sorry to disturb you, but I found that there are 60554 Ensembl gene IDs while only 9228 have corresponding ENTREZ IDs. So I want to know whether column 7, after splitting the ID string on "|", represents the ENTREZ IDs. Thanks.
Hi, @jgockley. I understand the problem. Thank you for your help!
Hi @Jiaxin, unfortunately this data was generated by Sailfish, which estimates read counts with an expectation-maximization procedure based on relative transcript abundances, which is why the values are not integers. The BAM alignments are available, however, if you're interested in running another quantification tool: [syn8540863](https://www.synapse.org/#!Synapse:syn8540863)

Raw Counts Data for transcript expression in isoform level