Good afternoon, I am looking to match the sample IDs from the clinical file (e.g. the IDs are R1004922 and projectid 18301541) to the sample IDs from the RNA sequencing file (e.g. the sample ID, I assume, is 525_120515_0). Is there a file you can direct me to that has all the IDs, so that I can link the two files by ID? Many thanks, and apologies if this is cross-posting. Celia

Created by cvanderm
Hi @jgockley Thanks very much! I still have two questions. First, in ROSMAP_RNAseq_FPKM_gene.tsv (syn3505720), are the gene expression values for the other samples from a single technical replicate, or are they the average of all technical replicates? Except for 492_120515, the last number of every other sample ID is 0 or 1. Second, in the code counts <- read.table( synapser::synGet('syn3505720')$path, header=T, sep='\t' ,check.names =F), can check.names=F be used? Thanks for your reply.
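On the check.names question, a minimal sketch of what the argument does in base R, using a small made-up stand-in for the file header rather than the real download (the two FPKM values are copied from the output shown further down the thread; everything else in `toy` is an assumption for illustration):

```r
# Toy stand-in for the first row of the FPKM file; not the real data.
toy <- "tracking_id\tgene_id\t525_120515_0\t383_120503_0\nENSG00000167578.11\tENSG00000167578.11\t60.84\t65.45\n"

# Default (check.names = TRUE): read.table passes the header through
# make.names(), which prefixes 'X' to names that start with a digit.
with_check <- read.table(text = toy, header = TRUE, sep = "\t")

# check.names = FALSE keeps the header exactly as written in the file.
no_check <- read.table(text = toy, header = TRUE, sep = "\t", check.names = FALSE)

print(colnames(with_check))
print(colnames(no_check))
```

With check.names = FALSE there is no leading X to strip afterwards, but the column names are then not syntactically valid R names, so they need backticks or `[[ ]]` when accessed by name.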
Hi @zhangliandong I see the issue now. Please refer to this metadata. These are in fact single samples, and the ``` _0 / _6 / _7 ``` suffixes refer to the batch that the sample was sequenced in. You can match biospecimen to individual to clinical data using the metadata files provided in syn3157322. @nicole.kauer & @Mette are currently working through these issues and will advise.
```
foo<-read.csv( synapser::synGet('syn21088596')$path, header=T, stringsAsFactors = F)
foo[ as.character(foo$specimenID) %in% '492_120515', ]
specimenID platform RIN rnaBatch libraryBatch sequencingBatch libraryPrep libraryPreparationMethod isStranded
492_120515 9.100000 NA 0 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 6 nan polyAselection nan True nan pairedEnd
492_120515 9.140738 NA 7 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 0 0 polyAselection True pairedEnd
492_120515 9.100000 NA 6 6 polyAselection True pairedEnd
492_120515 9.140738 NA 7 7 polyAselection True pairedEnd
```
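To make the batch-suffix point concrete, here is a rough sketch that splits column names into specimen ID and batch number and lists the specimens sequenced in more than one batch. The `cols` vector is a hand-written example in the style of the FPKM header (after removing the leading X), not the real column list:

```r
# Hand-written example column names; an assumption, not the real header.
cols <- c("525_120515_0", "383_120503_0",
          "492_120515_0", "492_120515_6", "492_120515_7")

# Split each name into the specimen ID and the trailing batch number.
specimen <- sub("_[0-9]+$", "", cols)  # drop the last underscore and digits
batch    <- sub("^.*_", "", cols)      # keep only what follows the last underscore

# Count columns per specimen and report those that occur in several batches.
n_per_specimen <- table(specimen)
repeated <- names(n_per_specimen)[n_per_specimen > 1]

print(repeated)
print(batch[specimen == "492_120515"])
```

Run against the real column names, the same check would show whether 492_120515 is the only specimen that occurs more than once.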
Hi @jgockley ROSMAP_RNAseq_FPKM_gene.tsv (syn3505720) has X492_XXXXXX_0, X492_XXXXXX_6 and X492_XXXXXX_7; these three specimenIDs match the same individualID. Based on your code, every other individualID has only one specimenID in syn3505720. Why does only this person have three specimenIDs in syn3505720? Is this an error? Thanks for your reply.
@zhangliandong Individuals can be profiled in multiple tissues. Each biosample has a unique specimen ID that maps back to an individualID. Does this make sense? I couldn't quite work out what the first issue with X492_XXXXXX_X was. @PeterSarkies Please pass along code.
Hi @jgockley Based on your code, in the syn3505720 file the specimenID X492_XXXXXX_X (X stands for a digit) is different from the others. Its last number is 0, 6 or 7, whereas every other specimenID has only one final number. Each of the other specimenIDs maps to a single individualID, but these three specimenIDs match one individualID. Thanks for your reply!
Dear @jgockley, removing the _ and the last digit did not seem to work. Is there something else that needs to be done to the header of the bulk brain RNAseq data to enable matching to the clinical data file? Thank you for your help.
Hi @cvanderm Sorry for the runaround, we've been trying to streamline metadata and assay sample alignment and are still in the process. This should give you what you're looking for:
```
#Download files
> synapser::synLogin()
> counts <- read.table( synapser::synGet('syn3505720')$path, header=T, sep='\t' )
> Meta <- read.csv( synapser::synGet('syn3191087')$path, header=T, stringsAsFactors = F)
> Meta2 <- read.csv( synapser::synGet('syn21323366')$path, header=T, stringsAsFactors = F )
> row.names(Meta) <- Meta$individualID
> FullMeta <- cbind( Meta2, Meta[Meta2$individualID,] )
> head(counts[,1:4])
         tracking_id            gene_id X525_120515_0 X383_120503_0
1 ENSG00000167578.11 ENSG00000167578.11         60.84         65.45
2  ENSG00000242268.1  ENSG00000242268.1          0.08          0.05
3  ENSG00000078237.4  ENSG00000078237.4          4.39          4.49
4  ENSG00000263642.1  ENSG00000263642.1          0.00          0.00
5  ENSG00000225275.4  ENSG00000225275.4          0.00          0.00
6  ENSG00000060642.6  ENSG00000060642.6          5.98          4.66
#Remove leading X from column names (anchored so only a leading X is removed)
> colnames(counts) <- gsub('^X','', colnames(counts))
#Remove trailing underscore and [0-8]
> colnames(counts)[ (colnames(counts) %in% c("tracking_id","gene_id") )==F ] <- substr(colnames(counts)[ (colnames(counts) %in% c("tracking_id","gene_id"))==F ], 1, nchar(colnames(counts)[ (colnames(counts) %in% c("tracking_id","gene_id"))==F ])-2)
> table( colnames(counts)[ (colnames(counts) %in% c("tracking_id","gene_id") )==F ] %in% FullMeta$specimenID )

TRUE
 640
```
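The same renaming can also be written with regular expressions instead of `substr`, which avoids depending on the suffix being exactly two characters long. A small sketch on toy data: `to_specimen_id` is a hypothetical helper name, the `meta` rows are made up, and the individualID values are placeholders, not real ROSMAP IDs:

```r
# Hypothetical helper: turn an FPKM column name into a specimenID by
# dropping the leading 'X' (added by read.table) and the batch suffix.
to_specimen_id <- function(x) {
  x <- sub("^X", "", x)    # anchored, so only a leading X is removed
  sub("_[0-9]+$", "", x)   # drop the final underscore and digits
}

cols <- c("tracking_id", "gene_id", "X525_120515_0", "X383_120503_0")
is_sample <- !(cols %in% c("tracking_id", "gene_id"))

# Made-up metadata rows; the individualID values are placeholders.
meta <- data.frame(
  specimenID   = c("525_120515", "383_120503"),
  individualID = c("R0000001", "R0000002"),
  stringsAsFactors = FALSE
)

# Lookup table: original column name -> specimenID -> individualID.
key <- data.frame(
  column     = cols[is_sample],
  specimenID = to_specimen_id(cols[is_sample]),
  stringsAsFactors = FALSE
)
key$individualID <- meta$individualID[match(key$specimenID, meta$specimenID)]
print(key)
```

Keeping the original column name in `key` rather than overwriting `colnames(counts)` preserves the batch suffix, which matters for a specimen that appears in more than one batch.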
Hi, unfortunately not. Could you direct me to how the specimenID in file syn21323366 matches the sample IDs in the RNAseq files? I am struggling to see it.
Hi @cvanderm This file should have what you need! https://www.synapse.org/#!Synapse:syn21323366
Sure:
ROSMAP_clinical.csv = syn3191087
Sample IDs in the above file need to be matched to:
ROSMAP_RNAseq_FPKM_gene.tsv = syn3505720
ROSMAP_RNAseq_FPKM_gene_plates_1_to_6_normalized = syn3505732
ROSMAP_RNAseq_FPKM_gene_plates_7_to_8_normalized = syn3505724
Do you mind posting the synIDs of the respective files? Thanks!

ID matching: Clinical.csv file to RNA Seq files