The single-cell counts file (ROSMAP_Brain.snRNAseq_counts_sparse_format_20201107.csv) is provided only as a CSV, not in MTX format. How can I properly build the count matrix in R?

Created by ali.karimnezhad
Hi @abby.vanderlinden, I am not able to get a single line of code to work with these files. The code provided by @ali.karimnezhad does not work for me, and I am having a tremendous amount of trouble getting started. How do we simply load these exact files into RStudio? The barcode file especially does not seem to be recognized. Here is the code I am using:
    counts <- read.csv("ROSMAP_July/DATA/DLPFC_Study/ROSMAP_Brain.snRNAseq_counts_sparse_format_20230420.csv")
    genes <- read.csv("ROSMAP_July/DATA/DLPFC_Study/ROSMAP_Brain.snRNAseq_metadata_genes_20230420.csv")
    cells <- read.csv("ROSMAP_July/DATA/DLPFC_Study/ROSMAP_Brain.snRNAseq_metadata_cells_20230420.csv")
    data_dir <- my directory
    list.files(data_dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
    expression_matrix <- Read10X(data.dir = data_dir)
    # ERROR MESSAGE: Error in Read10X(data.dir = data_dir) : Barcode file missing. Expecting barcodes.tsv.gz
    seurat_object = CreateSeuratObject(counts = expression_matrix)
    expression_matrix <- ReadMtx(
      mtx = "count_matrix.mtx", features = "features.tsv",
      cells = "barcodes.tsv"
    )
    # ERROR MESSAGE: Error: Cannot find expression matrix at count_matrix.mtx
    # In addition: Warning message: In normalizePath(path = all.files[[i]], winslash = "/") : path[1]="count_matrix.mtx": No such file or directory
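Both errors above come from functions that look for MTX-style inputs (a matrix file plus barcode and feature tables), whereas this release ships three CSVs. One possible workaround, sketched here on toy data only, is to build the sparse matrix in R and, if a 10x-style directory is really needed, export it with writeMM from the Matrix package. The toy values, the output directory name, and the two-column genes.tsv layout are assumptions for illustration, not something confirmed by the data providers.

```r
library(Matrix)

# Toy stand-in for the real matrix (hypothetical values): 3 genes x 2 cells
A <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2),
  dims = c(3, 2),
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)

# Hypothetical output directory for the exported files
out_dir <- file.path(tempdir(), "mtx_export")
dir.create(out_dir, showWarnings = FALSE)

# Write the matrix in Matrix Market format, plus plain-text barcode and gene tables
writeMM(A, file.path(out_dir, "matrix.mtx"))
writeLines(colnames(A), file.path(out_dir, "barcodes.tsv"))
write.table(
  data.frame(id = rownames(A), name = rownames(A)),
  file.path(out_dir, "genes.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Read the matrix back to confirm the export round-trips
B <- readMM(file.path(out_dir, "matrix.mtx"))
```

If the goal is only a Seurat object, the export step may be unnecessary: CreateSeuratObject(counts = A) accepts a sparse matrix with gene row names and cell column names directly.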
Hi, thanks for getting back to me! It turns out that the problem was with counts <- fread( synapser::synGet('syn23554292')$path ), which was not reading the file correctly and missed cells. I solved my issue by downloading the file first and reading it like this: mat <- read.csv(gzfile("counts.gz"))
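A quick way to check whether a read was complete is to compare the number of parsed rows against the number of lines in the file. The sketch below does this on a small gzipped CSV created on the fly; the file contents and the temporary path are made up for illustration, and the real counts file may be too large for readLines to be practical.

```r
# Create a tiny gzipped triplet CSV (hypothetical values) to stand in for the download
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(data.frame(i = c(1, 2), j = c(1, 3), x = c(4, 9)), con, row.names = FALSE)
close(con)

# Read it the same way as in the post above
mat <- read.csv(gzfile(tmp))

# Count data lines independently (total lines minus the header)
n_lines <- length(readLines(gzfile(tmp))) - 1

# A mismatch here would indicate a truncated or misparsed read
stopifnot(nrow(mat) == n_lines)

# The largest column index is the number of cells the matrix will end up with
max_cell_index <- max(mat$j)
```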
Oh, syn23554292 shows as modified on 10/18/2021 because I changed some of the annotations (e.g. entity metadata). The underlying file has not changed, because there is still only one version. If the file changed there would be multiple versions with different md5 checksums. However, I know that doesn't help answer the question of the missing cells. @vilasmenon uploaded the file originally -- Vilas, can you help answer Eléonore's question above? Thanks!
Hi, it looks like the files were modified on 10/18/2021, after the previous messages were posted on 06/08/2021. I checked the nuclei >=400 criterion, and it does not resolve the problem: the cell annotation file has 162,767 cells, while A has fewer (137,136). How are we supposed to annotate the cells with a larger annotation file? Thank you, Eléonore
Hi @ems2817, The counts file has not been updated. Is it possible that this paragraph from the methods description would explain the discrepancy?
> Data processing: Generating count files from the raw sequence data requires the use of the Cellranger software package available from 10x Genomics. Counts tables from the cellranger count output were concatenated, and only nuclei >=400 detected genes were retained in the final table. Instructions on generating the counts tables in Cellranger from the raw sequence data is below.
Hi, following the steps described above to read the count matrix, I noticed that the Meta.cells CSV has 162,767 cells, whereas A has 137,136 cells. As a result, it is impossible to trace a column back to a cell. Has the count file (syn23554292) been updated (i.e. filtered) since the previous messages were posted? If so, I would need an up-to-date Meta.cells (syn23554294) to be able to match cells.
    colnames(A)=Meta.cells$cell_name
    Error in dimnamesGets(x, value) : length of Dimnames[[2]] (162767) is not equal to Dim[2] (137136)
Thank you!
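One way to narrow down a mismatch like this is to compare the column indices present in the triplet table with the number of rows in the cell metadata. The sketch below uses toy values and assumes that j is a 1-based index into the rows of the cell metadata; the variable names are hypothetical.

```r
# Toy triplet table (hypothetical values); j is assumed to index cells, 1-based
counts <- data.frame(i = c(1, 2, 1), j = c(1, 1, 3), x = c(4, 2, 6))

# Number of rows in the cell metadata (toy value)
n_cells_meta <- 5

# Highest cell index that actually appears in the counts
n_cells_seen <- max(counts$j)

# Cells listed in the metadata that have no entry in the counts table
missing_cols <- setdiff(seq_len(n_cells_meta), unique(counts$j))
```

If n_cells_seen is smaller than the metadata row count, either the trailing cells have no counts or the file was not read in full; the contents of missing_cols help tell the two cases apart.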
Hi @ali.karimnezhad , That is indeed very confusing! The specimenIDs in the "ROSMAP_Brain.snRNAseq_metadata_cells_20201107.csv" ([syn23554294](https://www.synapse.org/#!Synapse:syn23554294)) do appear in the [ROSMAP biospecimen metadata file](https://www.synapse.org/#!Synapse:syn21323366), but unfortunately the cellType column is blank. It appears we do not have any other metadata on the identified cell types of those clusters. I would recommend contacting the authors of the paper and asking how the Neuron I, II, and III categories in the paper correspond to the broad_class and subtype columns in this file -- sorry I can't be more help here. Abby
Hi @abby.vanderlinden, The authors of this [paper](https://www.nature.com/articles/s41467-020-15816-6) refer to the above dataset and mention that they identified 11 clusters: 3 neuronal subtypes, 2 interneuronal subtypes, 2 astrocyte subtypes, oligodendrocytes, oligodendrocyte progenitor cells, and microglia. I wonder how one can see these cell types in the metadata file they refer to (syn23554294). That file includes two columns related to cell type labels (broad_class and subtype), but neither completely matches the labels that appear in the paper. For example, the paper has Neurons I, Neurons II, and Neurons III cell type labels, but I am not seeing them. Perhaps the "Exc" labels in the metadata refer to the Neuron ones. If that is true, I would expect to see something like "Exc.1", "Exc.2" and "Exc.3", but this is not the case. Do you know how I can match the exact same cell types? Below I provide more details about the cell type labels present in the metadata file. Thanks.
    synapser::synLogin()
    Meta.cells <- read.csv( synapser::synGet('syn23554294')$path, header=T, stringsAsFactors = F)
    unique(Meta.cells$broad_class)
    "Exc" "Inh" "None" "Peri" "Olig" "Astr" "Micr" "Endo" "OPC"
    unique(Meta.cells$subtype)
     [1] "Exc.Exc.L3"             "Inh.Inh.SST"        "Exc.Exc.L6.IT.THEMIS"
     [4] "Exc.Exc.THEMIS.Car3"    "Inh.Inh.VIP"        "Exc.Exc.RORB_L5_IT_1"
     [7] "Exc.Exc.L6.FEZF2"       "None.NA"            "Exc.Exc.L5"
    [10] "Peri.NA"                "Olig.3"             "Astr.4"
    [13] "Micr.3"                 "Endo.1"             "OPC.NA"
    [16] "Astr.1"                 "Micr.6"             "Olig.1"
    [19] "Olig.4"                 "Olig.2"             "Exc.Exc.RORB_L5_IT_2"
    [22] "Exc.Exc.FEZEF2.L5.ET"   "Exc.Exc.FEZF2.NP"   "Astr.2"
    [25] "Astr.3"                 "Astr.5"             "Inh.Inh.PVALB_not_chandelier"
    [28] "Inh.Inh.Chandelier"     "Inh.Inh.LAMP5.LHX6" "Inh.Inh.ADARB2.LAMP5.1"
    [31] "Inh.Inh.ADARB2.NOT.VIP" "Micr.2"             "Micr.5"
    [34] "Micr.1"                 "Micr.4"             "Endo.2"
    [37] "Endo.4"                 "Endo.3"
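To see how the subtype labels nest within broad_class, the two columns can be summarised together. The sketch below uses a handful of rows with labels copied from the listing above; the toy data frame is an assumption for illustration, and the summary does not by itself establish how these labels map to the clusters named in the paper.

```r
# Toy stand-in for Meta.cells, using a few labels from the listing above
Meta.cells <- data.frame(
  broad_class = c("Exc", "Exc", "Inh", "Astr", "Astr"),
  subtype = c("Exc.Exc.L3", "Exc.Exc.L5", "Inh.Inh.SST", "Astr.1", "Astr.2"),
  stringsAsFactors = FALSE
)

# Number of distinct subtypes within each broad class
subtypes_per_class <- tapply(
  Meta.cells$subtype, Meta.cells$broad_class,
  function(s) length(unique(s))
)

# Cell counts for every broad_class / subtype combination
class_by_subtype <- table(Meta.cells$broad_class, Meta.cells$subtype)
```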
Oh, good, I'm glad you were able to figure it out to your satisfaction! Let me know if you run into any other issues.
Hi @abby.vanderlinden, Thanks for your help. It seems that I was finally able to read the count matrix in R. Here is the code I used:
    library(data.table)
    synapser::synLogin()
    counts <- fread( synapser::synGet('syn23554292')$path )
    i=as.vector(counts$i)
    j=as.vector(counts$j)
    x=as.vector(counts$x)
    library(Matrix)
    A <- sparseMatrix(i, j, x = x)
    Meta.cells <- read.csv( synapser::synGet('syn23554294')$path, header=T, stringsAsFactors = F)
    Meta.genes <- read.csv( synapser::synGet('syn23554293')$path, header=T, stringsAsFactors = F )
    rownames(A)=Meta.genes$x
    colnames(A)=Meta.cells$cell_name
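A note on the sparseMatrix call: when dims is not given, the matrix dimensions are inferred from the largest i and j values, so any trailing genes or cells without counts are silently dropped and the dimnames assignment can then fail. A minimal sketch of a more defensive variant, using toy data frames in place of the three Synapse files (all values here are made up), passes the dimensions and names explicitly:

```r
library(Matrix)

# Toy stand-ins for the counts, gene, and cell tables (hypothetical values)
counts <- data.frame(i = c(1, 3, 2, 1), j = c(1, 1, 2, 4), x = c(5, 1, 2, 7))
Meta.genes <- data.frame(x = c("GeneA", "GeneB", "GeneC"), stringsAsFactors = FALSE)
Meta.cells <- data.frame(
  cell_name = c("cell1", "cell2", "cell3", "cell4", "cell5"),
  stringsAsFactors = FALSE
)

# Fail early if the indices point outside the metadata tables
stopifnot(
  max(counts$i) <= nrow(Meta.genes),
  max(counts$j) <= nrow(Meta.cells)
)

# Fix the dimensions from the metadata instead of inferring them from the indices
A <- sparseMatrix(
  i = counts$i, j = counts$j, x = counts$x,
  dims = c(nrow(Meta.genes), nrow(Meta.cells)),
  dimnames = list(Meta.genes$x, Meta.cells$cell_name)
)
```

In this toy case the fifth cell has no counts, so the inferred version would have four columns, while the explicit version keeps all five and the names line up with the metadata.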
Hi @ali.karimnezhad, I will look into this and get back to you!

syn21589957: Mtx file missing