... from biospecimen (syn18345334) and individual (syn18345335) metadata files. Here are the specimens that I could not find: 58, 44, 42, 57, 46, 35, 54, 56, 30271, 30269, 30270, 251, 116.

I also noticed that the processed data for this study, specifically htseqcounts_APTR.txt (syn22107627), contains 234 specimens, but only 144 BAMs are provided on Synapse. Do some BAMs contain reads from more than one specimen?

Thank you again for all your help.

Created by Rached Alkallas ralkallas
Hi all. I encountered similar issues. Is there any update on this issue? Thanks.
Hi @ryaxley, have you heard back from the JAX team regarding this? Thank you!
Thank you, I also noticed that in addition to the samples I mentioned above, many mice seem to be missing genotypes:

```
> si2[is.na(genotype)][order(individualID), ]
    individualID  specimenID genotype genotypeBackground                        study
 1:            3         3rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 2:            8         8rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 3:           12        12rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 4:           25        25rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 5:           27        27rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 6:           30        30rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 7:          123       123rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 8:          181       181rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
 9:          191       191rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
10:          193       193rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
11:          221       221rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
12:          225       225rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
13:          239       239rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
14:          242       242rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
15:          252       252rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
16:          272       272rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
17:          299       299rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
18:          344       344rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
19:          347       347rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
20:          364       364rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
21:          367       367rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
22:          393       393rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
23:          394       394rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
24:          399       399rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
25:          400       400rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
26:        20246     20246rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
27:        20248     20248rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
28:        27168     27168rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
29:        27354     27354rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
30:        27355     27355rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
31:        27356     27356rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
32:        27357     27357rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
33:        28129     28129rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
34:    288810708 288810708rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
35:    289457705 289457705rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
36:    289461928 289461928rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
37:    289470196 289470196rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
38:    289478142 289478142rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
39:    289482201 289482201rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
40:    289494346 289494346rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
41:    289535121 289535121rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
42:    289576914 289576914rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
43:    289666353 289666353rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
44:    289674340 289674340rh     <NA>            C57BL6J Jax.IU.Pitt_APOE4.Trem2.R47H
```

If it makes it easier for you to narrow down these samples in your own records, here is my R code, which takes as input the directory containing the fastq and metadata files for this study:

```
library(data.table)
library(magrittr)

# List all fastq files and drop the single-cell RNA-seq runs
experiments <- system('find /scratch/user/models -name "*fastq*"', intern = T) %>% sort
experiments <- experiments[ -grep('single cell RNA seq', experiments) ]

# Strip directories and read-pair suffixes, then group sample names by study directory
experiments <- split(
  gsub('(.+/)|(_R(1|2)_001.fastq.gz|_001_R(1|2).fastq.gz|_R(1|2).fastq.gz)', '', experiments),
  sapply(strsplit(experiments, '/'), '[', 5)
) %>% lapply(., unique)

# Read the RNAseq, biospecimen and individual metadata files
Jax.IU.Pitt_APOE4.Trem2.R47H <- lapply(
  system('find /scratch/user/models/Jax.IU.Pitt_APOE4.Trem2.R47H/Metadata -name "*.csv" | egrep "RNA|biospecimen|individual"', intern = T) %>%
    setNames(nm = .),
  fread
)
names(Jax.IU.Pitt_APOE4.Trem2.R47H) <- names(Jax.IU.Pitt_APOE4.Trem2.R47H) %>%
  gsub('^.+/', '', .) %>%
  gsub('\\.csv$', '', .) %>%
  gsub('Jax.IU.Pitt_APOE4.Trem2.R47H_|_metadata|asssay_', '', .)

# > Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen[ , table(specimenID %>% gsub('[0-9]+', '', .), tissue), ]
#      tissue
#       right cerebral hemisphere serum
#                               0   428
#   rh                        406     0

# RNAseq specimen IDs lack the 'rh' suffix used in the biospecimen file
all(paste0(Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID, 'rh') %in% Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen$specimenID)
sum(paste0(Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID, 'rh') %in% Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen$specimenID)
Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID[!paste0(Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID, '') %in% Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen$specimenID]
Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID[!paste0(Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq$specimenID, 'rh') %in% Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen$specimenID]
# [1] 58 44 42 57 46 35 54 56 30271 30269 30270 251 116

# Merge individual, biospecimen and RNAseq metadata
JIP_APOE4.Trem2.R47H.sampInfo <- merge(
  Jax.IU.Pitt_APOE4.Trem2.R47H$individual,
  Jax.IU.Pitt_APOE4.Trem2.R47H$biospecimen,
  by = 'individualID', all = T
)
Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq[ , specimenID.ori := specimenID, ]
Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq[ , specimenID := paste0(specimenID, 'rh'), ]
JIP_APOE4.Trem2.R47H.sampInfo <- merge(
  JIP_APOE4.Trem2.R47H.sampInfo,
  Jax.IU.Pitt_APOE4.Trem2.R47H$RNAseq,
  by = 'specimenID', all = T
)

# intersect with rna sample names
experiments$Jax.IU.Pitt_APOE4.Trem2.R47H
extracted.id <- paste0(gsub('_.+$', '', experiments$Jax.IU.Pitt_APOE4.Trem2.R47H), 'rh')
extracted.id %>% duplicated %>% sum
JIP_APOE4.Trem2.R47H.sampInfo[ , all(extracted.id %in% specimenID), ]
experiments$Jax.IU.Pitt_APOE4.Trem2.R47H <- setNames(extracted.id, experiments$Jax.IU.Pitt_APOE4.Trem2.R47H)

# Fill in the genotype per individual from any non-empty entry
JIP_APOE4.Trem2.R47H.sampInfo[
  !is.na(specimenID),
  genotype.filled := unique(genotype[genotype != '']) %>% na.omit,
  by = .(individualID),
]
si2 <- JIP_APOE4.Trem2.R47H.sampInfo[
  specimenID %in% experiments$Jax.IU.Pitt_APOE4.Trem2.R47H,
  .(individualID, specimenID, genotype = genotype.filled, genotypeBackground,
    study = 'Jax.IU.Pitt_APOE4.Trem2.R47H')
] %>% unique

si2[is.na(genotype), ][order(individualID), ]
```
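For the separate question about the count matrix listing more specimens than there are BAMs, the same kind of set comparison could be used. The snippet below is only a minimal sketch on toy values: `compare_ids` is a hypothetical helper, the IDs and file names are made up for illustration, and it assumes that the specimen ID is the part of the BAM file name before the first underscore. In practice `count_ids` would presumably come from the column names of htseqcounts_APTR.txt, which I have not verified.

```r
# Hypothetical helper: report which IDs appear in only one of the two sets.
compare_ids <- function(count_ids, bam_ids) {
  list(
    counts_only = setdiff(count_ids, bam_ids),   # in the count matrix, no BAM
    bams_only   = setdiff(bam_ids, count_ids),   # BAM present, not in counts
    shared      = intersect(count_ids, bam_ids)  # present in both
  )
}

# Toy inputs for illustration only; these are NOT the real study IDs.
count_ids <- c("101", "102", "103", "104")
bam_files <- c("101_S1.bam", "103_S2.bam", "999_S3.bam")

# Assumption: specimen ID = file name up to the first underscore.
bam_ids <- sub("_.*$", "", basename(bam_files))

res <- compare_ids(count_ids, bam_ids)
print(res$counts_only)
print(res$bams_only)
```

If `counts_only` turns out to be non-empty on the real data, those would be the specimens to ask about.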
Thank you for the clear explanation of the issue. I will contact the JAX team and inquire about the missing metadata and BAM files. Rich

Specimens in Jax.IU.Pitt_APOE4.Trem2.R47H mouse study RNAseq metadata file (syn18345333) are missing...