Hi,

There are several hundred duplicated specimenID records in the ROSMAP RNAseq metadata table (https://www.synapse.org/#!Synapse:syn21088596). The duplicates contain similar but not identical information. For example, there are two records for specimenID "01_120405":

| specimenID | platform | RIN | rnaBatch | libraryBatch | sequencingBatch | libraryPrep | libraryPreparationMethod | isStranded | readStrandOrigin | runType | readLength |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01_120405 | | 7.7 | NA | 2 | nan | polyAselection | nan | TRUE | nan | pairedEnd | 101 |
| 01_120405 | | 7.7 | NA | 2 | 2 | polyAselection | | TRUE | | pairedEnd | 101 |

Can you please recreate a cleaner version?

Thanks,
Minghui
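Until an official cleaned file is available, one way to collapse rows like the pair above locally is sketched below in R. This is our own assumption-based sketch, not the curators' fix: it assumes that `""`, `"nan"`, `"NA"` and `NaN` all mean "missing", and it treats a row as redundant only when another row for the same specimenID agrees with it on every filled-in field and fills in strictly more fields. The function names `normalize_missing`, `dominates` and `collapse_near_duplicates` are hypothetical.

```r
# Hypothetical helpers; not part of synapser or of the official metadata fix.
# Assumption: "", "nan", "NA", NA and NaN all mean "missing".
normalize_missing <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[col %in% c("", "nan", "NA")] <- NA
    if (is.numeric(col)) col[is.nan(col)] <- NA  # read.csv may parse "nan" as NaN
    col
  })
  df
}

# TRUE if row `a` agrees with row `b` on every field that `b` fills in,
# and `a` fills in strictly more fields than `b`.
dominates <- function(a, b) {
  filled_b <- !is.na(b)
  all(!is.na(a[filled_b]) & a[filled_b] == b[filled_b]) &&
    sum(!is.na(a)) > sum(filled_b)
}

# Drop exact duplicates, then drop rows that are a less complete copy of
# another row with the same ID. Rows that genuinely disagree are all kept.
collapse_near_duplicates <- function(df, id_col = "specimenID") {
  df <- unique(normalize_missing(df))
  m <- do.call(cbind, lapply(df, as.character))  # compare every field as text
  keep <- rep(TRUE, nrow(df))
  for (idx in split(seq_len(nrow(df)), df[[id_col]])) {
    for (i in idx) {
      for (j in idx) {
        if (i != j && dominates(m[j, ], m[i, ])) {
          keep[i] <- FALSE
          break
        }
      }
    }
  }
  df <- df[keep, , drop = FALSE]
  rownames(df) <- NULL
  df
}
```

With `foo` read as in the thread, `collapse_near_duplicates(foo)` would reduce the "01_120405" pair above to the row that has `sequencingBatch` filled in. Rows that differ in a real value (for example different `libraryBatch` numbers for the same specimen) are deliberately left alone, because deciding which of those is correct is a curation question rather than a formatting one.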

Created by Minghui Wang (minghui.wang)
Hi @nicole.kauer @Mette, regarding "Duplicated records in ROSMAP_assay_RNAseq_metadata.csv": I think I have downloaded the updated metadata, but I ran into the same problem. The download command was `synapse get syn21088596`, and my Synapse client version is 1.9.4. The MD5 of the downloaded syn21088596 file (f4cbf24f60de2794c043e01d7b1c3bd2) is the same as the MD5 value listed for the dataset. Thanks for your reply!
@zhangliandong, downloading the file without specifying a version should give you the latest version of the file. The clients can also download specific earlier versions of a file, as described in the documentation ([python](https://python-docs.synapse.org/build/html/index.html#synapseclient.Synapse.get), [R](https://r-docs.synapse.org/reference/synGet.html)). As for the duplicated-data issue raised in this thread, a fix is in progress. Please join the AMPAD_DataReleaseUpdates team to get updates.
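To tell whether a local copy is the current one, one option is to compare its MD5 against the value shown on the file's Synapse page. Below is a minimal base-R sketch; `file_matches_md5` is a hypothetical helper. The commented synapser lines assume an installed synapser and a logged-in session, and the version number in them is only an example, not a known version of syn21088596.

```r
# Hypothetical helper: does the file at `path` have the expected MD5 checksum?
# Comparison is case-insensitive; a missing file yields FALSE.
file_matches_md5 <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))
  !is.na(actual) && tolower(actual) == tolower(expected_md5)
}

# Usage with synapser (not run here; requires a Synapse login):
# library(synapser)
# synLogin()
# latest <- synGet("syn21088596")               # no version given: latest version
# older  <- synGet("syn21088596", version = 1)  # example of requesting a specific version
# file_matches_md5(latest$path, "f4cbf24f60de2794c043e01d7b1c3bd2")
```

If the checksum still matches the old value after a fresh `synGet`, that suggests the file contents themselves did not change, rather than that the download picked up an old copy.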
Hi @nicole.kauer, the ROSMAP RNAseq metadata table (https://www.synapse.org/#!Synapse:syn21088596) was modified on 10/08/2020 8:24 PM. I downloaded this file on 09/09/2020, but the MD5 value is the same. How do I get the newest file? Should I download it again? Thanks for your reply!
@jgockley, thanks! We are working on cleaning up the metadata. @zhangliandong, if you join [the AMPAD_DataReleaseUpdates team](https://www.synapse.org/#!Team:3372003), you will get a notification when we release new data, including the updates to ROSMAP metadata.
@nicole.kauer & @Mette The duplicate issue is back:

```
foo <- read.csv(synapser::synGet('syn21088596')$path, header = TRUE, stringsAsFactors = FALSE)
foo[as.character(foo$specimenID) %in% '492_120515', ]

specimenID platform RIN rnaBatch libraryBatch sequencingBatch libraryPrep libraryPreparationMethod isStranded
492_120515 9.100000 NA 0 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 6 nan polyAselection nan True nan pairedEnd
492_120515 9.140738 NA 7 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 0 0 polyAselection True pairedEnd
492_120515 9.100000 NA 6 6 polyAselection True pairedEnd
492_120515 9.140738 NA 7 7 polyAselection True pairedEnd
```

@zhangliandong has a documented issue here: https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=6976
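The query above checks a single specimen. To check the whole file at once, for example after each new release, a small base-R sketch like the following lists every specimenID that appears more than once along with its row count. `duplicated_ids` is a hypothetical helper, and the `specimenID` column name is taken from the metadata shown in this thread.

```r
# Hypothetical helper: named integer vector of IDs that occur more than once,
# with how many rows each one has (sorted by ID).
duplicated_ids <- function(df, id_col = "specimenID") {
  counts <- table(as.character(df[[id_col]]))
  counts <- counts[counts > 1]
  setNames(as.integer(counts), names(counts))
}

# With `foo` loaded as above:
# dups <- duplicated_ids(foo)
# length(dups)  # number of specimenIDs with duplicate rows
# sum(dups)     # total rows involved
```

An empty result would indicate that no specimenID is repeated. It does not distinguish rows that differ only in `nan`/blank fields from rows with genuinely different batch values.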
@minghui.wang The metadata file has been updated to have no duplicates. Thank you for bringing it to our attention!
@minghui.wang Thanks for the heads up. We will look into it
