Hello,
We want to verify the integrity of the files. Could you kindly provide a list of MD5 checksums for all files in each population?
Thank you very much for your assistance.
Created by XIAO FENG fengx25

Hi all,
I created an Entity View for you here: https://www.synapse.org/#!Synapse:syn53038826/tables/. It has a column with the MD5 checksum of every file in this project, and you can query the table for files in different folders. See this for more information: https://help.synapse.org/docs/Querying-Tables,-Views,-and-Datasets.2667642897.html.

Hi,
Thanks a lot for your help.

Hi,
I could not find a solution to retrieve MD5 checksums in bulk directly. However, using the Synapse R client (synapser, see here: https://r-docs.synapse.org/articles/synapser.html), you can create the list using this code:
```
#### GET MD5s ----
library(synapser)
library(dplyr)

# Password login gives a warning, as Synapse is switching to Personal Access Tokens for login
synLogin(email = "xxx@yyy.zzz", password = "YourPW")

# AFR population as example
folderContent <- synGetChildren("syn51365304")
folderContent <- as.list(folderContent)

# Fetch only the metadata of each file (no download) and collect its MD5
folderContent <- lapply(folderContent, function(x) {
  file <- synGet(x$id, downloadFile = FALSE, downloadLocation = getwd())
  data.frame(ID = x$id, filename = x$name, MD5 = file$get("_file_handle")$contentMd5)
}) %>% bind_rows()
```
This will give you an output like this:
```
> head(folderContent)
  ID           filename                                         MD5
1 syn52320545  A1BG_P04217_OID30771_v1_Inflammation_II.tar      74223c794c427802dff7b77ec7700569
2 syn52314855  AAMDC_Q9H7C9_OID30236_v1_Cardiometabolic_II.tar  1a8feffa76349a292b920b6a3e62a3cb
3 syn51499879  AARSD1_Q9BTE6_OID21311_v1_Oncology.tar           f59857f0dc21a3164126c5c1765bd9e4
4 syn52310099  ABCA2_Q9BZC7_OID30146_v1_Cardiometabolic_II.tar  1d58f6366fa4cb5f47f0fa680769b1f7
5 syn51499342  ABHD14B_Q96IU4_OID20921_v1_Neurology.tar         caa3ed927098ee58b3ca61aba1ba57c6
6 syn51497755  ABL1_P00519_OID21280_v1_Oncology.tar             f295705d18fca29bfef042d0c9fd0152
```
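With such a table of expected checksums, the local downloads can then be checked against it. The following is only a minimal sketch, assuming the files were downloaded into a single directory under their original names; `verify_md5` and its `downloadDir` argument are hypothetical names made up for illustration (not part of synapser), while `tools::md5sum` ships with base R:

```r
# Compare local files against a table of expected MD5 checksums.
# `expected` needs the columns `filename` and `MD5`, like the data frame printed above.
# `verify_md5` and `downloadDir` are illustrative names, not part of synapser.
verify_md5 <- function(expected, downloadDir = getwd()) {
  paths <- file.path(downloadDir, expected$filename)
  # tools::md5sum() returns NA for files that cannot be read (e.g. missing files)
  localMd5 <- unname(tools::md5sum(paths))
  data.frame(
    filename = expected$filename,
    expected = expected$MD5,
    local    = localMd5,
    # TRUE only if the file exists and its checksum matches (case-insensitive)
    ok       = !is.na(localMd5) & tolower(localMd5) == tolower(expected$MD5),
    stringsAsFactors = FALSE
  )
}
```

Calling `verify_md5(folderContent, "path/to/downloads")` and keeping the rows where `ok` is `FALSE` would list the files that are missing or whose checksum differs.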
Hope this serves as an OK workaround.

[Not an answer]
Hi UKB-PPP team,
I would also like to verify the integrity of files. As a suggestion, I believe it would be incredibly useful to include the MD5 checksums in the output of the `list` command when it is paired with the `--long` and/or `--modified` flags, if possible.
Thank you in advance and thank you for the wonderful resource!