Downloading Column of Audio Files via Python/Command Line

Hello, I'm an undergraduate research student trying to bulk download a column of audio files from a table. This seems simple enough, but I'm having a hard time understanding the "types" of what `syn.get()` and the other table functions return. I originally tried to download the files from the command line, but switched to Python so I could document the process more easily.

I found that after running:

```
col = syn.tableQuery('select "audio_audio.m4a" from syn5511444', resultsAs='rowset')
```

I can iterate over each row. Printing a row shows an object with `rowId`, `values`, and `versionNumber` properties, where `values` is a list containing a single 7-digit number, e.g. 5404521. When I manually download the full table from the web and open it in Excel, I can see that `values[0]` corresponds to the cells where the audio file link should be. So I thought this 7-digit number might be the entity ID, and tried:

```
for row in col:
    syn.get(row.values[0], followLink=True, downloadLocation='.')
```

This didn't work, because there's no entity associated with that number. I also tried adding a 'syn' prefix to `row.values[0]`, but that failed for the same reason.

Despite reading the documentation, I really don't understand how to access and download the audio files. I can download them manually when viewing the table on the web, but there are more than 65,000 samples and I feel so close to an automated solution. I suspect I'm misunderstanding the type of the audio file cells. I think it's a Link entity (I'm unsure how to check), but I also saw a "file MD5" referenced, which may also be relevant. Any help would be greatly appreciated, thank you!
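On the "unsure how to check" point, here is a minimal sketch, assuming that `syn.getTableColumns()` yields dict-like `Column` objects carrying the `columnType` field from the Synapse REST API. A file column should report `FILEHANDLEID`, and the numbers in its cells (like 5404521) would then be file handle IDs rather than entity IDs, which is consistent with `syn.get()` being unable to resolve them. The helper name `file_handle_columns` is hypothetical.

```python
def file_handle_columns(columns):
    """Return the names of columns whose cells hold file handle IDs.

    `columns` is assumed to be an iterable of dict-like Column objects,
    e.g. what syn.getTableColumns(table_id) yields (assumption).
    """
    # FILEHANDLEID is the Synapse column type for file attachments
    return [c["name"] for c in columns if c.get("columnType") == "FILEHANDLEID"]
```

Usage, under the same assumption: `file_handle_columns(syn.getTableColumns("syn5511444"))`.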

Created by Christina Stanfield cstanf
There's no such parameter, so if you need everything to download to a single folder you'll need to set your cache location. However, `downloadTableColumns` returns a mapping of file handle IDs to local file paths. With that mapping and some deft use of the [os or shutil modules](https://stackoverflow.com/questions/8858008/how-to-move-a-file-in-python), you can move the files to a new location fairly quickly. Of course, if any of the files share the same name you'll have collisions unless you retain the folder hierarchy. I also don't think the synapseclient will be able to find the files in their new location (unless you set that location as your cache and retain the hierarchy), so if you run the command again it will start downloading everything all over again.
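A minimal sketch of that approach, assuming `paths` is the file-handle-ID-to-path dict that `downloadTableColumns` returns. The helper name `flatten_downloads` is hypothetical. To avoid the name-collision problem, each file is prefixed with its file handle ID.

```python
import os
import shutil


def flatten_downloads(paths, dest_dir):
    """Copy every downloaded file into one folder.

    `paths` is assumed to be the dict returned by syn.downloadTableColumns
    (file handle ID -> local path in the Synapse cache). Each copy is
    prefixed with its file handle ID so files that share a name do not
    overwrite one another. Returns file handle ID -> new path.
    """
    os.makedirs(dest_dir, exist_ok=True)
    new_paths = {}
    for handle_id, src in paths.items():
        name = "{}_{}".format(handle_id, os.path.basename(src))
        dst = os.path.join(dest_dir, name)
        # Copy rather than move, so the cache copy stays where synapseclient expects it
        shutil.copy2(src, dst)
        new_paths[handle_id] = dst
    return new_paths
```

Copying instead of moving leaves the cache intact, so rerunning `downloadTableColumns` should not start over, at the cost of double the disk space. Swapping in `shutil.move` saves the space but runs into the re-download problem described above.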
@phil, thank you for your help! This worked wonderfully, and I'm about to repeat the process with a different column of data. Is there a parameter I can specify (like `downloadLocation` for `get()`) to put all the files into a single folder other than the cache? Manually crawling the subfolders that the download created in the cache to extract all the files was time-consuming. Or is changing my cache location in the config file the next-best solution? I appreciate any advice, thank you again so much!
Fortunately, there is an existing function that can do this for you very easily. As far as I can tell it isn't spelled out in the User Guide yet; you would have to sleuth around the API docs to find it. Here's an example:

```
import synapseclient as sc

table_id = "syn5511444"
file_col = "audio_audio.m4a"

syn = sc.login()
# Query the whole table so every row's file handle ID is available
query = syn.tableQuery("select * from {}".format(table_id))
# Bulk download every file in the column; returns {file handle ID: local path}
paths = syn.downloadTableColumns(query, file_col)
# Attach each row's local file path to the table as a DataFrame
df = query.asDataFrame()
df["path"] = df[file_col].astype(str).map(paths)
```

If you are downloading a lot of data this may take a while. But if the process is interrupted for some reason, you can run the same commands again and it will resume from where it left off.
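One possible pitfall with the last line: judging from the `.astype(str)`, the keys of `paths` are string file handle IDs. If the column has empty cells, pandas stores it as float64, so `astype(str)` would produce strings like "5404521.0" that match no key. A minimal sketch of a more forgiving conversion follows; the helper name `handle_id_key` is hypothetical.

```python
import math


def handle_id_key(value):
    """Convert a file handle cell value into a string key for `paths`.

    Handles ints, strings, and the float form pandas uses when an integer
    column contains missing values. Returns None for empty cells.
    """
    if value is None:
        return None
    if isinstance(value, float):  # also covers numpy.float64
        if math.isnan(value):
            return None
        return str(int(value))  # 5404521.0 -> "5404521"
    return str(value).strip()
```

With that helper, the last line would become `df["path"] = df[file_col].map(handle_id_key).map(paths)`, leaving empty cells as missing paths.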
