Hello,
I am having trouble matching the downloaded audio files to the audio codes in the "audio_audio.m4a" column of the voice activity table.
Each downloaded audio file is automatically named after its link, for example "audio_audio.m4a-98b7d4d1-dfb2-400c-8dc9-6cd1b1b742735399897901662910740.tmp". However, the voice activity table in CSV format does not keep the link in the "audio_audio.m4a" column; instead, it uses a code to represent each audio file, for example "5404521". As a result, I can't match an audio file to its row in the table.
Is there any way to keep the link names in the CSV table, or to name the audio files after the codes? Or is there a good way to match the links to the codes?
Created by Shuotong Feng FST
Hi @tomerz
Glad you got it working. A safer approach that also works in Python 2 (where dictionaries are not ordered) is to map the file identifiers to the local paths. The code below does this:
```
import synapseclient
syn = synapseclient.login()
voice_table = syn.tableQuery('SELECT * FROM syn5713118 LIMIT 20')
df = voice_table.asDataFrame()
# Now download the countdown audio files
fileMap = syn.downloadTableColumns(voice_table, 'audio_countdown.m4a')
fileMap
# As you can see, this is a dictionary that maps the values in the
# 'audio_countdown.m4a' column to the file paths of the downloaded files.
# You can replace the values with the file paths with:
df = (df
      .astype({'audio_countdown.m4a': str})
      .replace({'audio_countdown.m4a': fileMap}))
```
I did look at the code and ran into a little trouble adjusting it to fit the Voice Activity table at first, but I have managed to fix it now :)
Thanks for the help!
If anyone else needs it, this is the code I used:
```
import pandas as pd
import synapseclient
syn = synapseclient.login()
# QUERY THE mPower PROJECT (syn4993293) FOR ALL OF THE TABLES
tables = syn.getChildren('syn4993293', ['table'])
tables = [t for t in tables if t['name'].startswith('Voice')]
# QUERY ALL ROWS FROM EACH OF THE VOICE TABLES (RESULTS ARE CACHED LOCALLY)
allData = {table['name']: syn.tableQuery('SELECT * FROM %s' % table['id']) for table in tables}
#print(allData)
# EXTRACT THE VOICE ACTIVITY TABLE AS A DATA FRAME AND LOOK AT THE FIRST TWO OBSERVATIONS
df = allData['Voice Activity'].asDataFrame()
print(df.head(2))
# FOR TABLES WITH COLUMNS THAT CONTAIN FILES, WE CAN BULK DOWNLOAD THE FILES AND STORE A MAPPING.
# THE VALUE IN THE TABLE ABOVE IS CALLED A fileHandleId, WHICH REFERENCES A FILE THAT CAN BE
# ACCESSED PROGRAMMATICALLY. GET THE AUDIO FILES FROM THE VOICE ACTIVITY.
# THIS CACHES THE RETRIEVED FILES AS WELL
voiceMap = syn.downloadTableColumns(allData['Voice Activity'], "audio_audio.m4a")
# Start a new DataFrame for the fileHandleId -> local file path mapping
df = pd.DataFrame()
# Extract the keys and values from voiceMap
keys = list(voiceMap.keys())
values = list(voiceMap.values())
# Add the keys and values to the DataFrame as separate columns
df['file_handle'] = keys
df['file_name'] = values
# Save the dataframe to a CSV file
df.to_csv('voiceMap.csv', index=False)
```
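For anyone working from the exported CSV rather than a DataFrame, here is a rough sketch of joining the `voiceMap.csv` written above to a table CSV using only the standard library. The helper names (`normalize_id`, `load_file_map`, `add_paths`) and the added `audio_audio.m4a_path` column are made up for illustration, and the sketch assumes the table CSV has an "audio_audio.m4a" column holding the codes (possibly written as floats such as "5404521.0").

```python
import csv


def normalize_id(value):
    """Return a code as a plain string, e.g. '5404521.0' -> '5404521'."""
    text = str(value).strip()
    return text[:-2] if text.endswith(".0") else text


def load_file_map(map_csv):
    """Read the two-column map (file_handle, file_name) into a dict."""
    with open(map_csv, newline="") as handle:
        return {normalize_id(row["file_handle"]): row["file_name"]
                for row in csv.DictReader(handle)}


def add_paths(table_csv, map_csv, out_csv, column="audio_audio.m4a"):
    """Copy table_csv to out_csv with an extra '<column>_path' column."""
    file_map = load_file_map(map_csv)
    path_column = column + "_path"  # hypothetical name for the new column
    with open(table_csv, newline="") as src, \
            open(out_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + [path_column]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # Rows whose code is not in the map get an empty path.
            row[path_column] = file_map.get(normalize_id(row[column]), "")
            writer.writerow(row)
```

It could be called as `add_paths('voice_activity.csv', 'voiceMap.csv', 'voice_activity_with_paths.csv')`, where both the input table name and the output name are placeholders.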
Hi @tomerz
The return value of the `syn.downloadTableColumns` function is a Python dictionary (if you are using Python) where the keys are the fileHandleIds stored in the column you downloaded. Have you looked at the example code we provided for downloading data?
Python: https://github.com/Sage-Bionetworks/mPower-sdata/blob/master/examples/mPower-bootstrap.py
R: https://github.com/Sage-Bionetworks/mPower-sdata/blob/master/examples/mPower-bootstrap.R
Hey, I just downloaded the voice data and I still have this problem. I don't really understand what you mean by a map; I didn't get one.
I used code from the discussions to download all the data automatically.
Is there a way to tell which audio file belongs to which row in the tables? Perhaps by the health code or something... or maybe by downloading the voice activity table with the original link titles in the 'audio_audio.m4a' column? Or could the map be uploaded, if possible?
Thanks.
The synapseclient maintains a cache of already downloaded files. As long as you have not moved the files locally, calling the download function again will not re-download them. It will, however, check with Synapse that the files you have locally are the latest version and download anything that has changed since your last download. The return value of the function will be the same as on the first call, so you will get the mapping back.
If you have moved the files, it gets harder, but it can still be done by looking in the cache folder.
Hi, when I downloaded the walking activity file, I forgot to keep the map that is returned (the relationship between the fileHandleId and the downloaded filename), so I wondered if there is a way to get the map directly without having to re-download the data. Of course, I would appreciate it if others could provide me with this map directly.
Glad it was resolved. Also, when you download the files you get back a map that links the fileHandleId (the ID of the file) to its file path. You can merge this with the table data.
The problem has been solved. If downloaded to the default path, each audio file will be saved in a folder named after the code.
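Building on the observation that each file sits in a folder named after its code, here is a minimal sketch of rebuilding the map from the download folder without re-downloading. The function name `build_map_from_cache` is made up for illustration, and the sketch assumes that the folder directly containing each file has an all-digit name equal to the code and that hidden files in those folders are bookkeeping files that can be skipped; check this against your own folder layout before relying on it.

```python
import os


def build_map_from_cache(cache_root):
    """Map code (folder name) -> file path by scanning all-digit folders."""
    file_map = {}
    for folder, _subdirs, files in os.walk(cache_root):
        code = os.path.basename(folder)
        # Assumption: only folders with all-digit names hold downloaded files.
        if not code.isdigit():
            continue
        for name in sorted(files):
            # Assumption: hidden files are bookkeeping, not downloads.
            if name.startswith("."):
                continue
            # If a folder holds several files, the last one (sorted) wins.
            file_map[code] = os.path.join(folder, name)
    return file_map
```

A possible call is `build_map_from_cache(os.path.expanduser('~/.synapseCache'))`, assuming the files were downloaded to that default cache location; the resulting dictionary has the same shape as the one returned by `syn.downloadTableColumns` (code as a string, mapped to a local path).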