Hi,
I am a little bit confused about the definition of "Feature matrix" in the submission requirements in the mPower data challenge.
Do we have to extract separate features for walking outbound, walking return and rest,
or we just need to extract a set of features for each recordId,
or we need to extract separate features for each file in that recordId (up to 8 files per recordId)?
Thanks!
Created by Tongxin Wang txwang93

@txwang93, I have uploaded a new version of the template. Thanks for catching that error.
Solly

Thanks for pointing this out @txwang93. I'll take a look at this and upload an update in the morning if necessary.

@MonteShaffer, yes, there are healthCodes which overlap between the training and supplemental training sets. That is, the supplemental training set consists of additional walking tests (from the last 18 months of the study) for individuals already released in the training data set.

Thank you for the question. I am just downloading the testing/supplemental data, so I haven't fully reviewed the sizes of the sets. I believe lack of clarity and "confusion" is par for the course so far with this challenge.

Hi Monte,
As far as I can see, the training set has 34,631 recordIds, the testing set has 36,664 and the supplemental training set has 7,873. But they do not add up to the number of rows in the template. So I am confused.

Are the healthCode values and recordId values consistent across the three tables? I tried to run the following query, but your system doesn't seem to allow it.
```
SELECT count(*) FROM syn10146553,syn10733842 WHERE syn10146553.healthCode = syn10733842.healthCode
```
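If the query service rejects a two-table query like this one, a possible workaround is to fetch the healthCode column from each table separately and intersect the values locally. The sketch below covers only the local step on pandas DataFrames. The function name `shared_health_codes` and the toy values are made up for illustration, and loading the two tables (for example through the Synapse Python client) is assumed to have happened already.

```python
import pandas as pd


def shared_health_codes(left: pd.DataFrame, right: pd.DataFrame,
                        key: str = "healthCode") -> set:
    """Return the key values that occur in both tables (hypothetical helper)."""
    return set(left[key].dropna()) & set(right[key].dropna())


# Toy stand-ins for two of the challenge tables; the values are invented.
training = pd.DataFrame({"healthCode": ["a", "b", "b", "c"]})
testing = pd.DataFrame({"healthCode": ["b", "c", "d"]})

overlap = shared_health_codes(training, testing)
print(len(overlap), sorted(overlap))
```

Using sets rather than a merge avoids row multiplication when a healthCode appears in many records.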
It appears there is overlap in the codes.

Hi Solveig!
I just noticed that the submission template of sub challenge 1 has 83,648 recordIds, while the three tables we have (training, testing and supplemental training) have only 79,168 recordIds in total. Could you explain the difference? Thanks!

One option might be to impute the missing information.
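
As a rough illustration of the imputation idea: when a feature cannot be computed for some recordIds, the gaps could be filled with a per-column statistic such as the median, so that the matrix ends up with no missing values. This is only a sketch and not a recommendation from the challenge organizers. The helper name `impute_median` and the feature columns `stepCount` and `restVar` are invented for the example.

```python
import numpy as np
import pandas as pd


def impute_median(features: pd.DataFrame, id_col: str = "recordId") -> pd.DataFrame:
    """Fill missing feature values with each column's median (hypothetical helper).

    A column that is missing for every record has no median and stays empty.
    """
    out = features.copy()
    value_cols = [c for c in out.columns if c != id_col]
    out[value_cols] = out[value_cols].fillna(out[value_cols].median())
    return out


# Invented example: two features, each missing for one record.
features = pd.DataFrame({
    "recordId": ["r1", "r2", "r3"],
    "stepCount": [10.0, np.nan, 14.0],
    "restVar": [0.2, 0.4, np.nan],
})

filled = impute_median(features)
print(filled)
```

Whether median filling is appropriate depends on the feature; a feature-specific default may make more sense for records that lack a whole activity.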

Yes, you must submit features for the training, supplemental training and test data set recordIds.

## Missing Values
This "missing data" requirement seems odd ... "every recordId with no missing values"
If a training record has only limited information (e.g., not all responses; I believe a full data set consists of 8 JSON files), how can I extract the same features as from a full-information record?
That is, if you give me missing values, how can I return something that has features?
## Training, Testing, Supplemental
Also, do we create the features for the Training and Testing data? And the Supplemental data?
The training data took 3 days to download using your interface, and requires about 80GB of storage space (for .synapseCache) plus an additional 120GB for my internal caching mechanisms (I re-store your cache in a form that is easier for me to manipulate, building cached binary data objects). Just trying to understand what the requirements are. Looks like upwards of 1TB of storage for the three sets.
Training: 34631 records ... SELECT COUNT(*) FROM syn10146553
Testing: 36664 records ... SELECT COUNT(*) FROM syn10733842
Supplemental: 7873 records ... SELECT COUNT(*) FROM syn10733835
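
For reference, the three counts can be tallied and compared against the 83,648 template rows reported elsewhere in this thread. The small helper `unmatched_ids` is hypothetical and only shows how one might list template recordIds that appear in none of the tables; the IDs passed to it here are invented.

```python
# Row counts as reported in this thread.
counts = {"training": 34631, "testing": 36664, "supplemental": 7873}
template_rows = 83648

total = sum(counts.values())
difference = template_rows - total
print(total, difference)


def unmatched_ids(template_ids, *table_ids):
    """Return template IDs found in none of the given ID collections."""
    known = set().union(*table_ids)
    return sorted(set(template_ids) - known)


# Invented IDs, just to show the call shape.
extra = unmatched_ids(["r1", "r2", "r9"], ["r1"], ["r2"], ["r3"])
print(extra)
```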

That is correct!

Hi. Thanks!
Just to be clear, we only need to submit a set of features for each recordId?
For example, I could choose not to extract any features from walking outbound activity, as long as I have the same, non-empty set of features for each recordId?

Features may be extracted from any or all of the files available for a given recordId. The only requirements are that you provide at least one feature and that you extract the same feature(s) for every recordId with no missing values.
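
The stated requirements (at least one feature, the same features for every recordId, no missing values) can be checked before submitting. The sketch below assumes the feature matrix is a pandas DataFrame with a `recordId` column plus one column per feature; that layout, the function name `check_feature_matrix` and the toy data are assumptions, not the official submission format.

```python
import numpy as np
import pandas as pd


def check_feature_matrix(features: pd.DataFrame, expected_ids,
                         id_col: str = "recordId") -> list:
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    feature_cols = [c for c in features.columns if c != id_col]
    if not feature_cols:
        problems.append("no feature columns")
    # Every expected recordId should have a row.
    absent = set(expected_ids) - set(features[id_col])
    if absent:
        problems.append(f"{len(absent)} expected recordIds absent")
    # No feature value may be missing.
    if features[feature_cols].isna().any().any():
        problems.append("missing feature values")
    return problems


# Invented examples: one matrix that passes and one that fails two checks.
good = pd.DataFrame({"recordId": ["r1", "r2"], "f1": [1.0, 2.0]})
bad = pd.DataFrame({"recordId": ["r1"], "f1": [np.nan]})

print(check_feature_matrix(good, ["r1", "r2"]))
print(check_feature_matrix(bad, ["r1", "r2"]))
```

Here `expected_ids` would be the combined recordIds of the training, supplemental training and test sets.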