####Study-related questions
Q: It seems that there is a low correlation between the labels and the observations. One noisy observation has a low tremor score while one does not. What does that mean?
A: Self-report data tend to be noisier than clinician-rated data. Context may also matter, given that these are free-living data. For example, there may be a difference in how apparent tremor is while walking versus sitting.
Q: For CIS-PD, was the typical dose of dopaminergic medication given before the ON exam at 2 weeks, or was a supratherapeutic dose given, as is sometimes done?
A: To the best of our knowledge, participants received their typical dopaminergic medications and not a supratherapeutic dose.
Q: Is the L-dopa equivalent dose available for each subject?
A: We do not have additional information about their medication regimen.
Q: Some REAL-PD data has dyskinesia values but not on/off and vice-versa. How is that possible?
A: We have deliberately masked the values for some variable/subject combinations which will not be included in the evaluation.
Q: Can the CIS-PD timeseries information be taken as correct, i.e. without technical faults by the device? There seem to be strange "permanent offsets" in the data that don't seem right. What I meant was that there seem to be larger periods of accelerometer data that are not centered around 0. This seems odd for acceleration data unless you spend your day in an elevator. Do you have technical information/data on generic output of these devices?
A: These data have not had the acceleration due to gravity subtracted, so even a device completely at rest will show non-zero acceleration.
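To illustrate the answer above: one common way to separate a roughly constant gravity component from movement is to low-pass filter each axis and subtract the result. The sketch below is only an illustration and not part of the challenge pipeline. The function names, the smoothing factor `alpha`, and the assumption that samples are expressed in units of g are all made up for the example.

```python
import math


def remove_gravity(samples, alpha=0.9):
    """Subtract a slowly varying (gravity-like) estimate from tri-axial samples.

    samples: list of (x, y, z) tuples, assumed here to be in units of g.
    alpha:   smoothing factor of the exponential low-pass filter (assumed value).
    """
    if not samples:
        return []
    gravity = list(samples[0])  # start the estimate at the first sample
    motion = []
    for sample in samples:
        # Low-pass filter: the estimate follows slow changes such as orientation.
        gravity = [alpha * g + (1 - alpha) * s for g, s in zip(gravity, sample)]
        # What is left after subtracting the estimate is the faster movement.
        motion.append(tuple(s - g for s, g in zip(sample, gravity)))
    return motion


def magnitude(sample):
    """Euclidean norm of one (x, y, z) sample."""
    return math.sqrt(sum(v * v for v in sample))


# A device lying still: the raw signal is a constant offset on one axis.
at_rest = [(0.0, 0.0, 1.0)] * 50
residual = remove_gravity(at_rest)
```

For the resting example, the raw magnitude stays at the constant offset while the residual after subtraction is essentially zero, which is the "permanent offset" described in the question.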
####Characteristics of the data
Q: Missing data: subject 1046 in CIS-PD and subjects hbv012, hbv017, hbv018, hbv023, and hbv054 in REAL-PD are completely missing the medication ON/OFF label. Will these subjects be evaluated in the test set for medication ON/OFF?
A: No, we will not be evaluating variable/subject combinations which are completely missing in the training data.
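As a rough sketch of how you might check which variable/subject combinations have at least one label in the training data, consider the following. The column names `on_off` and `tremor`, the use of an empty string for a missing label, and the subject IDs are assumptions made for the example, not a description of the actual label files.

```python
def evaluable_pairs(rows, variables):
    """Return the (subject_id, variable) pairs with at least one non-missing label.

    rows:      iterable of dicts, one per labelled measurement.
    variables: label columns to check (hypothetical names in this example).
    """
    pairs = set()
    for row in rows:
        for variable in variables:
            # Treat None and the empty string as "label missing".
            if row.get(variable) not in (None, ""):
                pairs.add((row["subject_id"], variable))
    return pairs


# Made-up training labels: subject s1 never reports tremor,
# subject s2 never reports on_off.
training_rows = [
    {"subject_id": "s1", "on_off": "1", "tremor": ""},
    {"subject_id": "s1", "on_off": "", "tremor": ""},
    {"subject_id": "s2", "on_off": "", "tremor": "2"},
]
pairs = evaluable_pairs(training_rows, ["on_off", "tremor"])
```

Any combination absent from `pairs` is completely missing in this toy training set, so under the rule above it would not be evaluated.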
Q: Since the medication and dyskinesia scales differ between CIS-PD and REAL-PD, should we try to put them on the same scale, or should we build separate models?
A: We leave it up to you. We have our own hypotheses about the nature of the scales, but you might come up with a better idea. We will be evaluating the data on the scales provided, however, so whatever choices you make, be sure you scale your predictions back to the original scales.
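If you do train on a shared scale, one simple option is a linear mapping to a common range and back. The sketch below assumes purely hypothetical ranges for two placeholder data sets (`dataset_a` and `dataset_b`). Check the actual scale of each variable in the data you downloaded before using anything like this.

```python
def rescale(value, src_range, dst_range):
    """Linearly map value from src_range to dst_range."""
    src_lo, src_hi = src_range
    dst_lo, dst_hi = dst_range
    fraction = (value - src_lo) / (src_hi - src_lo)
    return dst_lo + fraction * (dst_hi - dst_lo)


COMMON = (0.0, 1.0)

# Hypothetical label ranges, for illustration only.
ORIGINAL = {
    "dataset_a": (0.0, 4.0),
    "dataset_b": (0.0, 2.0),
}


def to_common(label, dataset):
    """Map a label from its data set's scale onto the shared 0-1 scale."""
    return rescale(label, ORIGINAL[dataset], COMMON)


def to_original(prediction, dataset):
    """Map a 0-1 prediction back onto the data set's own scale."""
    # Clip first so out-of-range predictions stay on the original scale.
    clipped = min(max(prediction, COMMON[0]), COMMON[1])
    return rescale(clipped, COMMON, ORIGINAL[dataset])
```

A linear mapping is only one possible choice. It assumes that equal fractions of the two scales mean the same thing, which may not hold for self-reported severity.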
Q: What exactly does "ancillary" mean?
A: These are data on subjects who do not appear in the test set but may prove useful for training anyway. We have provided them separately to distinguish them from subjects who will be included in the evaluation.
Q: So the observations (measurement_id) are not in any sequential order?
A: No, we've shuffled them so that no information can be gained from the measurement_id values.
Q: Are the self-assessments (labels) reported in the middle of the observation segment?
A: Yes, although how this was done differs across data sets. For CIS-PD, subjects reported their symptoms for a fixed time, say 8:00am. In this case we provided sensor data from 10 minutes before through 10 minutes after this time. For REAL-PD, subjects reported their symptoms for a 30-minute time window, for example 8:00-8:30am. In this case we have provided the sensor data for the middle 20 minutes of the window (i.e. 8:05-8:25).
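The two windowing rules described above can be written out as a small sketch. The function names and the example date are made up for illustration. The 10-minute and 30-minute figures come from the answer, and the 20-minute length follows from its 8:05-8:25 example.

```python
from datetime import datetime, timedelta


def cis_pd_window(report_time, half_width_min=10):
    """Sensor window around a single reported time point (CIS-PD style)."""
    delta = timedelta(minutes=half_width_min)
    return report_time - delta, report_time + delta


def real_pd_window(window_start, window_end, length_min=20):
    """Sensor window centred in a reported time range (REAL-PD style)."""
    midpoint = window_start + (window_end - window_start) / 2
    half = timedelta(minutes=length_min) / 2
    return midpoint - half, midpoint + half


# Example date chosen arbitrarily.
cis_start, cis_end = cis_pd_window(datetime(2020, 1, 1, 8, 0))
real_start, real_end = real_pd_window(
    datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 8, 30)
)
```

In both cases the label refers to the centre of a 20-minute sensor segment, which is why the answer to the question is "yes" for both data sets.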
####Missing data or requests for new data
Q: I already downloaded the data. You said the data is going to be available following the webinar. Is it new data that you are going to make available in brain commons following today?
A: The test data will be released today. These are the data that you will predict on. You'll need both the training data, which is already available, and the test data for the purposes of the challenge.
Q: Can the "Time since PD" and "Hoehn & Yahr Stage" information about the participants in REAL-PD be shared with us?
A: Hoehn & Yahr Stage is one of the questions in UPDRS Part III, which you have already been provided in both the On and Off state. We'll look into whether we can provide "time since PD".
Q: Will we have the UPDRS III scores for the testing data?
A: There are no novel subjects in the test data, so you already have the complete UPDRS data.
Q: Can we use additional data (from other sources) to train our models?
A: Yes, but you will need to share those data with us in order for us to reproduce your model, so you may only use data which we (the organizers) will be able to access.
Q: Can we augment our model using a dataset which is not fully public, but is available for research groups approved by a biobank?
A: Yes, as long as it is possible for the challenge organizers to access the data. The same conditions apply as to the challenge data. Any code must be runnable from the version of the data we can access, and therefore must contain all data processing steps.
Q: Is the subject_id column included in the test data?
A: Yes, you'll have a mapping between subject_id and measurement_id for the test data.
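As a minimal sketch of how such a mapping could be used, the snippet below groups measurement IDs by subject so that per-subject models can be applied. It assumes the mapping arrives as a CSV with `measurement_id` and `subject_id` columns. The file layout and the IDs shown are assumptions for the example, not the actual test files.

```python
import csv
import io
from collections import defaultdict


def group_by_subject(csv_file):
    """Read measurement_id/subject_id rows and group measurements by subject."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["subject_id"]].append(row["measurement_id"])
    return dict(groups)


# Made-up mapping standing in for the real file.
example = io.StringIO(
    "measurement_id,subject_id\n"
    "m-001,subj_a\n"
    "m-002,subj_b\n"
    "m-003,subj_a\n"
)
by_subject = group_by_subject(example)
```

With a real file you would pass an open file handle, for example `open(path, newline="")`, instead of the `io.StringIO` stand-in.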
####Model generation
Q: Can we use a pre-trained model?
A: You can use pre-trained models, as long as you can provide the code and data used to train them.
Q: Can dyskinesia/tremor information be used to predict medication ON/OFF or vice versa?
A: You will not receive any label data for the test portion, so it won't be possible to directly predict the labels of one variable from the labels of the other variables. However, you may use the labels provided in the training data in any way that you like.
Q: Is it possible to change the segmentation format, for instance by reducing the length of the windows or by changing the overlap?
A: You can do whatever you want with the data provided, including sub-segmenting the windows, but you will not be provided with any additional overlapping data.
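For example, sub-segmenting a provided window into shorter, possibly overlapping pieces could look like the sketch below. The function name and the `length` and `step` parameters (both counted in samples) are illustrative choices, not something prescribed by the challenge.

```python
def sub_segments(samples, length, step):
    """Split one recording into fixed-length segments.

    length: number of samples per segment.
    step:   offset between segment starts; step < length gives overlap.
    A trailing piece shorter than length is dropped.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(samples) - length
    return [samples[start:start + length] for start in range(0, last_start + 1, step)]


# Ten samples, segments of four samples, 50% overlap.
segments = sub_segments(list(range(10)), length=4, step=2)
```

Each sub-segment inherits the label of the window it came from, since no additional labels or overlapping sensor data are provided.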
####Submissions
Q: Is there any size limit for model submission? If yes, what is the limit size?
A: I'm not totally sure what was meant here. The prediction submission files will be of a relatively fixed size. If you're referring to memory requirements for your model building, there is no specific limit, but it will need to be runnable on an available AWS server type. If this doesn't answer your question, please follow up on the discussion forum.
Q: Should we submit our actual models, or do you just need CSV predictions?
A: You will submit your predictions as a CSV file as described in the webinar and the soon-to-be-released wiki pages. You'll also need to submit the code to build your models.
Q: Is MATLAB code acceptable?
A: Yes.
Q: To clarify, any coding language is okay to use?
A: Yes, though we do have a preference for open-source languages if possible.
Q: Do we have to submit the write-up for every submission or only in the last one?
A: You only need to document your final solutions for each subchallenge.
Q: Can individuals share basic information about their model's performance in a round, so long as there is no attempt to reverse-engineer the test data?
A: We prefer you keep this information private.
####Other
Q: Can coding problems be addressed in the discussion forum as well?
A: We're not totally sure what you mean by "coding problems". You may use the discussion forum to get help from us (the challenge organizers) with issues related to the data, or for clarification on anything related to the challenge, such as the questions and metrics. You may also use the forum for discussion with other participants on topics related to the challenge. If this doesn't answer your question, feel free to follow up.