[A Poll] Independent Test Data Sets

Participants- We have identified a 2nd independent test data set. It is currently being processed, and we anticipate its availability in mid-November. Because the first test data set is already in-house, we could potentially release those data earlier (in about 2 weeks). That leaves 2 options for data release: (1) Release the two data sets separately (set 1 in mid-October, and set 2 in mid-November). This would require two submission cycles and 2 different sets of submission queues (6 queues in total, which could lead to confusion). (2) Release both sets in mid-November and require a single set of joint submissions. We would love your feedback on which option you would prefer, and which option would make you most likely to participate. Thanks, Solly

Created by Solveig Sieberts sieberts
Hi Solly: No problem. Just wanted to make sure I was up to date. I still prefer to wait for the release of all test data sets at one time, but that may just be my preference. Thanks for the update.
Hi Chi- Sorry for the long delay in communication. You didn't miss an announcement. We've been trying to work on data quality issues with the newer (RNAseq) data set, and it appears there's still more work to do on that front. We can release the 1st data set for participants to work on this month if that's of interest. Solly
Hi Solly: Any new news on the independent test data sets? Did I miss an announcement?
Great! Thanks!
Awesome!!! I can't wait to start working with it. I know what I am doing this weekend!! (Going to the pub; I will play with the data on Monday.)
The leaderboard outcome data have been made available in syn7416488.
Thanks for your feedback. It sounds like option 2 is the most popular. We're checking with the Organizing Team about whether we need to hold back the leaderboard outcomes any longer, but we hope to be able to release those in the coming week.
Second option
option 2.
I would prefer option 2.
In that case I would choose Option 2.
No. Due to the agreements with the data contributors, we are unable to release the outcomes data for either test set.
How many subjects will be on each of these data sets? If we go with Option 1, will you release the output labels for the first one shortly after the first submission cycle so that we can use them as training data for the second submission cycle?
@ebongen We will measure performance separately in each data set, as well as combining for an overall score. Neither option will alter the scoring/ranking approach. We are looking for preferences from the participants in terms of the logistical considerations. Thanks.
It's probably best to consult a statistician about whether we should take Option 1 or Option 2. I'm leaning towards Option 1, because if the two data sets originate from separate studies, I feel most comfortable judging their results separately. Perhaps the average AUC across the two test data sets could be used for the final ranking of models. I would also like the labels of the leaderboard samples released as soon as possible.
Ah, it appears that I have misunderstood. I thought set 1 was the leaderboard set, so set 1 and set 2 are both new? I do not mind when these sets get released; however, I would like access to the clinical outcomes of the leaderboard set as soon as possible.
Okay, thanks. Option 2 for me.
Chi- I'm referring to two new sets of _independent_ test data on which we will evaluate models, not the leaderboard set you have previously been working with. We had previously announced that we had secured the first of these, and we have recently secured the use of the 2nd. These are not data you've seen, and they are not publicly available. Solly
I'm a little confused; do you mind clarifying? Have you obtained two new independent test data sets (for a total of 3 now) or one new test data set (for a total of 2 now)? If it's the latter, I'm confused why you would RE-release the first test data set in mid-November? And would our original predictions and scores be invalidated as a result? Thanks
Option 2. Thanks!
Option 1. But can you please release the leaderboard output data as soon as possible so we can optimize our models? Thanks!
I would prefer option 2. Thanks.
I would prefer option 1; I would like to start working with set 1 as soon as possible. Thanks