Statistical significance given the small test data size

Hi, Solveig Sieberts :) I'm having trouble figuring these out.

1. There seems to be a staggeringly high probability of someone winning by chance given the small test size (23 binary variables for the 1st sub-challenge). Will there be additional measures to make sure the competition results are statistically sound?
2. Will predictions be scored only after all phases of the competition? If not, how will you prevent someone from reusing old predictions from earlier phases that earned them a high score? But if predictions are scored only after the competition, will there be feedback systems like leaderboards?

Not trying to complain here, just wanting to make sure this is a competition on generating good models and not signing up for a lottery. ;) Cheers, Sanders Lin (NCKU, Taiwan)
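To put a rough number on the chance-winner concern in point 1, here is a minimal sketch. It assumes scoring by plain accuracy on 23 binary labels and fully independent coin-flip entries; the threshold of 17 correct and the team counts are made-up illustrations, not challenge parameters, and the actual challenge metric may well differ.

```python
from math import comb

def p_at_least(n, k):
    """P(a coin-flip predictor gets >= k of n binary labels right)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def p_any_team(n, k, teams):
    """P(at least one of `teams` independent random entries reaches k correct)."""
    return 1 - (1 - p_at_least(n, k)) ** teams

n = 23   # test size quoted in the post
k = 17   # hypothetical "looks impressive" threshold (about 74% accuracy)
for teams in (1, 10, 100):  # hypothetical numbers of entries
    print(teams, round(p_any_team(n, k, teams), 3))
```

A single random entry rarely reaches the threshold, but the chance that some entry does grows quickly with the number of entries, which is why the comparison between entries matters more than any one score.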

Created by sanders lin santo0520
medhini: see https://www.synapse.org/#!Synapse:syn5647810/wiki/399104, in the section '6/8: Code available for published analyses on the Challenge data.' Good luck with your submission!
@gyuanfan Could you mention which publications you were referring to in your comment? Have models already been built on this data, and has that work been published? Are there any relevant publications mentioned on the challenge site?
Yuanfang: It is indeed an open question whether better-than-random predictors can be built at earlier timepoints. This was precisely the reason for including a weighting scheme which effectively ignores timepoints at which none of the predictions are better than random. For later timepoints, we have been able to develop better-than-random predictions. We're hoping that the wisdom of the crowd can improve upon our somewhat simple models and identify gene expression signatures which predict illness and severity. Again, the degree to which this is possible is an open question, and we don't expect that these gene expression-based models will be immediately good enough for clinical actionability, but rather a jumping-off point for further biomarker development. As you pointed out, sample size is limited, so methods that rely on large samples may not be the best approach in this case. Solly
Thank you so much for the timely reply. It's comforting to know that the organizers have thought through these issues thoroughly. This is my first competition, so I was a bit cautious. Thanks. Sanders Lin
I think that is a good point. I have run all of our previous winning algorithms, about a dozen of them, over this dataset, and all of them get an AUROC anywhere between 0.1 and 0.9 depending on the seed used for the splits. So I am facing the problem of deciding which one to finalize. To be frank, none of them is that promising, so I am now implementing the classifier from the original publication, since they claimed 90%-100% accuracy. I think that when the sample is too small, it is difficult to do any statistics. Take an extreme example: if I have only 1 example, which is true, and model A predicts 1 while model B predicts 0, then no matter how many times you bootstrap, the Bayes factor will be infinity. Can you please comment on this? I am posting the CV results for one example, with exactly the same classifier but different seeds. Can you please take a look and make some suggestions? Thanks a bunch in advance for your advice. This is AUROC, for which random is always 0.5.

0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1458 0.1562 0.1979 0.2083 0.2083 0.2292 0.2396 0.2396 0.2396 0.2396 0.2500 0.2500 0.2500 0.2593 0.2593 0.2593 0.2593 0.2708 0.2727 0.2867 0.2867 0.2963 0.2963 0.2963 0.2963 0.3007 0.3125 0.3147 0.3229 0.3287 0.3333 0.3333 0.3333 0.3333 0.3333 0.3333 0.3427 0.3497 0.3500 0.3566 0.3566 0.3566 0.3636 0.3636 0.3704 0.3704 0.3704 0.3704 0.3750 0.3750 0.3750 0.3750 0.3750 0.3750 0.3750 0.3750 0.3776 0.3788 0.3846 0.3916 0.3956 0.3958 0.3958 0.4000 0.4000 0.4000 0.4062 0.4066 0.4074 0.4091 0.4091 0.4091 0.4242 0.4250 0.4250 0.4250 0.4250 0.4250 0.4266 0.4286 0.4286 0.4286 0.4286 0.4394 0.4394 0.4396 0.4396 0.4406 0.4406 0.4444 0.4444 0.4444 0.4479 0.4479 0.4500 0.4500 0.4500 0.4505 0.4545 0.4545 0.4545 0.4545 0.4545 0.4545 0.4545 0.4583 0.4615 0.4615 0.4697 0.4725 0.4750 0.4750 0.4750 0.4755 0.4755 0.4755 0.4815 0.4815 0.4815 0.4815 0.4815
0.4835 0.4835 0.4835 0.4895 0.4895 0.4896 0.4896 0.4945 0.4965 0.5000 0.5000 0.5000 0.5000 0.5000 0.5035 0.5035 0.5104 0.5104 0.5104 0.5105 0.5105 0.5165 0.5165 0.5165 0.5175 0.5185 0.5185 0.5185 0.5185 0.5185 0.5185 0.5185 0.5208 0.5208 0.5250 0.5250 0.5275 0.5275 0.5275 0.5312 0.5315 0.5385 0.5385 0.5385 0.5385 0.5417 0.5455 0.5455 0.5455 0.5495 0.5495 0.5521 0.5521 0.5524 0.5556 0.5556 0.5556 0.5556 0.5556 0.5594 0.5625 0.5714 0.5714 0.5714 0.5729 0.5734 0.5734 0.5750 0.5750 0.5750 0.5804 0.5804 0.5824 0.5824 0.5824 0.5824 0.5824 0.5824 0.5926 0.5934 0.6000 0.6154 0.6224 0.6250 0.6250 0.6264 0.6296 0.6296 0.6364 0.6364 0.6374 0.6500 0.6515 0.6515 0.6667 0.6667 0.6750 0.6750 0.6750 0.6783 0.6818 0.6853 0.6923 0.6970 0.7000 0.7000 0.7000 0.7121 0.7273 0.7273 0.7424 0.7424 0.7424 0.7576 0.7576 0.7576 0.7576 0.7727 0.7879 0.7879 0.8182 0.8333 0.8485 0.8485 0.8485 0.8636 0.8636 0.8788 0.8788 0.8788 0.8788 0.8939 0.9091
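As a rough illustration of how much AUROC can swing on a small evaluation fold even when there is no signal at all, here is a minimal sketch. It scores purely random predictions on a fold with 3 positives and 9 negatives over 300 seeds; the fold size and seed count are made-up assumptions, not the splits used for the numbers above.

```python
import random
import statistics

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def random_aurocs(n_pos, n_neg, n_seeds):
    """AUROC of uniformly random scores on a fixed fold, one value per seed."""
    labels = [1] * n_pos + [0] * n_neg
    out = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        scores = [rng.random() for _ in labels]
        out.append(auroc(labels, scores))
    return sorted(out)

# Hypothetical fold size: 3 positives, 9 negatives.
vals = random_aurocs(n_pos=3, n_neg=9, n_seeds=300)
print(min(vals), statistics.median(vals), max(vals))
```

If a real classifier's per-seed AUROCs spread about as widely as this null distribution, the split-to-split variation alone cannot distinguish it from random.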
Thank you for your questions.

1. Yes, DREAM challenges always perform a thorough analysis of the submissions before declaring winners to make sure the winning entry is statistically distinct from the remaining entries. In some cases, multiple winners have been declared when the solutions were not statistically distinct.
2. There will be some feedback between rounds, including the rank ordering of solutions relative to the benchmark (challenge organizers') model, and a binary indicator of whether the entry is statistically better than random at a p-value threshold of 0.05. For rounds 2 and 3, we will also report whether the entry improved upon its previous scores. Actual scores and p-values will not be reported until the winners are announced, to prevent any chance of using this information to learn on the test data. Also, it is within the rules to use the same solution for multiple rounds if a participant wishes to do so.

I hope that addresses your concerns.
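The exact test behind the "statistically better than random" indicator is not described in this thread. One common way to get such a p-value is a label-permutation test, sketched below under that assumption; the labels and scores are made-up placeholders for a 23-subject test set, not challenge data.

```python
import random

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_perm=2000, seed=0):
    """One-sided p-value: how often shuffled labels score at least as well."""
    rng = random.Random(seed)
    observed = auroc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auroc(shuffled, scores) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical test set: 10 positives, 13 negatives, and an entry that
# happens to rank every positive above every negative.
labels = [1] * 10 + [0] * 13
scores = [1 - i / 23 for i in range(23)]
p = permutation_p_value(labels, scores)
print("better than random at 0.05:", p < 0.05)
```

Shuffling keeps the class balance fixed, so the null distribution reflects the same small sample size as the real test set.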
