Dear All Participants,

We are now live with the submission and scoring system. Please see the [Submitting Models](https://www.synapse.org/#!Synapse:syn6131484/wiki/402044) section for details on how to submit. You will get an email reply with your scores, and the leaderboards will be updated. Please allow 15-20 minutes for the reply email. If you have questions or issues with submitting/scoring, please post on the Discussion forum.

Best of luck to everyone.

Happy Solving!!!

Created by James Costello (james.costello)
so this is a lesson very suitable for me at this stage. i am sure every organizer must be thinking the same thing right now. i bet my student that this scoring must be wrong and that without me no one in the lab could ever submit anything useful. i gave him 10-to-1 odds: if i lose i pay him $10,000, and if he loses he pays me $1,000. he was so confident that he was right that he insisted on entering the challenge on his own. but he backed out of the bet before he saw the eventual conclusion, because he said i am a master of these things and cannot possibly be wrong. it turns out that i can be as wrong as, or even wronger than, everyone else. since i am such a bad advisor, i think i am going to buy him a small gift instead.
you are right. i got it confused with the false positive rate. and i have been wrong about this for 10 years.
used my student's id, sorry! (yuanfang)
Dear Yuanfang, sorry to interject, but I think this is not fully correct, or I have missed something general about the definition of FDR in this case. Assume we have a random classifier and twice as many negatives as positives. Then, for every recall value TP/(TP+FN), we will reach a precision TP/(TP+FP) of 1/3, because for every true positive prediction we expect two additional false positives. Hence, we never (for none of the recall values) obtain an FDR below 2/3 (at least if we assume that FDR = 1 - precision here; if not, that may already be my misconception). So if we ask for the best recall value with an FDR below 0.1 (or even below 2/3), a recall of 0 would be the right answer. Would you agree? Or did I totally misunderstand your comment? [I just saw that Nathan already answered a few minutes before. Hopefully, my comment is helpful nonetheless.]
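The argument above can be checked numerically. A minimal sketch, assuming synthetic labels at a 1:2 positive:negative ratio and purely random scores; this is an illustration using scikit-learn's `precision_recall_curve`, not the challenge's scoring code:

```python
# With random scores, precision hovers at the prevalence (1/3 here) at every
# usable cutoff, so the FDR never drops meaningfully below 2/3 and the
# recall at 5% FDR is essentially zero.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
labels = np.array([1] * 10_000 + [0] * 20_000)  # twice as many negatives
scores = rng.random(labels.size)                # a purely random classifier

precision, recall, _ = precision_recall_curve(labels, scores)
fdr = 1.0 - precision

print(precision[recall >= 0.1].mean())  # ~0.333, i.e. the prevalence
print(recall[fdr <= 0.05].max())        # ~0: only degenerate cutoffs qualify
```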
> i think it is wrong because, by definition, a random prediction's recall at 5% FDR should be 0.05, and at 25% FDR it should be 0.25...

That's absolutely wrong. For example, for the recall at 5% FDR to be greater than 0%, there must exist a score cutoff at which at least 95% of the positive predictions are correct (i.e., precision >= 95%). The chance of this happening with random predictions under large label imbalance is very low.
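For concreteness, here is one plausible way to compute recall at a fixed FDR with scikit-learn, assuming FDR = 1 - precision over a sweep of score cutoffs. This is a sketch of the metric as described above, not necessarily the challenge's actual scoring code:

```python
# Sketch: recall at a fixed FDR, assuming FDR = 1 - precision.
# Illustrative only; the challenge's scoring code may differ in detail.
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_fdr(labels, scores, fdr_cutoff=0.05):
    """Highest recall over all score cutoffs whose FDR stays <= fdr_cutoff."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    passing = recall[(1.0 - precision) <= fdr_cutoff]
    return float(passing.max()) if passing.size else 0.0
```

Under this definition, random scores give essentially 0: no cutoff reaches 95% precision, so only the degenerate zero-recall point passes the FDR filter.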
i think it is wrong because, by definition, a random prediction's recall at 5% FDR should be 0.05, and at 25% FDR it should be 0.25... is that good enough support? i am waiting for a big check now, LOL
We are uploading our baselines now. Happy to share the evaluation code as well. We are using standard functions provided by scikit-learn in Python, but we'll check them again. And of course, if there is an error in our evaluation code we would report that; I don't see why anyone would want to hide an error. Btw, can you provide some support for your statement that there is surely an error in our recall code? Like I said above, this is stable scikit-learn code, so I'd be shocked if it had an error. Nathan is checking it again now. He'll get back to you if we find something wrong with it.
i think it is necessary that the organizing team submit their baseline model to the leaderboard (which i am sure every participant is looking forward to, since it sounds like some smarter implementation is already done), so that you know: ***** 1. auROC is not affected by imbalance. it doesn't change at all no matter how many negatives there are; it is the auPRC that changes. ***** 2. your implementation of recall at fixed FDR is wrong. ***** there is one thing i learned in DREAM: these things cannot be settled by discussion or argument, because your english is better, so you can easily turn black into white... but facts tell immediately, and teach everyone right from wrong. ##### btw, i 'betted' a big amount of money with someone who fervently insisted that your evaluation code must be correct and that submitting all 0s would get a very high auROC, so please be honest when you find the bug...
absolutely agree. would be nice to see a model that has higher than zero recall at a fixed FDR without changing the evaluation code.
It's pretty easy to get high auROCs. They are quite meaningless at this level of class imbalance. The auPRC and recall at fixed FDR measures are really what need to be aced.
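To make this concrete, the synthetic sketch below keeps the same mediocre scorer and only changes the number of negatives; the auROC barely moves while the auPRC collapses. The data and the 1:1000 ratio are assumptions for illustration, not challenge results:

```python
# Same scorer, growing imbalance: auROC is stable, auPRC collapses.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
pos = rng.normal(2.0, 1.0, 1_000)        # scores for the positives

for n_neg in (1_000, 1_000_000):         # 1:1 vs 1:1000 class ratio
    neg = rng.normal(0.0, 1.0, n_neg)    # scores for the negatives
    labels = np.r_[np.ones(pos.size), np.zeros(n_neg)]
    scores = np.r_[pos, neg]
    print(n_neg,
          round(roc_auc_score(labels, scores), 3),            # ~0.92 both times
          round(average_precision_score(labels, scores), 3))  # drops sharply
```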
Yes. We have implemented the baselines. We have not had any issues with I/O time. I guess it depends on how you are writing out the results. Avoid using for loops for I/O (see the sketch below).

Predictions on 1% of the genome or any random sampling are highly misleading. Firstly, the true prediction problem is highly unbalanced, and the prediction space in a real-world scenario is the whole chromosome/genome. So artificially balancing, or alleviating the highly skewed positive-to-negative ratio, inflates certain performance measures. auROCs are in any case highly misleading in these unbalanced scenarios. Secondly, sampling the genome is not trivial since it is not IID, and certain types of regions are much easier to discriminate than others. You get vastly differing performance and models when considering different types of genomic regions. When training you can certainly sample as you like, but at test time we really need to know what the predictions look like across the entire chromosome/genome, to account for the heterogeneity of the background and not bias results based on the chosen sample.
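On the I/O point, here is a hypothetical sketch of writing genome-scale predictions in one vectorized call instead of a per-line Python loop; the column layout, bin sizes, and file name are assumptions, not the challenge's required format:

```python
# Write all predictions with one buffered, compressed call
# instead of millions of individual f.write() calls.
import numpy as np
import pandas as pd

n_bins = 1_000_000                      # stand-in for a genome-wide set of bins
starts = np.arange(n_bins, dtype=np.int64) * 50

df = pd.DataFrame({
    "chrom": "chr1",                    # illustrative single-chromosome layout
    "start": starts,
    "end":   starts + 200,
    "pred":  np.random.rand(n_bins),    # your model's scores would go here
})
df.to_csv("predictions.tsv.gz", sep="\t", header=False, index=False,
          compression="gzip", float_format="%.4f")
```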
Epithemius: Not trivial for us to implement this. We will see how many other folks ask for it. Either way, we won't be able to get this done in the next 2 weeks.
i implemented your baseline 1. the performance is so-so: https://www.synapse.org/#!Synapse:syn6131484/wiki/402510. i think it actually beats the state of the art, which is around 0.91-0.92 in this field. hmm..... BUT the I/O takes FOREVER!! i started the run before lunch, and when i came back, it was still writing. did you try implementing it yourselves? i think you only need to subsample 1% of the negative regions and let the participants predict on those. why do they need to predict the whole genome? i think there is something wrong in your recall calculation.....
I would certainly like to do batch uploading. Uploading in a batch need not force participants to supply predictions for every TF.
We expect participants will tune models for each TF separately, so we did not want to force participants to submit all TFs at once. This way you can test the performance of a few submissions at a time. Also, your submission files stay reasonably small, avoiding uploading issues. If there is significant demand for batch submission, we can consider supporting it.
it took less than 2 minutes to return the result, before i even finished writing that message. it is indeed a huge improvement on the organizers' side to know about and use well-accepted, well-tested scoring packages! much better than last time. i don't know how to batch-submit to all at once, otherwise i would test whether all of them are up, because many good citizens must be curious now but reluctant to waste their quota. i think this option should be provided. clicking like 100 times is enough to scare away many lazy people. yuanfang
looks like you submitted to ARID3A, and the scores reflect a truly random submission. https://www.synapse.org/#!Synapse:syn6131484/wiki/402488   So the leaderboards are in fact set up correctly! The scoring does take some time to show up on Synapse because the update scripts run at set time intervals.
i know---- minor-insight (and best of luck competing in this one)! i told them i will donate a lucky coin to them, free of charge, when they give me a lifetime achievement award in DREAM. LOL. but i really don't get the logic of it: are the participants eventually expected to click like one hundred times to make all their submissions?? that's like a whole lunch break wasted on clicking around the synapse website....
lol now that is luck Yuanfang
ok. i found it and submitted. it was an absolutely random one, as i just printed rand(1), and it is better than random... hmm. i guess i just have such good luck in dream. congratulations on setting up the leaderboard successfully!
i will submit one to see if their scoring is really functional, or if they just think it is functional. since i won't participate as a competitor, i don't think i will need that quota. where is the submission template?
I was also thinking the same thing. I'd expect that the baseline prediction which was discussed in the webinar should be showing up, but I don't see anything at all.
Are the leaderboards really open? I don't see any submissions there. Perhaps everyone is holding back.
