Hello everyone.
I'm not sure I fully understand the scoring metrics based on recall at 10% and 50% FDR.
I assume the procedure is as follows:
From the prediction scores P, select a top set S such that [number of false positives in S] / |S| = X%,
then compute recall(S).
The problem is that multiple such sets S can exist, and some of them may be very small.
Which one is taken?
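To make the ambiguity concrete, here is a toy sketch (hypothetical labels, not challenge data) in which two prefixes of the score-ranked list hit exactly 50% FDR but have very different recall:

```
import numpy as np

# labels of the candidates, already sorted by descending predicted score
y_sorted = np.array([1, 0, 1, 0, 1, 1, 1, 1])
total_positives = y_sorted.sum()

for k in range(1, len(y_sorted) + 1):
    tp = y_sorted[:k].sum()
    fdr = 1 - tp / k                 # FDR of the top-k set
    recall = tp / total_positives
    if np.isclose(fdr, 0.5):
        print(f"top {k}: FDR = 50%, recall = {recall:.2f}")

# prints the top-2 set (recall 0.17) and the top-4 set (recall 0.33):
# both satisfy the 50% FDR condition, which is exactly the ambiguity above.
```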
Created by Lando Andrey Dronte

That's not what I meant... but you are right, it doesn't matter. I have been such a fool for trying to make things correct, and after all I went through I still haven't learned that actually nobody cares.
But now... I need to clarify that in all challenges I submitted using my single ID, with my real name/lab name, way, way within the challenge allowance... and a huge part of the organizers' work is to spend months checking the compliance of my submissions...
Hi Yuanfang,
Thank you for the feedback! We've discussed this loophole internally, but because the ranking will be determined by the best submission from each team, I don't believe the loophole you describe applies. Of course, if participants were allowed to create multiple accounts, then the attack you describe would be feasible, but my understanding is that this is against the challenge rules. Please feel free to contact me directly if I've misunderstood.
Best, Nathan

I have a question on the p-value-based ranking.
I think it can be problematic if the ranking is based on other teams' performance for each (sth): when two submissions are really similar, the one that is more similar to a third good submission would be ranked higher.
I raised the same question for the 2014 AD challenge, which was also derived from relative ranks over multiple metrics. Until the end, nobody understood what I said or bought my argument, even when I tried to prove it by submitting the exact same thing to the leaderboard multiple times and making it rank number one... So I decided to take advantage of this feature, and eventually ranked first by submitting two extremely similar versions (the one who ranked second didn't even realize that until he sent me a congratulations email this year, and then out of scientific responsibility I told him the trick).
Just something for the organizing team to consider; I think it is more appropriate to take the average of some type of metric.
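To make the concern concrete, here is a toy sketch (made-up numbers, not from any real challenge) of how a near-duplicate submission can flip a rank-based comparison between two teams whose own predictions never change:

```
import numpy as np
from scipy.stats import rankdata

def mean_rank(scores):
    # scores: rows = teams, columns = metrics, higher = better
    # returns each team's rank (1 = best) averaged over the metrics
    ranks = np.vstack([rankdata(-col, method="average") for col in scores.T]).T
    return ranks.mean(axis=1)

# Two teams, two metrics: they tie on mean rank.
X = [0.90, 0.50]
Y = [0.89, 0.70]
print(mean_rank(np.array([X, Y])))      # [1.5, 1.5]
# (averaging the raw metric values instead would put Y ahead: 0.795 vs 0.70)

# Add Z, a near-copy of X. Neither X's nor Y's scores changed,
# but X now beats Y because Z slots in between them on the first metric.
Z = [0.895, 0.49]
print(mean_rank(np.array([X, Y, Z])))   # [1.5, 2.0, 2.5]
```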
Hi Lando,
Roughly, we choose S to maximize the sum of the predicted probabilities. If multiple such sets exist, we choose from among them randomly.
Here's the exact code that generates the estimate:
```
from sklearn.metrics import precision_recall_curve

def recall_at_fdr(y_true, y_score, fdr_cutoff=0.05):
    # Full precision-recall curve; recall is non-increasing along these arrays.
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # FDR is the complement of precision.
    fdr = 1 - precision
    # First (i.e. highest-recall) point on the curve that meets the FDR cutoff.
    cutoff_index = next(i for i, x in enumerate(fdr) if x <= fdr_cutoff)
    return recall[cutoff_index]
```
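For example, on toy data (made-up labels and scores, not from the challenge), the two cutoffs mentioned above would be evaluated like this:

```
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.9, 0.8, 0.75, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2])

print(recall_at_fdr(y_true, y_score, fdr_cutoff=0.10))   # recall at 10% FDR
print(recall_at_fdr(y_true, y_score, fdr_cutoff=0.50))   # recall at 50% FDR
```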
Best, Nathan