Dear moderators,

We found that in the Round 1 leaderboard some AUC metric(s) may be calculated erroneously; e.g.:

```
team                 M2Gen_auc  M2Gen_bac  M2Gen_mcc  M2Gen_f1  M2Gen_iAUC  M2Gen_prAUC
SecessionComputing   0.8889     0.2000     -0.7006    0.25      0.0816      0.216
piermaro             0.8889     0.6444     0.33730    0.50      0.8357      0.6416
MM Baseline Team-1   0.8889     0.2778     -0.4714    NaN       0.0816      0.216
```

Is it mathematically possible for a predictor with AUC = 0.89 to have all the other metrics (BAC, MCC, F1, iAUC, prAUC) this poor, with MCC even negative (**under the optimal cut-off**)?

Though AUC will not be so important in the other rounds, hopefully all the metrics in all rounds will be calculated beyond doubt. Thanks again for your great efforts in the DREAM challenges.

Sincerely,
Bruce WQ Fang

Created by Wei-Quan Fang (deleapoli)
Dear Mike,

Fully agree that the small sample makes it possible; for example:

```
Example A: In the ROC figure, only three cut-offs, corresponding to the three
points (0, 1), (0.88, 0.875), (1, 0) in (X, Y) coordinates, lead to AUC = 0.89
but BAC = 0.33, provided a step function is applied for the AUC calculation.
```

In my opinion, a possible answer to the question is that this is, though correct, algorithm-selection behavior (for the AUC calculation) rather than actual behavior; e.g.:

```
If a smooth function is applied for the AUC calculation in Example A above,
the AUC will be close to 0.5, with the same BAC = 0.33.
```

However, the calculation of AUC and the other metrics (and which R packages are used), as well as the weighted iAUC formula, are all pre-determined by the organizers. Thus we should all follow these game rules and focus more on the challenge itself.

Best regards,
Bruce
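Taking the three points in Example A at face value, their arithmetic can be checked with a short sketch (a hypothetical helper, `trapezoid_auc`, computing the piecewise-linear area under the stated points):

```python
# Hypothetical check of Example A's arithmetic: the piecewise-linear
# (trapezoidal) area under the three stated ROC points.
def trapezoid_auc(points):
    """Area under a piecewise-linear curve through sorted (x, y) points."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

print(trapezoid_auc([(0, 1), (0.88, 0.875), (1, 0)]))  # 0.8775, i.e. ~0.88
```

This only verifies that the three points integrate to roughly the AUC in question; it does not reproduce the organizers' actual scoring code.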
Dear Bruce,

Sorry for the delay in a final response. The answer to your question is that this is actually correct behavior. M2Gen is the smallest dataset, and the leaderboard rounds sample with replacement. So, of the 23 samples drawn from M2Gen in this leaderboard round, a subset are censored with regard to being called high risk, and the remainder include some duplicates from the sampling with replacement. The small N of unique samples here makes this possible. It is not seen in the other metrics due to the multiple time thresholds for iAUC and to each team calling high risk (HR) with different thresholds. We do not expect this behavior in the final round, which has more samples and does not sample with replacement.

Kind regards,
Mike
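The point that AUC is rank-based while BAC and MCC depend on each team's hard high-risk calls can be illustrated with a minimal sketch (hypothetical data and helper functions, not the challenge's scoring code):

```python
# Sketch: AUC scores the *ranking* of risk scores, while BAC and MCC score
# the *hard calls* a team submits; on small N they can disagree sharply.
from math import sqrt

def auc(y, s):
    """Mann-Whitney AUC: probability a positive outranks a negative."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(y, yhat):
    tp = sum(a == 1 and b == 1 for a, b in zip(y, yhat))
    tn = sum(a == 0 and b == 0 for a, b in zip(y, yhat))
    fp = sum(a == 0 and b == 1 for a, b in zip(y, yhat))
    fn = sum(a == 1 and b == 0 for a, b in zip(y, yhat))
    return tp, tn, fp, fn

def bac(y, yhat):
    tp, tn, fp, fn = confusion(y, yhat)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def mcc(y, yhat):
    tp, tn, fp, fn = confusion(y, yhat)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Five samples: the risk scores rank both positives above every negative,
# so AUC = 1.0 -- but this team's hard high-risk calls use the opposite rule.
y     = [0, 0, 0, 1, 1]      # true labels
score = [0.1, 0.2, 0.3, 0.8, 0.9]  # submitted risk scores (perfect ranking)
calls = [1, 1, 1, 0, 0]      # submitted hard calls (inverted threshold)

print(auc(y, score))   # 1.0
print(bac(y, calls))   # 0.0
print(mcc(y, calls))   # -1.0
```

The example is deliberately extreme, but the same mechanism (rank-based AUC vs. threshold-based BAC/MCC, amplified by few unique samples) can plausibly produce a leaderboard row with high AUC and negative MCC.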
FYI, we are still investigating this. It is difficult to assess a leaderboard result once we have moved to the next round, but I hope to have this done soon.
Thank you for bringing this to our attention. I will look into this.

Scoring metric error?