Hi, I am from team uth-uhel. We submitted two Docker containers during the leaderboard phase. The scores were around 0.2 and -0.04. But after downloading the leaderboard data and testing locally, the performance is around 0.65 on the subchallenge. We are wondering whether this is due to something other than our method. Could you help us figure it out? Thanks

Created by QIAN QIAN (qqian)
Thanks!
Hi Qian,

One implementation of the right-censored concordance index is provided by scikit-survival - try using the function [concordance_index_censored](https://scikit-survival.readthedocs.io/en/latest/generated/sksurv.metrics.concordance_index_censored.html). E.g.:

```python
import pandas
from sksurv.metrics import concordance_index_censored

# Join the ground-truth responses with the submitted predictions on lab_id.
predictions = (
    pandas.read_csv('leaderboard_response.csv')
    .set_index('lab_id')
    .join(
        pandas.read_csv('my_predictions.csv')
        .set_index('lab_id')
    ))

# concordance_index_censored expects (event indicator, event time, risk score);
# the predicted survival is negated so that higher predicted survival means lower risk.
cindex = concordance_index_censored(
    predictions.vitalStatus == 'Dead',
    predictions.overallSurvival,
    -predictions.survival)[0]
print(cindex)
```

Best,
Jacob
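P.S. If it helps to confirm the call runs before pointing it at the real files, here is a minimal self-contained sketch using the same function. The toy values below are made up purely to exercise the call, and the column names simply mirror the snippet above:

```python
import pandas
from sksurv.metrics import concordance_index_censored

# Toy data only - these values are invented just to check that the scoring call runs.
toy = pandas.DataFrame({
    'vitalStatus': ['Dead', 'Alive', 'Dead', 'Dead'],   # event indicator source
    'overallSurvival': [100.0, 400.0, 250.0, 50.0],     # observed / censored times
    'survival': [0.2, 0.9, 0.6, 0.1],                   # predicted survival (higher = longer)
})

# Negate the predicted survival so that a higher prediction means a lower risk score.
cindex = concordance_index_censored(
    toy.vitalStatus == 'Dead',
    toy.overallSurvival,
    -toy.survival)[0]
print(cindex)  # 1.0 for this toy example, since the predicted ranking is perfect
```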
Hi Jacoberts,

Could you also show us the test code for subchallenge 2? The metric description is also a little confusing, so we want to make sure we can test it correctly and reasonably locally. Thanks!

Best,
Qian
Thanks! I will discuss with our team members and try evaluating it this way.
The first thing I would do is double-check your local scoring code. A score of 0.65 seems extremely high. Remember, for instance, that we score each drug independently, then average those scores to get the final score. Here is some of the code I used for scoring my submissions locally:

```python
import pandas

# Join the submitted AUC predictions with the leaderboard ground truth on (lab_id, inhibitor).
indices = ['lab_id', 'inhibitor']
predictions = (
    pandas.read_csv('my_predictions.csv')
    .set_index(indices)
    .join(
        pandas.read_csv('leaderboard_aucs.csv').set_index(indices),
        lsuffix='_prediction',
        rsuffix='_groundtruth'))
predictions['inhibitor'] = predictions.index.get_level_values('inhibitor')

# Compute the Spearman correlation separately for each drug (inhibitor).
inhibitors = predictions.inhibitor.unique().tolist()
spearmans = []
for inhibitor in inhibitors:
    subset = predictions[predictions.inhibitor == inhibitor]
    if subset.auc_prediction.var() < 1e-10:
        # Constant predictions get a correlation of zero rather than NaN.
        spear = 0
    else:
        spear = subset.corr(method='spearman').auc_prediction.auc_groundtruth
    spearmans.append(spear)

# Average the per-drug correlations to get the final score.
spearmans = pandas.DataFrame({
    'inhibitor': inhibitors,
    'spearman': spearmans
})
print('Mean spearman: %0.3f' % spearmans.spearman.mean())
```
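A quick cross-check, in case the discrepancy comes from pooling rather than per-drug averaging: the sketch below computes both the pooled Spearman over all (sample, drug) pairs and the per-drug mean, using the same file and column names as the snippet above. scipy.stats.spearmanr is just an alternative way to get the same correlations:

```python
import pandas
from scipy.stats import spearmanr

# Same join as above: predictions and ground truth keyed by (lab_id, inhibitor).
indices = ['lab_id', 'inhibitor']
predictions = (
    pandas.read_csv('my_predictions.csv')
    .set_index(indices)
    .join(
        pandas.read_csv('leaderboard_aucs.csv').set_index(indices),
        lsuffix='_prediction',
        rsuffix='_groundtruth')
    .reset_index())

# Pooled correlation over every (sample, drug) pair. This is NOT how submissions
# are scored, and it can come out much higher than the per-drug average.
pooled, _ = spearmanr(predictions.auc_prediction, predictions.auc_groundtruth)
print('Pooled spearman: %0.3f' % pooled)

# Per-drug correlation, then averaged - this should agree with the loop above.
per_drug = (
    predictions.groupby('inhibitor')
    .apply(lambda g: spearmanr(g.auc_prediction, g.auc_groundtruth)[0])
    .fillna(0))  # a drug with constant predictions contributes 0, as above
print('Per-drug mean:   %0.3f' % per_drug.mean())
```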
Thanks for this quick response. Yes, we used the leaderboard data for testing only, not for training or validation; we used the training data only to train and validate. And we have submitted each Docker container twice or more and got similar scores, around 0.2.
Hi Qian,

Just to double-check: did you use the leaderboard data to train before evaluating locally? And if you submit this model again, do you get the same score?

Best,
Jacob
