I have some concerns about using the Pearson correlation to evaluate predictions on the leaderboard data in Sub-challenge 3.
Based on the LOO predictions that I got, the error (MSE) for samples in Rhinovirus Duke and DEE5 H3N2 is smaller than for the remaining studies, but the Pearson correlation r for the LOO predictions in these two studies is really poor (-0.4 for phase 3), while the correlation for the remaining studies is quite good (as high as 0.7 for phase 3).
I have also read some of the machine learning literature, and the common measures of prediction performance for a continuous outcome appear to be MSE or RMSE. Could we change, or add, a measure of prediction performance for Sub-challenge 3 on the leaderboard?
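For concreteness, here is a minimal sketch of the per-study comparison I am describing; the file and column names ("loo_predictions.csv", "study", "y_true", "y_pred") are placeholders, not the actual challenge format:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Placeholder file of LOO predictions, one row per sample.
loo = pd.read_csv("loo_predictions.csv")

# Compare MSE and Pearson r within each study separately.
for study, grp in loo.groupby("study"):
    mse = np.mean((grp["y_pred"] - grp["y_true"]) ** 2)
    r, _ = pearsonr(grp["y_true"], grp["y_pred"])
    print(f"{study}: MSE = {mse:.3f}, Pearson r = {r:.3f}")
```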
Q Li
We are actively working on bringing in independent test data, and will let you know if/when those data are available for use.

Solly

Thank you, Solly, for the answer. Yes, it makes more sense to use correlation in this case.
I have another question. In my results, the correlation for the predictions in Rhinovirus Duke and DEE5 H3N2 is negative (a very bad result), but it is good (around 0.5) for the predictions in the remaining studies. The leaderboard test samples come from only 3 studies, two of which happen to be Rhinovirus Duke and DEE5 H3N2. So the leaderboard predictions might look poor because of this restricted set of studies, even though we found the best model for most of the samples.
I understand that it is impossible to add more samples to the leaderboard now. But would it be possible for the future new test dataset to cover more studies?
Thanks!
Qian

Qian-
To me this illustrates a problem with the MSE (or perhaps your interpretation of it) rather than with the correlation. Say, for example, someone gives you a test to deploy in the clinic, but when you check, the predicted values are negatively correlated with the true values. Would you be comfortable with that predictor, even if it had low MSE? Because the MSE is not normalized by the variance of the true values, it's possible you're seeing a lower MSE simply because this cohort has lower variance. Moreover, because the MSE decomposes into a tradeoff between the variance of the predictor and its bias (MSE = variance + bias^2), it can sometimes be gamed: a predictor with low variance and high bias can achieve a smaller overall MSE than less biased predictors. We are not interested in those cases, because I would argue they are not good predictors.
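To make this concrete, here is a minimal synthetic sketch (not challenge data; it just assumes numpy and scipy are available) in which a nearly constant, slightly anti-correlated predictor typically beats a genuinely correlated one on MSE, purely because its output variance is low:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)

# Predictor A: tracks the truth (positive r) but with noisy, full-scale output.
pred_a = y_true + rng.normal(scale=1.5, size=y_true.size)

# Predictor B: nearly constant and slightly anti-correlated with the truth.
# Its low output variance keeps the average squared error small anyway.
pred_b = -0.1 * y_true + rng.normal(scale=0.1, size=y_true.size)

for name, pred in [("A (correlated)", pred_a), ("B (anti-correlated)", pred_b)]:
    mse = np.mean((pred - y_true) ** 2)
    r, _ = pearsonr(y_true, pred)
    print(f"{name}: MSE = {mse:.3f}, Pearson r = {r:.3f}")
```

Predictor B is exactly the kind of model we want to rule out: its MSE looks fine, yet it points in the wrong direction.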
Correlation, on the other hand, tells us whether we are generally predicting higher values when the true value is higher and lower values when the true value is lower; we don't really care about the center and scale, since we can manipulate those.
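And because the Pearson correlation is unchanged by any recentering or positive rescaling of the predictions, while the MSE can be pushed arbitrarily up or down by those same transforms, it isolates exactly the property we care about. A quick sketch, again on synthetic data:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
pred = y_true + rng.normal(scale=0.8, size=y_true.size)

# Apply a few affine transforms (positive scale a, shift b) to the predictions:
# the Pearson r is identical on every row, while the MSE swings wildly.
for a, b in [(1.0, 0.0), (10.0, 5.0), (0.01, -3.0)]:
    rescaled = a * pred + b
    mse = np.mean((rescaled - y_true) ** 2)
    r, _ = pearsonr(y_true, rescaled)
    print(f"a={a:5.2f}, b={b:5.2f}: MSE = {mse:9.3f}, r = {r:.3f}")
```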
I hope that helps,
Solly