Dear challenge organizers,

First of all, thank you very much for the great effort in organizing this challenge. We would like to ask a question about the final-round evaluation.

We expect that most participants are using public drug sensitivity datasets to train their models. We were wondering whether the test drugs might already be included in those training sets (if the test drugs are well known). In that case, the evaluation scores could be strongly affected by whether a given test drug was seen during training. Will this issue be considered in the final round (for example, by evaluating predictions after excluding the test drugs from the training phase)? Or is it guaranteed that the test drugs do not appear in the public datasets?

Any comments would be deeply appreciated. Thank you very much.

Best regards,
Sungjoon Park
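The exclusion being asked about could be sketched as a simple pre-training filter. This is a minimal illustration with hypothetical data and made-up column names (`compound`, `cell_line`, `auc`), not part of the challenge's actual pipeline: drop every training row whose compound appears in the blinded test set before fitting a model.

```python
# Minimal sketch (hypothetical data): excluding blinded test compounds from
# a public training set before model fitting, so evaluation scores are not
# inflated by compounds the model has already seen. All names are assumptions.

def drop_test_compounds(train_rows, test_compounds):
    """Keep only training rows whose compound is not in the test set."""
    return [row for row in train_rows if row["compound"] not in test_compounds]

train_rows = [
    {"compound": "drugA", "cell_line": "c1", "auc": 0.4},
    {"compound": "drugB", "cell_line": "c2", "auc": 0.9},
    {"compound": "drugA", "cell_line": "c3", "auc": 0.5},
]
filtered = drop_test_compounds(train_rows, {"drugA"})
print([r["compound"] for r in filtered])  # ['drugB']
```

The same filter applied in reverse (keeping only the overlap) would also let a participant measure how much of their score comes from previously seen compounds.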

Created by Sungjoon Park
@allawayr Thank you for your response! Yes, the concern was whether the evaluation scores would be biased by the number of test drugs that were also included in the training set.

Thanks,
Sungjoon
Hi @Sungjoon,

My apologies for the delay! If I understand your concern correctly, it's that people may get lucky because the public training data include the same compounds as the blinded test data. Is this correct?

To clarify, the test drugs are _definitely_ used in public datasets. Specifically, this is where the sensitivity information used to score the final predictions is derived from:

> ## **Assessment**
> Submissions will be scored using two metrics:
>
> 1) **AUROC**: We have used a mixture-modeling approach to classify "sensitive" and "non-sensitive" cell lines for each compound, using data obtained from the Cancer Therapeutics Response Portal (CTRP). The submitted prediction values will be compared to the classification of each cell line, and we will calculate the Area Under the Receiver Operating Characteristic curve (AUROC) for each compound. For more on the AUROC, [read here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic). The total score for this metric will be the average AUROC across all 15 leaderboard or final-round compounds.
>
> 2) **Spearman correlation**: The submitted prediction values will be inversely correlated with the CTRP Area Under the dose-response Curve (AUC) values for all cell lines. A higher Spearman correlation indicates a more accurate mapping of increasing confidence values to decreasing AUC values. This correlation will be calculated for each compound and then averaged across all compounds within the leaderboard or final phase to generate an average Spearman correlation value.

In other words, it is likely that everyone is using training expression data that corresponds to the test compounds - this is part of the challenge. Let me know if I have misunderstood your question.

cc @efd2115 and @SzalaiB for their thoughts too.
