Hi @trberg,

Could you provide more details on the F-beta evaluation? What beta value and which averaging option will be used? For example, in sklearn there are multiple ways to compute F-beta:

```
>>> from sklearn.metrics import fbeta_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> fbeta_score(y_true, y_pred, average='micro', beta=0.5)
0.33...
>>> fbeta_score(y_true, y_pred, average='weighted', beta=0.5)
0.23...
>>> fbeta_score(y_true, y_pred, average=None, beta=0.5)
array([0.71..., 0. , 0. ])
```

Another question: we are asked to provide a prediction file with the `outcome likelihood`, but to evaluate F-beta we need a threshold to convert this probability into a binary (0/1) label. I understand that we could obtain a threshold from either the ROC analysis (AUROC) or the precision-recall analysis (AUPR). Which threshold would you use for the conversion during evaluation: the one derived from AUROC or from AUPR?
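For concreteness, here is a minimal sketch of the conversion step in question, assuming a hypothetical threshold of 0.5 and illustrative data; the actual threshold, beta value, and averaging option are exactly what is being asked above.

```
from sklearn.metrics import fbeta_score

# Hypothetical predicted outcome likelihoods and ground-truth binary labels.
outcome_likelihood = [0.1, 0.8, 0.4, 0.9, 0.3]
outcome_truth = [0, 1, 0, 1, 1]

# An assumed threshold converts each likelihood into a 0/1 label.
threshold = 0.5
outcome_pred = [1 if p >= threshold else 0 for p in outcome_likelihood]

# F-beta on the binarized predictions (beta=0.5 is only a placeholder).
print(fbeta_score(outcome_truth, outcome_pred, beta=0.5))
```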

Hi @trberg,

Thanks for the reply! I wonder if it would be possible to release just the evaluation script (or function) to us, so that we can make sure we optimize our models consistently with the official evaluation. It could be a function that takes two arguments (i.e., **the prediction likelihood dataframe and the ground-truth outcome labels**) and produces a series of evaluation scores. This would be **really important and helpful** for us, as some of our methods may use reward-based optimization (e.g., minimum Bayes risk).

```
def evaluation(outcome_likelihood, outcome_truth):
    # compute the official metrics here
    f_beta = ...
    print('F-beta', f_beta)
    # ... and more metrics
```

Thanks,
Junjie
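As an illustration of the requested interface (not the official script), here is a sketch that uses sklearn metrics as placeholders; the beta value, the fixed threshold, and the exact set of metrics are assumptions.

```
import numpy as np
from sklearn.metrics import average_precision_score, fbeta_score, roc_auc_score

def evaluation(outcome_likelihood, outcome_truth, beta=0.5, threshold=0.5):
    """Sketch: map predicted likelihoods and true labels to a dict of scores."""
    likelihood = np.asarray(outcome_likelihood, dtype=float)
    truth = np.asarray(outcome_truth, dtype=int)

    # Threshold-free metrics computed directly on the likelihoods.
    scores = {
        'AUROC': roc_auc_score(truth, likelihood),
        'AUPR': average_precision_score(truth, likelihood),
    }

    # Threshold-dependent metric: binarize at an assumed threshold first.
    pred = (likelihood >= threshold).astype(int)
    scores['F-beta'] = fbeta_score(truth, pred, beta=beta, zero_division=0)

    for name, value in scores.items():
        print(name, value)
    return scores
```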
Hi @junjiehu,

We're not putting any weight on F-beta for this challenge. However, for any threshold-dependent metric we do look at, I'll calculate your score across a range of thresholds to find the maximum score possible and use that in our evaluation.

Hope this helps!
Tim
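A rough sketch of the threshold sweep described above, assuming a uniform grid of candidate thresholds and the sklearn F-beta implementation; the organizers' actual script may differ.

```
import numpy as np
from sklearn.metrics import fbeta_score

def max_over_thresholds(outcome_likelihood, outcome_truth, beta=0.5):
    """Return the best F-beta over a grid of candidate thresholds, and that threshold."""
    likelihood = np.asarray(outcome_likelihood, dtype=float)
    truth = np.asarray(outcome_truth, dtype=int)

    best_score, best_threshold = 0.0, 0.0
    for threshold in np.linspace(0.0, 1.0, 101):
        # Binarize at this candidate threshold, then score.
        pred = (likelihood >= threshold).astype(int)
        score = fbeta_score(truth, pred, beta=beta, zero_division=0)
        if score > best_score:
            best_score, best_threshold = score, threshold
    return best_score, best_threshold
```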
