Hi @allawayr, I scored. Here are my scores:
```
sc1_weighted_sum_error : 3.7804
sc2_total_weighted_sum_error : 3.0289
sc3_total_weighted_sum_error : 3.4289
```
I have no idea what this score means. Is it better to be at 0 or at 100? What are the minimum and maximum winning scores? Is there a cutoff? I can tell there's a Leaderboard from the place order shown for solvers; however, the [Leaderboard](https://www.synapse.org/#!Synapse:syn20545111/wiki/597246) is blank.
When will the Leaderboard be updated to reflect my new score, and how frequently will it be updated?
Thanks,
Lars
Code is on the [FAQ page](https://www.synapse.org/#!Synapse:syn20545111/wiki/599308).

@lars.ericson just following up on the post from above. The baseline model has been posted and reflects a prediction using a random distribution of the training values.
It's called "RA2 Baseline Model". We'll post the code on the wiki.

@arielis good catch, I've edited my guess above with an idea for $$SC_1$$.

The formula you assume for SC1 is probably not true, because one can participate in SC1 and leave the erosion and narrowing scores empty.

@arielis, I guess that the individual J and E scores are
$$J_i = |pred_{J,i} - gt_{J,i}|, \quad 1 \leq i \leq 44$$
$$E_i = |pred_{E,i} - gt_{E,i}|, \quad 1 \leq i \leq 42$$
Suppose there are $$N$$ test cases and 86 features. I guess there is a weight matrix $$W$$ of dimension $$N \times 86$$ such that
$$1 = \sum_{i=1}^N W_{i,j}, 1 \leq j \leq 86$$
and
$$1 = \sum_{j=1}^{86} W_{i,j}, 1 \leq i \leq N$$
Then
$$SC_3 = \frac{86}{42} \sum_{j=1}^{N} \sum_{i=1}^{42} W_{j,i} \, E_i$$
$$SC_2 = \frac{86}{44} \sum_{j=1}^{N} \sum_{i=1}^{44} W_{j,\,i+42} \, J_i$$
and then $$SC_1$$ may be something like
$$SC_1 = \frac{1}{86} \sum_{j=1}^{N} \left( \sum_{i=1}^{86} W_{j,i} \right) |gt_{Overall,j} - pred_{Overall,j}|$$
My guess. Sponsors could spell it out a little bit more than they do in the Assessment section here: https://www.synapse.org/#!Synapse:syn20545111/wiki/597242
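For what it's worth, here is that guess written out as a rough numpy sketch. Everything in it (W, N, the column layout, and the random stand-ins for the absolute errors) is an assumption, not the organizers' documented method:
```python
import numpy as np

# Rough numpy sketch of the guessed scoring above. W, N, the column
# layout (42 erosion features followed by 44 narrowing features), and
# the random stand-ins for |pred - gt| are all assumptions.
rng = np.random.default_rng(0)
N = 100                                    # number of test cases (assumed)

W = rng.random((N, 86))
W /= W.sum(axis=0, keepdims=True)          # each column sums to 1, per the guess
                                           # (the row constraint is not enforced here)

E = rng.random((N, 42))                    # per-case absolute errors, erosion
J = rng.random((N, 44))                    # per-case absolute errors, narrowing
overall_err = rng.random(N)                # per-case |gt_Overall - pred_Overall|

SC3 = (86 / 42) * (W[:, :42] * E).sum()
SC2 = (86 / 44) * (W[:, 42:] * J).sum()
SC1 = (1 / 86) * (W.sum(axis=1) * overall_err).sum()
print(SC1, SC2, SC3)
```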
That's a nice hypothesis, but I don't think it is the case.
SC1 is supposed to be computed only on the Overall_Tol. This variable has a standard deviation of about 44.
There's no way I am close to 1 away from the expert-based rating (or even 3.5, as in the other models).

@arielis I don't know whether they are going with class weights or sample weights. Either way, that just affects the averaging, where what is being averaged is abs(true mark - predicted mark). So regardless of which instances have more importance, the meaning of the score is still distance from the true mark. If you show 0.5 as your score, then, for some weighting of the cases and submarks, you are on average 0.5 away from the true mark, where by mark I mean the integer category assignment from an expert. That seems clear to me, and I don't think we need to think harder than that to read the score.

Thanks @lars.ericson !
I thought the weighting was against the features rather than over the patients.

Congratulations @arielis, you definitely have the best score by far. It is weighted RMSE, so they are upweighting the positive cases. So I would read your score as saying that, especially on the positive cases, the distance between your average damage or narrowing score and the true score is less than 1.

I scored also, apparently very well compared to other submissions so far.
But I have no idea what the numbers mean...
Is there any meaning to being above or under 1.0?

Hi @allawayr, please run the baseline model through the scoring engine so that it posts to the Leaderboard with a name like "baseline", and also remind us whether we will get paid if we do worse than the organizer baseline but better than the other solvers, or whether the organizers can take the money off the table by scoring the best model.

Hi Lars,
Based on my testing with randomized prediction values, I would expect most participants to be in the 0-5 range for these metrics. A "good" score is a little hard for me to define until we start to see what other predictions come in. We'll also post a baseline model provided by one of the co-organizers soon that will help folks contextualize their scores.
Cheers,
Robert

I'm at the top of the Leaderboard, yay!
One caveat: against the training set, my model was predicting all zeros for damage. After running through a giant neural net, of course, but I could replace the giant neural net with
```
lambda x: 0
```
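To see why, here is a tiny made-up illustration: when most true joint scores are zero, an all-zeros predictor already gets a small average error. The 90% figure and the score range are invented for the example:
```python
import numpy as np

# Made-up illustration: if ~90% of true joint scores are 0, an
# all-zeros predictor already has a small mean absolute error.
rng = np.random.default_rng(1)
truth = np.where(rng.random(1000) < 0.9, 0.0, rng.integers(1, 6, 1000))
zeros = np.zeros_like(truth)
print(np.abs(zeros - truth).mean())   # roughly 0.3, despite predicting nothing
```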
So if a score in the threes is considered good, then the scoring metric might need to be rethought, because it's not picking up much information about false negatives.

Leaderboard is live - sorry for any inconvenience.
Cheers,
Robert
Hi Lars,
The leaderboard should update more or less immediately once the container has run and been scored. Looking into why it's not showing up now.
Regarding the score - it's a weighted sum of errors across the patients (SC1) or the sum of the weighted RMSEs (SC2/3):
>For Subchallenge 1, we will calculate the average of the weighted absolute error over all patients, with the SvH score being the target score to be predicted. For Subchallenges 2 and 3, we will calculate the average of the weighted RMSE per patient based on individual joint space narrowing scores or individual joint erosion scores.
So @decentmakeover is correct - lower is better. A score of 0 would be a perfect prediction for all patients.
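To make that concrete, here is a minimal sketch of the two metrics as I read the quoted description - weighted absolute error averaged over patients for SC1, and a per-patient weighted RMSE averaged over patients for SC2/3. The weight values and array shapes are assumptions, not the actual scoring harness:
```python
import numpy as np

def sc1_score(pred_overall, true_overall, patient_weights):
    # Assumed reading of SC1: weighted absolute error on the Overall
    # SvH score, averaged over patients. The real weights are not
    # public; uniform weights are used in the example below.
    return np.mean(patient_weights * np.abs(pred_overall - true_overall))

def sc23_score(pred_joints, true_joints, joint_weights):
    # Assumed reading of SC2/SC3: for each patient, a weighted RMSE
    # across that patient's individual joint scores, then averaged
    # over patients. Arrays are (n_patients, n_joints).
    per_patient = np.sqrt((joint_weights * (pred_joints - true_joints) ** 2).sum(axis=1))
    return per_patient.mean()

# Toy usage with invented numbers (42 erosion joints):
rng = np.random.default_rng(0)
true_j = rng.integers(0, 6, size=(10, 42)).astype(float)
pred_j = true_j + rng.normal(0, 0.5, size=true_j.shape)
w = np.full(42, 1 / 42)                # assumed uniform joint weights
print(sc23_score(pred_j, true_j, w))   # lower is better; 0 = perfect
```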
Also, my guess would be: the lower, the better. @allawayr can clarify, maybe.

@lars.ericson Congrats. Waiting for the leaderboard to update and reflect your score.