I understand that for subchallenges 2 and 3, we are required to provide scores for erosion and narrowing at each joint, as well as the overall scores,
and that the winners would be determined by the RMSE.
I understand that the mean square error would also include the squared error of the overall erosion (resp. narrowing).
As the overall scores are several orders of magnitude greater than the individual joint scores, and the error is squared, the error in the overall score is likely to predominate over the individual errors in the joint scores.
So this may mean that giving correct individual joint scores would not matter so much, and that the solution that gives the most accurate overall score would most probably win.
Am I correct in this assumption?
Created by Ariel Yehuda Israel arielis
@lars.ericson
I tend to think that what the sponsors would really like is what they stated in the wiki:
Subchallenge 1 is designed to ensure that the overall RA damage is correctly evaluated by the solution.
Subchallenges 2 and 3 are designed to ensure that the extent of narrowing and erosion are correctly evaluated by the solution.
If overall narrowing and overall erosion are excluded from these 2 subchallenges, nothing ensures any longer that the TOTAL narrowing and erosion are correctly evaluated.
In fact, if these are excluded, one can submit a solution with garbage in the Overall_erosion and Overall_narrowing columns and get exactly the same score in each of the challenges...
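To make this concrete, here is a minimal sketch, assuming a scorer that simply drops the two overall columns before computing the RMSE. Only `Overall_erosion` and `Overall_narrowing` are real column names from the submission format; the joint column names, the values, and the `joint_only_rmse` helper are made up for illustration and are not the organizers' scoring code.

```python
import math

# Columns assumed to be ignored by the scorer in subchallenges 2 and 3.
EXCLUDED = {"Overall_erosion", "Overall_narrowing"}

def joint_only_rmse(truth, submission):
    """RMSE over every column except the overall ones (hypothetical scorer)."""
    cols = [c for c in truth if c not in EXCLUDED]
    squared = [(truth[c] - submission[c]) ** 2 for c in cols]
    return math.sqrt(sum(squared) / len(squared))

# Made-up single-patient rows; the joint column names are placeholders.
truth = {"Overall_erosion": 6, "joint_a_E": 1, "joint_b_E": 2, "joint_c_E": 3}
honest = {"Overall_erosion": 5, "joint_a_E": 1, "joint_b_E": 1, "joint_c_E": 3}
garbage = dict(honest, Overall_erosion=-999)  # same joints, nonsense overall

print(joint_only_rmse(truth, honest))
print(joint_only_rmse(truth, garbage))  # identical to the line above
```

Under that assumption the two submissions are indistinguishable, which is the concern raised above.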
We still have until March 15th or so to submit solutions. The sponsors know their own needs. Rather than tell the sponsor to want something other than what they want, we should help the sponsors clarify the goal and then re-work solutions to support the clarified goal.
@dongmeisun,
Actually, I followed the instructions in the wiki when I designed my solution and made my first submission, and it would probably perform poorly if the overall score is excluded.
Can you keep it the way it is stated in the wiki?
The overall narrowing and erosion will not be included in the RMSE calculation in subchallenges 2 and 3.
Thanks for reminding me. I will ask Robert to amend the wiki page.
@dongmeisun,
Do you have the official answer for this?
Are the overall narrowing (resp. erosion) scores included in the RMSE calculation in subchallenge 2 (resp. subchallenge 3)?
@arielis, Thanks for your patience. Sorry for the delay due to the holidays; I will get back to you asap.
@dongmeisun,
I would be happy to get confirmation from your double-check, because the wiki seems to say otherwise (in the Assessment section)
<<
For each leaderboard or final scoring test dataset, the known SvH scores for Subchallenge 1, **each joint space narrowing and overall narrowing scores for Subchallenge 2, and each joint erosion and overall erosion scores for Subchallenge 3 will be used as the "true" scores and will form the basis for judging the results submitted to the Challenge. Teams will be evaluated on how close they can predict the true scores, thus we will use the Root Mean Square Error (RMSE) as the primary metric and Pearson correlation coefficient as the secondary metric (to break ties).**
>>
@arielis We will only evaluate the individual joint scores, not the totals, in subchallenges 2 and 3. I will double-check and let you know if I'm wrong here.
Thanks for your questions.
@dongmeisun
My question is about the scoring metric of subchallenges 2 and 3 (not subchallenge 1).
When calculating the Mean Square Error, is the score for overall erosion (resp. narrowing) **given the same weight** as the individual joint erosion (resp. narrowing) scores?
If this is the case, since the error is squared and the overall score is much bigger than the individual joint scores, the error on the overall score is expected to predominate over the individual joint errors.
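A small sketch of what I mean, assuming the overall score is just one more equally weighted column in the RMSE. The ten joint scores, the two predictions, and the `rmse` helper are made up for illustration; this is not the official scoring script.

```python
import math

def rmse(true, pred):
    """Root mean square error over two equally long score lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

# Hypothetical patient: ten joint scores, with the overall (their sum) last.
true_joints = [0, 1, 2, 0, 3, 1, 0, 2, 4, 1]
true_row = true_joints + [sum(true_joints)]

# Prediction A: every joint is off by 1, but the errors cancel in the total.
pred_a_joints = [1, 0, 3, 1, 2, 0, 1, 1, 3, 2]
pred_a_row = pred_a_joints + [sum(pred_a_joints)]

# Prediction B: every joint is exact, but the submitted overall is off by 5.
pred_b_joints = list(true_joints)
pred_b_row = pred_b_joints + [sum(true_joints) + 5]

# Overall column weighted like any joint: A scores better (lower) than B.
print(rmse(true_row, pred_a_row), rmse(true_row, pred_b_row))

# Joints only: B is perfect and A is not.
print(rmse(true_joints, pred_a_joints), rmse(true_joints, pred_b_joints))
```

In this toy case the submission that gets every joint wrong wins as soon as the overall column is included with the same weight, because a single squared error of 25 outweighs ten squared errors of 1.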
For the purpose of the challenge, they are equally important. That's why we have subchallenges.
Speaking for myself (I'm a solver, not an organizer, in this challenge): If localization is not being scored, then you could do a table lookup on average rates of erosion conditional on a single joint having a certain level of erosion. So you could win based on analysing a single joint, on average. You might do poorly on a case where all the other joints are missing (a 1-fingered hand), but in general the lookup/single digit focus strategy could win.
The ground truth supplied for scoring gives you the data needed for this conditional probability estimate.
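A rough sketch of that lookup idea, assuming you have per-patient pairs of (score of one chosen joint, overall score) from the training ground truth. The numbers and the helper names `fit_lookup` and `predict_overall` are invented for illustration only.

```python
from collections import defaultdict

def fit_lookup(joint_scores, overall_scores):
    """Mean overall score conditional on the score of a single joint."""
    buckets = defaultdict(list)
    for joint, overall in zip(joint_scores, overall_scores):
        buckets[joint].append(overall)
    return {joint: sum(vals) / len(vals) for joint, vals in buckets.items()}

def predict_overall(table, joint_score, default):
    """Look up the conditional mean; fall back to a default for unseen scores."""
    return table.get(joint_score, default)

# Made-up training data: one joint's erosion score and the patient's overall.
joint = [0, 0, 1, 1, 2, 2]
overall = [2, 4, 10, 14, 30, 34]

table = fit_lookup(joint, overall)
print(table)  # {0: 3.0, 1: 12.0, 2: 32.0}

# Predict the overall score for a new patient from that single joint.
global_mean = sum(overall) / len(overall)
print(predict_overall(table, 1, default=global_mean))  # 12.0
```

Whether this would actually be competitive depends on how strongly one joint predicts the total in the real data, which I have not checked here.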