Dear @RA2DREAMChallengeParticipants,

We are excited to have a final list of the top performers for all subchallenges of the RA2 DREAM Challenge: Automated Scoring of Radiographic Joint Damage. We have now posted the [Final Scoring Leaderboard](https://www.synapse.org/#!Synapse:syn20545111/wiki/597246). To quickly summarize, the top performing teams eligible for the [Challenge Incentives](https://www.synapse.org/#!Synapse:syn20545111/wiki/597241) are as follows:

**Subchallenge 1:** Predict overall RA damage from radiographic images of hands and feet
1. [Team Shirin](https://www.synapse.org/#!Synapse:syn21478998/wiki/604432)
2. [Hongyang Li and Yuanfang Guan](https://www.synapse.org/#!Synapse:syn21680642/wiki/604549)

**Subchallenge 2:** Predict joint space narrowing scores from radiographic images of hands and feet
1. [Hongyang Li and Yuanfang Guan](https://www.synapse.org/#!Synapse:syn21680642/wiki/604549)
2. [Gold Therapy](https://www.synapse.org/#!Synapse:syn21499370/wiki/604451)
3. [csabaibio](https://www.synapse.org/#!Synapse:syn21614092/wiki/604453)

**Subchallenge 3:** Predict joint erosion scores from radiographic images of hands and feet
1. [Gold Therapy](https://www.synapse.org/#!Synapse:syn21499370/wiki/604451)
2. [csabaibio](https://www.synapse.org/#!Synapse:syn21614092/wiki/604453)
3. [Hongyang Li and Yuanfang Guan](https://www.synapse.org/#!Synapse:syn21680642/wiki/604549)

We also want to supply a bit of information on how the evaluation was completed.

Our first objective was to establish the most robust gold standard dataset possible. We all understand that the scoring of joints is subjective, so we wanted to determine whether any joints were potentially mis-scored in a way that would affect the overall evaluation of your methods. Briefly, we used the final prediction results to flag any joints where the predicted scores were consistently different from the radiographic score. Our team of rheumatologists evaluated over 400 joints, and we updated the gold standard for 289 joints to arrive at the final evaluation dataset.

Our second objective was to establish reliable and reproducible scoring results. The established scoring metric was used to evaluate team predictions on the final evaluation dataset. This final score across all radiographs is a point estimate of method performance; that is, the scores would change with a new gold standard dataset. The question we aimed to address is: how stable are the final scores? To answer it, we bootstrapped the final evaluation dataset (60-70% of radiographs resampled over 1,000 iterations) to establish a range of performance. As with previous DREAM Challenges, we used the Bayes factor to compare teams. For each iteration, we evaluated whether Team A outperformed Team B; the final Bayes factor can be directly interpreted as the odds that Team A outperformed Team B across iterations. That is, a Bayes factor of 3 means Team A outperformed Team B 750 times out of 1,000 iterations. We consider a Bayes factor of 3 or greater to be a significant difference between team performances.
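For those who want to see the mechanics of this comparison, here is a minimal sketch of the procedure in R. The function name, the `scores_a`/`scores_b` inputs (per-radiograph error contributions for two teams, lower is better), and the resampling details are illustrative assumptions, not the exact analysis code:

```
## Minimal sketch of the bootstrap comparison described above.
## `scores_a` and `scores_b` are hypothetical vectors of per-radiograph
## errors for Team A and Team B on the final evaluation dataset; the
## resampling fraction and use of replacement are illustrative only.
bootstrap_bayes_factor <- function(scores_a, scores_b,
                                   n_iter = 1000, frac = 0.65) {
  stopifnot(length(scores_a) == length(scores_b))
  n <- length(scores_a)
  a_wins <- 0
  for (i in seq_len(n_iter)) {
    ## resample ~60-70% of radiographs
    idx <- sample(n, size = round(frac * n), replace = TRUE)
    if (mean(scores_a[idx]) < mean(scores_b[idx])) {
      a_wins <- a_wins + 1
    }
  }
  ## K = (wins for A) / (wins for B); 750 vs. 250 iterations gives K = 3
  a_wins / (n_iter - a_wins)
}
```

Under this reading, a Bayes factor of K corresponds to Team A winning K/(K+1) of the iterations, matching the 750-out-of-1,000 example above.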
Below you can see the results of this analysis. All of the top teams are separated by significant differences, so we can establish that the top-ranked team is robustly the top-ranked team, and the same holds for the second- and third-ranked teams.

${imageLink?synapseId=syn22316888&align=None&scale=100&responsive=true&altText=Subchallenge 1}
${imageLink?synapseId=syn22316887&align=None&scale=100&responsive=true&altText=Subchallenge 2}
${imageLink?synapseId=syn22316886&align=None&scale=100&responsive=true&altText=Subchallenge 3}

**Moving forward,** we will be entering the community phase of the Challenge, where we will bring together the results, perform new analyses, and write up the full set of results for the overview manuscript. We will reach out with a separate email to begin this process.

We thank everyone for their active engagement in the RA2 DREAM Challenge. We are very happy with the results and foresee this challenge being a seminal piece of work in the radiology field. As always, please post questions here.

Kind Regards,
RA2 DREAM Challenge Organizers

Created by James Costello
Yes, thanks to everyone for the opportunity! It was a really interesting experience. I hope we can improve further in the community phase.
Thanks everybody! @ikedim and I look forward to participating in the community phase.
Hi Balint and all,

Yes, absolutely. Here is the Docker container we used for scoring: https://www.synapse.org/#!Synapse:syn21074475

Here are the excerpts from the scoring script in the container that are most relevant to your question. The `calculate_weight` function calculates the weight for a given patient based on the true score information. The `sum_weighted_error` function calculates a weighted error (for SC1, and for the overall SC2/SC3 scores), and the `rmse` function calculates RMSE for SC2/SC3. The `score_log2` function brings these components together: it calculates the weighted error for SC1, and for SC2/SC3 it applies the weights, calculates the RMSE for each patient, and then adjusts by the patient weights. We also log2-transformed all of the scores for readability.

```
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(magrittr))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(glue))

## Map a patient's true score to a weight (higher damage -> larger weight).
calculate_weight <- function(x){
  weight <- dplyr::if_else(x == 0, 1,
            dplyr::if_else(x == 1, 2,
            dplyr::if_else(x >= 2 & x <= 3, 2.143547,
            dplyr::if_else(x >= 4 & x <= 7, 3.863745,
            dplyr::if_else(x >= 8 & x <= 20, 8,
            dplyr::if_else(x >= 21 & x <= 55, 16,
            dplyr::if_else(x >= 56 & x <= 148, 32,
            dplyr::if_else(x >= 148, 64, 0))))))))
  return(weight)
}

## Weighted absolute error, summed over patients.
sum_weighted_error <- function(gold, pred, weight){
  sum(weight * abs(gold - pred))
}

rmse <- function(gold, pred){
  sqrt(mean((gold - pred) ** 2))
}

score_log2 <- function(gold_path, prediction_path){
  ## Clip negative predictions to 0 before scoring.
  pred <- readr::read_csv(prediction_path) %>%
    dplyr::mutate_at(vars(-Patient_ID), ~ replace(., which(. < 0), 0))
  gold <- readr::read_csv(gold_path)

  ## score SC1
  gold_sc1 <- gold %>%
    dplyr::select(Patient_ID, Overall_Tol) %>%
    dplyr::mutate(weight = calculate_weight(Overall_Tol))
  pred_sc1 <- pred %>%
    dplyr::select(Patient_ID, Overall_Tol) %>%
    dplyr::rename(Overall_Tol_pred = Overall_Tol)
  sc1_weighted_sum_error <- dplyr::full_join(pred_sc1, gold_sc1, by = "Patient_ID") %>%
    dplyr::summarize(result = sum_weighted_error(log2(Overall_Tol + 1),
                                                 log2(Overall_Tol_pred + 1),
                                                 weight) / sum(weight)) %>%
    purrr::pluck("result")

  ## does sc2/3 prediction exist?
  gold_joints <- gold %>%
    dplyr::mutate(weight = calculate_weight(Overall_Tol)) %>%
    dplyr::select(-Overall_Tol) %>%
    tidyr::gather('joint', 'score', -Patient_ID, -weight)
  pred_joints <- pred %>%
    dplyr::select(-Overall_Tol) %>%
    tidyr::gather('joint', 'pred_score', -Patient_ID)
  sc2_sc3 <- dplyr::full_join(pred_joints, gold_joints, by = c("Patient_ID", "joint"))

  ## if participants set all joint values to 0, we will just return NA for sc2/3
  if(!all(pred_joints$pred_score == 0)){
    ## score SC2
    sc2_total_weighted_sum_error <- sc2_sc3 %>%
      dplyr::filter(grepl('Overall_erosion', joint)) %>%
      dplyr::summarize(result = sum_weighted_error(log2(score + 1),
                                                   log2(pred_score + 1),
                                                   weight) / sum(weight)) %>%
      purrr::pluck("result")
    sc2_joint_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl(".+_E__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")
    sc2_hand_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl("[RL]H.+_E__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")
    sc2_foot_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl("[RL]F.+_E__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")

    ## score SC3
    sc3_total_weighted_sum_error <- sc2_sc3 %>%
      dplyr::filter(grepl('Overall_narrowing', joint)) %>%
      dplyr::summarize(result = sum_weighted_error(log2(score + 1),
                                                   log2(pred_score + 1),
                                                   weight) / sum(weight)) %>%
      purrr::pluck("result")
    sc3_joint_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl(".+_J__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")
    sc3_hand_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl("[RL]H.+_J__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")
    sc3_foot_weighted_sum_rmse <- sc2_sc3 %>%
      dplyr::filter(grepl("[RL]F.+_J__.+", joint)) %>%
      dplyr::group_by(Patient_ID, weight) %>%
      dplyr::summarize(patient_rmse = rmse(log2(score + 1), log2(pred_score + 1))) %>%
      dplyr::ungroup() %>%
      dplyr::summarize(result = sum(weight * patient_rmse) / sum(weight)) %>%
      purrr::pluck("result")
  } else {
    sc2_total_weighted_sum_error <- NA
    sc2_joint_weighted_sum_rmse <- NA
    sc2_hand_weighted_sum_rmse <- NA
    sc2_foot_weighted_sum_rmse <- NA
    sc3_total_weighted_sum_error <- NA
    sc3_joint_weighted_sum_rmse <- NA
    sc3_hand_weighted_sum_rmse <- NA
    sc3_foot_weighted_sum_rmse <- NA
  }

  score <- c(
    "sc1_weighted_sum_error" = sc1_weighted_sum_error,
    "sc2_total_weighted_sum_error" = sc2_total_weighted_sum_error,
    "sc2_joint_weighted_sum_rmse" = sc2_joint_weighted_sum_rmse,
    "sc2_hand_weighted_sum_rmse" = sc2_hand_weighted_sum_rmse,
    "sc2_foot_weighted_sum_rmse" = sc2_foot_weighted_sum_rmse,
    "sc3_total_weighted_sum_error" = sc3_total_weighted_sum_error,
    "sc3_joint_weighted_sum_rmse" =
      sc3_joint_weighted_sum_rmse,
    "sc3_hand_weighted_sum_rmse" = sc3_hand_weighted_sum_rmse,
    "sc3_foot_weighted_sum_rmse" = sc3_foot_weighted_sum_rmse
  )
  return(score)
}
```
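To give a concrete sense of how this is invoked, here is a minimal usage sketch; the file names are hypothetical placeholders, and in the challenge the harness passes the paths of the CSVs mounted into the container:

```
## Hypothetical local invocation of the scorer above; file names are
## placeholders. Both CSVs must follow the challenge template layout
## (Patient_ID, Overall_Tol, plus one column per individual joint).
scores <- score_log2(
  gold_path = "gold_standard.csv",
  prediction_path = "predictions.csv"
)
round(scores, 4)
```

Note that both gold and predicted values pass through `log2(x + 1)` before the error is computed, and the bracketed weights from `calculate_weight` are derived from each patient's true overall score, so patients with more damage contribute more to the final number.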
I would like to thank the organizers for making this challenge possible, especially during these unusual times. We learned a lot, and it was fun to compete at the same time. Looking forward to the community phase! Now that the competition part is over, can you share the metric with us? I mean the weighting scheme for the MAE and RMSE? Thank you, Balint
I would like to thank the organizers for this great competition. The problem identified by the organizing committee was very interesting and challenging to tackle. Despite the novelty, and the initial difficulties found early on, the organizers were very knowledgeable and responsive to remarks raised by the competitors, including arranging an overall reassessment of the images by the rheumatologists in the midst of the pandemic; as a physician, I know how busy doctors were, and still are, during this period. Overall, I am thrilled to have taken part in designing a solution that may allow us to partly automate what used to be a tedious, time-consuming, and subjective process. The solutions proposed may provide a model for new applications in the field of radiology. I am looking forward to taking part in the community phase of this challenge! Thanks again to the organizing committee!
