Thank you all for participating in this year's BraTS Challenge. This year we are introducing two new performance metrics: lesion-wise Dice score and lesion-wise Hausdorff distance-95 (HD95). These were developed to evaluate segmentation performance at the lesion level rather than at the whole-study level. Evaluating at the lesion level shows how well models detect and segment multiple individual lesions within a single patient. The traditional performance metrics used in prior BraTS challenges are biased toward large lesions, whereas in clinical practice detecting distinct small lesions is just as important as detecting large ones.
The code used for performance metrics is available here:
https://github.com/rachitsaluja/brats_val_2023
Here is an outline of how we perform this analysis and compute the final ranking:
1. First, we isolate the lesion tissue sub-regions: whole tumor (WT; labels 1, 2 and 3), tumor core (TC; labels 1 and 3) and enhancing tumor (ET; label 3).
2. We perform a dilation on the ground truth (GT) labels (for WT, TC and ET) to understand the extent of each lesion. This is mainly done so that, when we run the connected component analysis, small lesions near an "actual" lesion are not classified as new ones. Note that the GT labels themselves do not change in the process.
3. We perform connected component analysis on the prediction label and compare it, component by component, to the GT label.
4. We calculate Dice and HD95 scores for each lesion (or component) individually. Every false positive and false negative is penalized with a Dice score of 0 and an HD95 of 374. We then take the mean for the particular CaseID.
5. Each challenge leader has set a volumetric threshold; lesions below it ("small/false" lesions) are not used to evaluate participants' models. The dilation and threshold settings per challenge are:
GLI, SSA, PEDS: 3x3x3 mm dilation and minimum 50 voxels
MEN: 1x1x1 mm dilation and minimum 50 voxels
MET: 1x1x1 mm dilation and minimum 2 voxels
6. Final ranking method: The final ranking will be based on the lesion-wise Dice and lesion-wise HD95 scores for WT, TC and ET. Each team will be ranked for each of N subjects, 3 regions and 2 metrics, which results in N*3*2 individual rankings. The final ranking score (which we call the BraTS score) for each team will then be calculated by first averaging the individual rankings for each patient (i.e., the cumulative rank), and then averaging these cumulative ranks across all patients.
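To make the arithmetic in steps 4 and 6 concrete, here is a minimal Python sketch. It is not the code from the repository linked above. The function names (`lesionwise_case_score`, `rank_teams`, `brats_scores`) and the layout of the `results` dictionary are hypothetical, and giving tied teams their average rank is an assumption. The lesion matching itself (dilation and connected components, steps 2 and 3) is not shown; the sketch assumes per-lesion scores and false positive/false negative counts are already available.

```python
from statistics import mean

DICE_PENALTY = 0.0    # Dice assigned to every false positive / false negative
HD95_PENALTY = 374.0  # HD95 assigned to every false positive / false negative


def lesionwise_case_score(matched_scores, n_false_neg, n_false_pos, penalty):
    """Step 4: mean over all lesions of one case, with FPs and FNs set to `penalty`."""
    scores = list(matched_scores) + [penalty] * (n_false_neg + n_false_pos)
    # A case with no lesions at all is left undefined in this sketch.
    return mean(scores) if scores else None


def rank_teams(values, higher_is_better):
    """Rank teams for one (case, region, metric) cell; 1 is best.

    Tied teams share the average of the ranks they span (an assumption).
    """
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2
        for team in ordered[i:j + 1]:
            ranks[team] = avg_rank
        i = j + 1
    return ranks


def brats_scores(results):
    """Step 6: results[case][(region, metric)][team] -> case-level score.

    Returns team -> BraTS score (mean of per-case cumulative ranks; lower is better).
    """
    cumulative = {}  # team -> list of cumulative ranks, one per case
    for cells in results.values():
        case_ranks = {}
        for (_region, metric), values in cells.items():
            # Higher Dice is better; lower HD95 is better.
            ranks = rank_teams(values, higher_is_better=(metric == "dice"))
            for team, rank in ranks.items():
                case_ranks.setdefault(team, []).append(rank)
        for team, team_ranks in case_ranks.items():
            cumulative.setdefault(team, []).append(mean(team_ranks))
    return {team: mean(ranks) for team, ranks in cumulative.items()}


# Two matched lesions, one missed lesion (FN) and one spurious prediction (FP):
example_dice = lesionwise_case_score([0.75, 0.25], 1, 1, DICE_PENALTY)  # 0.25
example_hd95 = lesionwise_case_score([2.0, 4.0], 1, 1, HD95_PENALTY)    # 188.5
```

In this example a single false positive and a single false negative pull the case-level Dice from 0.5 down to 0.25 and push the HD95 from 3.0 up to 188.5, which illustrates how strongly the penalties weigh on missed or spurious lesions.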
Let us know if you have any questions.
-- Rachit Saluja, Jeff Rudie and Ujjwal Baid