I am a participant in the BRATS 2023 challenge and I have a couple of questions regarding the evaluation metrics and approach. I would greatly appreciate it if you could provide some clarification on the following points:
Dice metric calculation:
When calculating the Dice score over all classes, does your algorithm include the background class in the calculation? It would also be helpful to know the exact formula you use to compute the Dice metric, so that I can evaluate my model's performance consistently with your methodology.
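To make the question concrete, here is a minimal sketch of how I currently compute Dice: a one-vs-rest Dice per tumor class, excluding background (label 0), with a score of 1.0 when a class is absent from both prediction and ground truth. The label set `(1, 2, 3)` and the handling of the empty-class case are my own assumptions, which is exactly what I would like confirmed:

```python
import numpy as np

def dice_per_class(pred, gt, labels=(1, 2, 3)):
    """One-vs-rest Dice for each label in `labels` (background 0 excluded by assumption).

    pred, gt: integer label arrays of the same shape (2D slice or 3D volume).
    Returns a dict mapping label -> Dice score in [0, 1].
    """
    scores = {}
    for c in labels:
        p = (pred == c)
        g = (gt == c)
        denom = p.sum() + g.sum()
        # Assumed convention: class absent in both masks counts as a perfect match.
        scores[c] = 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else 1.0
    return scores
```

If your evaluation instead averages over all classes including background, or uses a different convention for empty classes, I would adjust accordingly.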
Evaluation approach for slice-by-slice predictions:
My model currently predicts segmentations slice by slice (2D). In this case, should I stack all the predicted slices to form the complete patient volume and then calculate one Dice score per patient, averaging over the total number of patients? Alternatively, is it acceptable to calculate the Dice score for each slice individually and then average over the total number of slices (number of patients × 155 slices)?
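I ask because the two approaches can give noticeably different numbers: slices where a structure is absent in both prediction and ground truth score 1.0 under the usual convention, which inflates the slice-wise average. A small sketch of the two variants I am deciding between (the helper `dice` below is my own binary-Dice assumption, not your official code):

```python
import numpy as np

def dice(p, g):
    """Binary Dice between two boolean masks; 1.0 if both are empty (assumed convention)."""
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else 1.0

def volume_dice(pred_slices, gt_slices):
    """Option A: stack the 2D predictions into a volume, then compute one Dice per patient."""
    return dice(np.stack(pred_slices), np.stack(gt_slices))

def mean_slice_dice(pred_slices, gt_slices):
    """Option B: compute Dice per slice, then average over slices."""
    return float(np.mean([dice(p, g) for p, g in zip(pred_slices, gt_slices)]))
```

For example, a patient with one partially correct slice and one slice that is empty in both masks gets a lower score under option A than under option B, because the empty slice contributes a 1.0 only to the slice-wise average.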
Knowing the preferred evaluation approach will help me align my methodology with your expectations and ensure a fair comparison with other participants.
If possible, I would also appreciate it if you could share the specific algorithm or code snippet you use for calculating the Dice metric, so that I can replicate your evaluation process exactly.
Thank you in advance for your time and assistance. I look forward to your response and clarification on these matters.