I have some questions about the evaluation metrics used in the ranking scheme for the MIS task. Since the implementation of the metrics that you provided returns a score for each instance in an image, we would like to know how these values will be combined to obtain the final metric. That is, for a single image, is the DSC of that image the mean of the DSC scores of its instances? And is the DSC for a set of images the mean of the individual image DSC scores? Thanks in advance,

Cristina González Osorio (ci.gonzalez10)
Dear kkirtac, thanks a lot for your replies.

1.) No, the background will not be used.

2.) Thanks a lot for the hint. We fixed the issue and pushed the fix to master. The output of both cases should now be

```
{'background': 0.9775280898876404, 'instrument_0': 1.0, 'instrument_1': 0}
```

Kind regards, the ROBUST-MIS organizers
I have to add another issue. I explored the behaviour of the multiple instance Dice calculation that you released on [GitHub](https://github.com/schnobi1990/robust_mis_2019/blob/master/evaluation/dice_calculations.py). As far as I can tell, missing an instance is not penalized, whereas returning an additional instance in the prediction that does not exist in the ground truth is penalized. Is that behaviour correct? I suppose missing an object should also be penalized somehow.

```
import numpy as np

# example miss
mask_gt = np.zeros((10, 10)).astype(np.uint8)
mask_gt[3:6, 3:6] = 1
mask_gt[7:9, 7:9] = 2

mask_pred = np.zeros((10, 10)).astype(np.uint8)
mask_pred[3:6, 3:6] = 1  # only covers the first object

compute_dice_coefficient_per_instance(mask_gt, mask_pred)
```

Output is

```
{'background': 0.9775280898876404, 'instrument_0': 1.0}
```

If I reverse the arguments,

```
# extra hit
compute_dice_coefficient_per_instance(mask_pred, mask_gt)
```

I get the output

```
{'background': 0.9775280898876404, 'instrument_0': 1.0, 'instrument_1': 0}
```
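For readers following along, here is a minimal, illustrative sketch of a per-label Dice computed over the union of labels present in ground truth and prediction. It is **not** the code from the linked repository, but it reproduces the corrected output the organizers describe above, i.e. both the "miss" and the "extra hit" case yield `'instrument_1': 0`; the mapping of label `k > 0` to the key `instrument_{k-1}` is an assumption.

```python
import numpy as np

def binary_dice(a, b):
    """Plain Dice coefficient between two boolean masks."""
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else 0.0

def per_instance_dice_sketch(mask_gt, mask_pred):
    """Illustrative sketch only (not the repository implementation):
    one Dice score per label, taken over the union of labels present
    in either mask, so missed and extra instances both score 0."""
    labels = np.union1d(np.unique(mask_gt), np.unique(mask_pred))
    scores = {}
    for label in labels:
        key = "background" if label == 0 else "instrument_%d" % (label - 1)
        scores[key] = binary_dice(mask_gt == label, mask_pred == label)
    return scores

# Applied to the "miss" example above, this returns
# {'background': 0.9775..., 'instrument_0': 1.0, 'instrument_1': 0.0}
# in both argument orders.
```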
For a single frame, could you please clarify: will **the instance DSC** you mentioned include the background score during averaging? Or will you average only the scores that correspond to the instrument outputs, i.e. `output["instrument_0"], output["instrument_1"], etc.`?
Dear ci.gonzalez10, thank you for your question. For each frame, the instance DSC is aggregated via the mean. There are two ways of aggregating the DSC scores over a set of image frames, as described in the ranking scheme:

1. Compute the significance ranking described in [Maier-Hein et al. 2018] and recently applied in the Medical Segmentation Decathlon [MSD 2018].
2. Compute the 5% percentile of all m(a_i, t_j), with a_i being algorithm i and t_j test case j.

Kind regards, the ROBUST-MIS organizers
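To make the aggregation concrete, here is a hypothetical sketch of per-frame averaging of the instrument scores (excluding background, as stated in the organizers' earlier reply) and of the simpler of the two options, the 5% percentile over all per-frame scores; the significance ranking of [Maier-Hein et al. 2018] is more involved and is not shown. The function names and the example scores are illustrative only.

```python
import numpy as np

def frame_score(per_instance_scores):
    """Mean instance DSC for one frame; the background entry is dropped,
    since the organizers state above that background is not used."""
    values = [v for k, v in per_instance_scores.items() if k != "background"]
    return float(np.mean(values)) if values else 0.0

def robustness_score(per_frame_scores):
    """Option 2 of the ranking scheme: the 5% percentile of all
    per-frame scores m(a_i, t_j) of one algorithm a_i."""
    return float(np.percentile(per_frame_scores, 5))

# Hypothetical per-instance scores for three test frames of one algorithm:
frames = [
    {"background": 0.98, "instrument_0": 1.0, "instrument_1": 0.0},
    {"background": 0.97, "instrument_0": 0.9},
    {"background": 0.99, "instrument_0": 0.8, "instrument_1": 0.7},
]
print(robustness_score([frame_score(f) for f in frames]))
# 0.525: the 5% percentile (linear interpolation) of the per-frame means [0.5, 0.9, 0.75]
```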
