I would like to write a paper based on the SurgVisDom dataset, and I have some questions.
Could you explain how the unweighted F1-score, the global F1-score, and the balanced accuracy are calculated?
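To make this question concrete, here is how I currently compute the three metrics. It is only a sketch based on my own assumption that "unweighted" means macro-averaged F1 (the plain mean of per-class F1 scores) and "global" means micro-averaged F1 (true positives, false positives and false negatives pooled over all classes before computing F1). Please correct me if the challenge evaluation defines them differently. The function names are my own and do not come from any official evaluation script. If I understand correctly, the results should match scikit-learn's `f1_score(..., average="macro")`, `f1_score(..., average="micro")` and `balanced_accuracy_score`.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R); 0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def unweighted_f1(y_true, y_pred, labels):
    # My assumption: "unweighted" = macro average, every class counts equally.
    tp, fp, fn = per_class_counts(y_true, y_pred)
    return sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


def global_f1(y_true, y_pred, labels):
    # My assumption: "global" = micro average, counts are pooled over classes first.
    tp, fp, fn = per_class_counts(y_true, y_pred)
    return f1(sum(tp[c] for c in labels),
              sum(fp[c] for c in labels),
              sum(fn[c] for c in labels))


def balanced_accuracy(y_true, y_pred, labels):
    # Mean per-class recall over the classes that actually occur in y_true.
    tp, _, fn = per_class_counts(y_true, y_pred)
    present = [c for c in labels if tp[c] + fn[c] > 0]
    return sum(tp[c] / (tp[c] + fn[c]) for c in present) / len(present)


if __name__ == "__main__":
    # Placeholder labels for illustration only, not the dataset's real class names.
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "b", "b", "b", "c", "a"]
    labels = ["a", "b", "c"]
    print(round(unweighted_f1(y_true, y_pred, labels), 3))      # 0.656
    print(round(global_f1(y_true, y_pred, labels), 3))          # 0.667
    print(round(balanced_accuracy(y_true, y_pred, labels), 3))  # 0.667
```

For single-label classification, the micro-averaged F1 above equals plain accuracy, which is why I suspect "global F1" may be defined differently in the challenge, for example computed per video and then averaged.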
Could you also share the quantitative results of every participating team in both categories?