Thanks for providing the calculation of Average 2D Endpoint Error in the official code. However, when we reproduced Control and CSRT, the values we obtained were much larger than those in Figure 13 of the article. The maximum value in that figure is roughly 40, while the maximum in our reproduction is about 700, and more than 80% of the Control endpoint-error values exceed 40. We also found little correlation with video length, and we did not modify the `pointlossunidirectional` function in the code. If you have any idea where we went wrong, please feel free to contact me via email. Is this the function that computes the Endpoint Error in the source code?

```python
import numpy as np
import torch
from scipy.spatial import KDTree


def pointlossunidirectional(ptsa, ptsb):
    """Point loss between each point in ptsa and its nearest neighbor in ptsb; both are (N, 2) arrays."""
    if type(ptsa) == torch.Tensor:
        ptsa = ptsa.cpu().numpy()
    if type(ptsb) == torch.Tensor:
        ptsb = ptsb.cpu().numpy()
    num_point = ptsa.shape[0]
    tree2 = KDTree(ptsb, leafsize=10)
    distances2, indices = tree2.query(ptsa)  # distance from each point in ptsa to its nearest neighbor in ptsb
    av_dist2 = np.mean(distances2)  # average Euclidean distance
    pointdists = distances2
    if ptsa.shape[0] == 1:
        indices = [indices]  # query returns a scalar index when there is a single query point
    for idx in indices:
        if idx == ptsb.shape[0]:
            breakpoint()  # an index equal to len(ptsb) means no neighbor was found
    displacements = [ptsb[ind] - ptsa[i] for i, ind in enumerate(indices)]
    return {
        "averagedistance": av_dist2,
        "distancelist": pointdists.tolist(),
        "displacements": displacements,
    }
```
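For reference, this is roughly how we invoke the function when comparing tracked points against ground-truth points. The coordinates below are made-up illustrative values, not STIR data; the snippet simply assumes `pointlossunidirectional` is defined as above:

```python
import numpy as np

# Hypothetical example: ptsa are tracked (predicted) points, ptsb the ground-truth
# points for the same frame, both as (N, 2) arrays of pixel coordinates.
ptsa = np.array([[100.0, 120.0], [200.0, 240.0], [50.0, 60.0]])
ptsb = np.array([[103.0, 118.0], [198.0, 245.0], [55.0, 58.0]])

result = pointlossunidirectional(ptsa, ptsb)
print(result["averagedistance"])  # mean Euclidean distance to the nearest ground-truth point
print(result["distancelist"])     # per-point distances
print(result["displacements"])    # vectors from each predicted point to its nearest ground-truth point
```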

Created by rulin zhou SZU_zrl
Hi! I posted a reply under the GitHub [issue](https://github.com/athaddius/STIRMetrics/issues/1) where you posed the question, but I have copied it here for reference: "Thanks for checking into this! The paper that relates to this data uses an unfiltered version of the dataset, and only reports error for clips under 10 seconds in length. This particular codebase (STIRMetrics) is written for the STIR challenge https://stir-challenge.github.io/, which we are hosting to enable easier comparison of methods. The figure you describe reports average results up to a temporal length, but does not report maximums or individual values, which is likely the reason for the discrepancy. I'd recommend focusing on the comparative metrics between the methods you try on the validation data we provide at STIR for challenge participation and method evaluation. Essentially, if the results look good in the clicktracks.py application, you are moving in the right direction."
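To make the distinction concrete, here is a small sketch with hypothetical per-clip errors (purely illustrative, not real STIR results) showing how an average over all clips up to a temporal length can stay small even when a few individual clips have very large errors:

```python
import numpy as np

# Hypothetical per-clip endpoint errors and clip lengths in seconds; purely illustrative.
clip_lengths = np.array([2.0, 4.0, 6.0, 8.0, 9.0])
clip_errors = np.array([5.0, 12.0, 30.0, 700.0, 25.0])

# Average error over all clips no longer than a given temporal length,
# i.e. the kind of aggregate a curve over clip length reports.
for t in [3, 5, 7, 10]:
    mask = clip_lengths <= t
    print(f"clips <= {t}s: mean error = {clip_errors[mask].mean():.1f}")

# A single clip's error can be far larger than any of these averages.
print("max single-clip error:", clip_errors.max())
```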

Issue title: About the calculation of Average 2D Endpoint Error in the STIR article