I have a task: I want to use the center point of each detection box from an object detector as the feature point input, and match it against the next frame to achieve tracking. How are XFeat's feature points learned? Methods such as COTR do this, but they are inefficient.
Hello @LXXIANG12, thank you for your interest in XFeat.
XFeat keypoints are distilled from ALIKE, but the keypoint network is exceptionally small.
If I understand correctly, you wish to extract descriptors from a desired input position. This is entirely possible, as XFeat's coarse feature map is dense, allowing interpolation of descriptors at any desired location. This can be achieved using the provided sparse interpolator.
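As a rough illustration of what interpolating a descriptor at an arbitrary location means, here is a minimal NumPy sketch. It is not XFeat's own sparse interpolator: the function name `sample_descriptor`, the `(C, H, W)` layout, the `stride=8` downsampling factor, and the simple `x / stride` coordinate mapping are all assumptions made for the example.

```python
import numpy as np

def sample_descriptor(feat, x, y, stride=8):
    """Bilinearly interpolate one descriptor from a dense (C, H, W) feature map.

    (x, y) are image-pixel coordinates. `stride` is the ASSUMED downsampling
    factor between the image and the coarse feature map.
    """
    C, H, W = feat.shape
    # Map image coordinates to fractional feature-map coordinates (assumed mapping).
    fx = float(np.clip(x / stride, 0, W - 1))
    fy = float(np.clip(y / stride, 0, H - 1))
    x0, y0 = int(np.floor(fx)), int(np.floor(fy))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = fx - x0, fy - y0
    # Weighted sum of the four neighbouring descriptors.
    desc = ((1 - wx) * (1 - wy) * feat[:, y0, x0]
            + wx * (1 - wy) * feat[:, y0, x1]
            + (1 - wx) * wy * feat[:, y1, x0]
            + wx * wy * feat[:, y1, x1])
    # L2-normalize so that dot products behave like cosine similarity.
    return desc / (np.linalg.norm(desc) + 1e-8)
```

For the tracking use case, `(x, y)` would be the center of the detection box, and the returned vector would serve as the query descriptor for the next frame.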
In the next frame, you can focus on the vicinity of the last coordinate from the previous frame. This can be done efficiently by cropping the feature map, for example, into a 5x5xdim patch centered at the coordinate, followed by a fast dot product to extract a heatmap.
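The local search described above could look roughly like the following NumPy sketch. The helper name `track_in_window`, the `(C, H, W)` layout, and `stride=8` are assumptions for illustration; `radius=2` gives the 5x5 window mentioned above, and the result is only accurate to one feature-map cell.

```python
import numpy as np

def track_in_window(feat_next, query_desc, prev_xy, radius=2, stride=8):
    """Search a small window of the next frame's feature map for the best match.

    feat_next:  (C, H, W) dense descriptors of the next frame.
    query_desc: (C,) descriptor taken at the tracked point in the previous frame.
    prev_xy:    (x, y) image coordinates of the point in the previous frame.
    """
    C, H, W = feat_next.shape
    # Previous position in feature-map cells (assumed stride), clamped to the map.
    cx = min(max(int(round(prev_xy[0] / stride)), 0), W - 1)
    cy = min(max(int(round(prev_xy[1] / stride)), 0), H - 1)
    # Crop a (2*radius+1)^2 window, truncated at the borders.
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, W)
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, H)
    patch = feat_next[:, y0:y1, x0:x1]
    # Dot product of the query with every descriptor in the window.
    heat = np.einsum('c,chw->hw', query_desc, patch)
    iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
    # Convert the best cell back to image coordinates.
    new_xy = (int(x0 + ix) * stride, int(y0 + iy) * stride)
    return new_xy, heat
```

Taking the argmax keeps the sketch short; a sub-cell estimate could be obtained from the heatmap (for example with a softmax-weighted average), but that refinement is not something the reply above specifies.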
For the second question: yes, we provided an example of training with fully synthetic data.