some questions about grid-cam of Divert More Attention to Vision-Language Tracking #87

Open
shuiyued opened this issue May 10, 2023 · 0 comments

Comments

@shuiyued

I would like to ask for your advice on the Grad-CAM visualization used in the paper "Divert More Attention to Vision-Language Tracking". Annotating images with sentences and generating the corresponding heatmaps is an excellent way to present the results. I have been struggling to understand how it works and hope to learn more from you.
The paper only mentions that Grad-CAM is used. The open-source library I found accepts only a single image as input, whereas our model usually takes more than one image, so the inputs do not match. In addition, no target_layers are provided for the template layers (for example, ResNet). Finally, the classification target targets = [ClassifierOutputTarget(281)] seems to be incompatible with tracking.
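On the last point, it may help to note that Grad-CAM, the technique behind ClassifierOutputTarget, only needs two things from the chosen layer: its activations and the gradients of some scalar score with respect to those activations. The score does not have to be a class logit. The following is a minimal pure-Python sketch of that core computation, not the paper's implementation. The function name `grad_cam` and the toy numbers are made up for illustration, and it is an assumption that a tracking score (for example, the peak of the response map) would be used as the scalar.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations and gradients.

    Both arguments are nested lists shaped [channels][height][width].
    `gradients` holds d(score)/d(activations) for any scalar score.
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over space.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(channels)
    ]

    # Weighted sum of the activation maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: 2 channels on a 2x2 feature map (made-up numbers).
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]],       # channel 0 raises the score
         [[-1.0, -1.0], [-1.0, -1.0]]]   # channel 1 lowers it
heatmap = grad_cam(acts, grads)
```

In PyTorch, the activations and gradients could be captured with `register_forward_hook` and `register_full_backward_hook` on the layer of interest, which works regardless of how many inputs the model's forward pass takes.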
If you could spare some time to reply and guide me on how to proceed, or share the relevant code, I would be extremely grateful. Thank you for taking the time to read this, and I look forward to hearing from you.
Best regards
