Some questions about Grad-CAM in Divert More Attention to Vision-Language Tracking #87

shuiyued · 2023-05-10T06:43:17Z

I would like to ask for your advice on the Grad-CAM visualization method used in the paper "Divert More Attention to Vision-Language Tracking". Annotating the images with the sentence and generating heatmaps is an excellent approach! I have been struggling to understand how it works and hope to learn more from you.
The paper only mentions that Grad-CAM is used. The open-source library I found accepts only a single image as input, whereas our model usually takes more than one input, so the two do not match. Moreover, no target_layers are provided for the template layers, such as ResNet. Also, the classification target targets = [ClassifierOutputTarget(281)] seems to be incompatible with tracking.
If you could spare some time to reply to my email and guide me on how to proceed, or provide the relevant code, I would be extremely grateful. Thank you for taking the time to read my email, and I look forward to hearing from you.
Best regards
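
For reference, Grad-CAM itself only needs two things: the activations of one chosen layer and the gradient of some scalar score with respect to them. Neither requires a single-image input or a class index. The sketch below illustrates this with plain PyTorch hooks rather than a CAM library. Everything in it is an assumption made for illustration: `ToyTracker` is a hypothetical two-input (template + search) model, not the model from the paper, and `score_fn` (the peak of the response map) is just one possible replacement for a classification target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTracker(nn.Module):
    """Hypothetical two-input tracker used only for illustration."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
        )

    def forward(self, template, search):
        z = self.backbone(template)   # template features, used as a kernel
        x = self.backbone(search)     # search-region features
        return F.conv2d(x, z)         # cross-correlation response map


def grad_cam(model, template, search, target_layer, score_fn):
    """Grad-CAM on the search image for an arbitrary scalar score."""
    acts = []
    handle = target_layer.register_forward_hook(
        lambda module, inputs, output: acts.append(output)
    )
    try:
        response = model(template, search)
    finally:
        handle.remove()

    # The shared backbone runs on the template first and the search image
    # second, so the last captured activation belongs to the search branch.
    search_act = acts[-1]

    # Any scalar works as the target; no class index is involved.
    score = score_fn(response)
    (grads,) = torch.autograd.grad(score, search_act)

    # Channel weights are the spatially averaged gradients.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * search_act).sum(dim=1, keepdim=True))

    # Upsample to the search-image size and rescale to [0, 1].
    cam = F.interpolate(cam, size=search.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    return cam[0, 0].detach()


torch.manual_seed(0)
model = ToyTracker().eval()
template = torch.randn(1, 3, 32, 32)
search = torch.randn(1, 3, 64, 64)

cam = grad_cam(
    model, template, search,
    target_layer=model.backbone[-1],   # last backbone layer, chosen arbitrarily
    score_fn=lambda r: r.max(),        # peak response as the scalar target
)
print(cam.shape)
```

With a real tracker, the two choices to adapt would be `target_layer` (which feature map to explain) and `score_fn` (which scalar of the output to explain); which of these the paper's authors used is not stated here.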
