[Feature Request] Add icon descriptions in visual prompt of interactive elements detection #24

dandansamax · 2024-08-29T14:39:22Z

Required prerequisites

I have searched the Issue Tracker and confirmed that this hasn't already been reported. (+1 or comment there if it has.)

Motivation

The current object detection visual prompt (GroundingDino) only finds the bounding box of each icon. We want a semantic description for each icon to help the agent understand the UI.

Solution

As a first step, we could run object detection and then use a VLLM to generate a description for each detected icon.
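
The two-stage flow proposed here (detect boxes, then describe each crop) could be sketched roughly as follows. This is only an illustration under assumptions: `detect` and `describe` are hypothetical callables standing in for the GroundingDino detector and the description model, `IconAnnotation`, `crop`, and `describe_icons` are made-up names, and the image is a toy nested list rather than the project's real image type.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed conventions for this sketch (not taken from the project):
Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Image = Sequence[Sequence[int]]   # toy image: rows of pixel values


@dataclass
class IconAnnotation:
    """A detected icon box paired with its generated description."""
    box: Box
    description: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the sub-image covered by box (right and bottom are exclusive)."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def describe_icons(
    image: Image,
    detect: Callable[[Image], List[Box]],        # hypothetical detector wrapper
    describe: Callable[[List[List[int]]], str],  # hypothetical description model wrapper
) -> List[IconAnnotation]:
    """Run detection first, then describe each detected icon crop."""
    annotations = []
    for box in detect(image):
        icon = crop(image, box)
        annotations.append(IconAnnotation(box=box, description=describe(icon)))
    return annotations
```

Passing the detector and the description model in as callables keeps the sketch independent of any specific model API, so either stage could be swapped or stubbed out in tests.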

Additional context

No response

dandansamax added the enhancement and visual prompt labels Aug 29, 2024

dandansamax self-assigned this Aug 29, 2024
