
NavAIGuide is an extensible multi-modal agentic framework that fulfills plans and user queries by tapping into the ecosystem of available mobile and desktop apps.

soffy/NavAIGuide-TS

NavAIGuide

TypeScript · Node 20 LTS · MIT License


🤔 What is NavAIGuide?

Welcome to NavAIGuide (/næv eɪ aɪ ɡaɪd/), an extensible multi-modal agentic framework designed to fulfill plans and user queries by tapping into the ecosystem of available mobile and desktop apps. Here's how NavAIGuide stands out:

  • Visual Task Detection: Compatible with GPT-4V and other vision models, NavAIGuide identifies the next steps directly from page screenshots.

  • Advanced Code Selectors: Because visual elements and their positions are not always enough to act on, NavAIGuide applies grounding techniques to both XML and HTML page sources, matching each step to the selector best suited to the action it requires.

  • Action-Oriented Execution: At its core, NavAIGuide is an action-based framework that describes actions with a JSON schema and produces reproducible outputs.

  • Resilient Error Handling: Because errors are an inevitable part of running AI agents, NavAIGuide includes a built-in retry mechanism with exponential backoff that rides out transient failures so the agent can keep moving forward.
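The retry behavior described above can be sketched as follows. This is a minimal illustration, not NavAIGuide's actual implementation: `withRetry`, `backoffDelay`, and the `RetryOptions` fields are hypothetical names. After each failed attempt the helper waits `baseDelayMs * 2^(attempt - 1)` milliseconds, capped at `maxDelayMs`, before trying again.

```typescript
// Hypothetical options for illustration; NavAIGuide's real retry settings may differ.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first (must be >= 1)
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number; // cap on any single wait
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay after failed attempt `attempt` (1-based):
// baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
}

// Runs `operation`, retrying transient failures with exponential backoff.
// Rethrows the last error once attempts run out or a non-transient error occurs.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
  isTransient: (err: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown = new Error("withRetry: maxAttempts must be >= 1");
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Stop early on errors that will not succeed on retry, or when out of attempts.
      if (!isTransient(err) || attempt === opts.maxAttempts) break;
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

The `isTransient` predicate lets a caller skip retries for failures that will never succeed, such as a selector that does not exist, while still backing off on timeouts or flaky device connections.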
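To make the action-based design concrete, here is a sketch of how a vision model's JSON reply might be checked against a fixed action shape before anything runs on the device. The action vocabulary (`tap`, `type`, `scroll`, `done`) and the `parseAction` helper are hypothetical and are not NavAIGuide's real schema; the TypeScript union stands in for the JSON schema the framework uses.

```typescript
// Hypothetical action vocabulary for illustration; not NavAIGuide's real schema.
type Action =
  | { type: "tap"; selector: string }
  | { type: "type"; selector: string; text: string }
  | { type: "scroll"; direction: "up" | "down" }
  | { type: "done" };

// Parses a model's JSON reply and validates it against the Action shape.
// Throws on anything malformed so the caller can re-prompt or retry.
function parseAction(raw: string): Action {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("Action must be a JSON object");
  }
  const { type, selector, text, direction } = value as Record<string, unknown>;
  switch (type) {
    case "tap":
      if (typeof selector === "string") return { type: "tap", selector };
      break;
    case "type":
      if (typeof selector === "string" && typeof text === "string") {
        return { type: "type", selector, text };
      }
      break;
    case "scroll":
      if (direction === "up" || direction === "down") {
        return { type: "scroll", direction };
      }
      break;
    case "done":
      return { type: "done" };
  }
  throw new Error(`Invalid action: ${raw}`);
}
```

Validating at this boundary means a malformed or hallucinated reply surfaces as an ordinary error instead of reaching the device driver as an unexpected command.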

NavAIGuide Agents extend the core toolkit with advanced automation solutions:

  • Preview of Appium iOS Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
  • Preview of Appium Android Agents (Coming soon): Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
  • Playwright-based Web Agents (Coming soon): Learn how to build Web AI Agent Companions.

💻 Getting Started

You can either clone the repository or install NavAIGuide with npm, yarn, or pnpm.

🚀 Challenges and Focus

NavAIGuide still faces challenges in long-horizon planning and in the accuracy of code inference. The current focus is on improving the stability of NavAIGuide agents.

🤓 Contributing

We welcome contributions. Please follow the standard fork-and-pull request workflow for your contributions.

🛂 License

NavAIGuide is licensed under the MIT License.

🚑 Support

For support, questions, or feature requests, open an issue in the GitHub repository.


Languages

  • TypeScript 64.9%
  • HTML 31.9%
  • JavaScript 2.0%
  • Shell 1.2%