πŸ‘β€πŸ—¨ Vision Assistant (Backend): Smart Assistant for Visually Impaired People


Vision Assistant Services

Description

Vision Assistant is an accessible application designed to empower visually impaired individuals by enabling them to ask questions and receive answers about the content of images. By leveraging vision-language models, Vision Assistant breaks down barriers to information and enhances users' independence.

Backend Key Features

  • Image Understanding: Vision Assistant employs advanced computer vision technology to analyze images and extract meaningful information from them.
  • Natural Language Interaction: Users can interact with the application using natural language queries, asking questions about the content of scenes/images in both speech and text form.
  • Detailed Response: Vision Assistant provides detailed answers, describing the elements, objects, and more within the scenes.
  • Feedback Collection: The application collects user feedback, retrains the model, and improves the user experience.

System Design

The backend is built following the Aggregator microservices design pattern. The Aggregator Service acts as the entry point: it invokes the functional services, aggregates their outputs, and responds to the user.

[Core architecture diagram]
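The Aggregator pattern above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the service names (`vqa_service`, `tts_service`) and payload shapes are assumptions, and each injected callable stands in for what would be an HTTP call to a real microservice.

```python
# Minimal sketch of the Aggregator pattern: the aggregator is the single
# entry point, invokes each functional service, merges their outputs, and
# returns one response to the client. Service names are hypothetical.
from typing import Callable, Dict


def aggregate(
    question: str,
    image_id: str,
    vqa_service: Callable[[str, str], str],
    tts_service: Callable[[str], str],
) -> Dict[str, str]:
    answer = vqa_service(question, image_id)   # functional service 1: VQA
    audio_url = tts_service(answer)            # functional service 2: TTS
    return {"answer": answer, "audio_url": audio_url}  # aggregated response


# Stub services standing in for the real microservices.
def fake_vqa(question: str, image_id: str) -> str:
    return f"Answer to '{question}' for image {image_id}"


def fake_tts(text: str) -> str:
    return f"https://audio.example/{hash(text) & 0xFFFF}.mp3"


result = aggregate("What is in front of me?", "img-1", fake_vqa, fake_tts)
print(result["answer"])
```

Keeping the aggregation logic independent of transport details like this also makes the entry point easy to unit-test with stubbed services.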

Services

Documentation for API development and deployment:

Interservice Communications

For the sake of simplicity, we use synchronous interservice communication, in which one service calls an API that another service exposes over HTTP.

[Sequence diagram]
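The synchronous style can be demonstrated end to end with Python's standard library: one "service" exposes an HTTP endpoint, and another calls it and blocks until the reply arrives. The endpoint path and payload are illustrative, not the project's real API.

```python
# Sketch of synchronous interservice communication over HTTP.
# A provider service exposes an API; a consumer calls it and waits
# for the response before continuing. Endpoint/payload are assumed.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class CaptionHandler(BaseHTTPRequestHandler):
    """Provider service: exposes a hypothetical /caption endpoint."""

    def do_GET(self):
        body = json.dumps({"caption": "a person crossing the street"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


# Start the provider on an ephemeral port.
server = HTTPServer(("127.0.0.1", 0), CaptionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Consumer service: a blocking HTTP call — execution waits for the reply.
with urlopen(f"http://127.0.0.1:{port}/caption") as resp:
    payload = json.loads(resp.read())
server.shutdown()

print(payload["caption"])
```

The trade-off is coupling: the caller is blocked while the callee works, which is acceptable here in exchange for a simpler request/response flow.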

Deployment

The Vision Assistant backend is deployed on AWS infrastructure, leveraging a combination of services to ensure scalability, reliability, and accessibility. The microservices are containerized with Docker and orchestrated using AWS ECS with both EC2 and Fargate launch types. Below is an overview of the deployment solution and the AWS services involved:

[Deployment diagram]
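Before a service can run on ECS, it must be packaged as a Docker image. A minimal, hypothetical Dockerfile for one Python microservice might look like the following; the paths, port, and `uvicorn` entry point are assumptions, not the repository's actual build configuration.

```dockerfile
# Illustrative Dockerfile for one microservice (paths/port assumed).
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service source code.
COPY . .

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The resulting image would then be pushed to a container registry and referenced from an ECS task definition for either the EC2 or Fargate launch type.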

Future Work

  • VQA Service Optimization: Our future work will focus on further enhancing the VQA service, with particular emphasis on optimizing the vision-language model.
  • User-Centric Retraining: We are dedicated to building a training platform that harnesses users' feedback effectively, allowing us to iteratively fine-tune the model based on real-world usage scenarios and user-generated questions.

[Future work diagram]

Contributing

As the project is composed of multiple services, please follow the contributing guide for the service you want to contribute to.
