πŸ‘β€πŸ—¨ Vision Assistant (Backend): Smart Assistant for Visually Impaired People


Vision Assistant Services

Description

Vision Assistant is an accessible application designed to empower visually impaired individuals by enabling them to ask questions and receive answers about the content of images. By leveraging vision-language models, Vision Assistant breaks down barriers to information and enhances users' independence.

Backend Key Features

  • Image Understanding: Vision Assistant employs advanced computer vision technology to analyze images and extract meaningful information from them.
  • Natural Language Interaction: Users can interact with the application using natural language queries, asking questions about the content of scenes/images in both speech and text form.
  • Detailed Response: Vision Assistant provides detailed answers, describing the elements, objects, and more within the scenes.
  • Feedback Collection: The application collects user feedback, retrains the model, and improves the user experience.

System Design

The backend is built following the Aggregator microservices design pattern. The Aggregator Service acts as the entry point: it invokes the functional services, aggregates their outputs, and responds to the user.

[Core architecture diagram]
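The Aggregator pattern above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the service names (`vqa_service`, `tts_service`) and payload shapes are assumptions, and each injected callable stands in for what would be an HTTP call to a real microservice.

```python
# Minimal sketch of the Aggregator pattern: the aggregator is the single
# entry point, invokes each functional service, merges their outputs, and
# returns one response to the client. Service names are hypothetical.
from typing import Callable, Dict


def aggregate(
    question: str,
    image_id: str,
    vqa_service: Callable[[str, str], str],
    tts_service: Callable[[str], str],
) -> Dict[str, str]:
    answer = vqa_service(question, image_id)   # functional service 1: VQA
    audio_url = tts_service(answer)            # functional service 2: TTS
    return {"answer": answer, "audio_url": audio_url}  # aggregated response


# Stub services standing in for the real microservices.
def fake_vqa(question: str, image_id: str) -> str:
    return f"Answer to '{question}' for image {image_id}"


def fake_tts(text: str) -> str:
    return f"https://audio.example/{hash(text) & 0xFFFF}.mp3"


result = aggregate("What is in front of me?", "img-1", fake_vqa, fake_tts)
print(result["answer"])
```

Keeping the aggregation logic independent of transport details like this also makes the entry point easy to unit-test with stubbed services.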

Services

Documentation for API development and deployment:

Interservice Communications

For the sake of simplicity, we use synchronous interservice communication, in which one service calls an API that another service exposes over HTTP.

[Sequence diagram]
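The synchronous style can be demonstrated end to end with Python's standard library: one "service" exposes an HTTP endpoint, and another calls it and blocks until the reply arrives. The endpoint path and payload are illustrative, not the project's real API.

```python
# Sketch of synchronous interservice communication over HTTP.
# A provider service exposes an API; a consumer calls it and waits
# for the response before continuing. Endpoint/payload are assumed.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class CaptionHandler(BaseHTTPRequestHandler):
    """Provider service: exposes a hypothetical /caption endpoint."""

    def do_GET(self):
        body = json.dumps({"caption": "a person crossing the street"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


# Start the provider on an ephemeral port.
server = HTTPServer(("127.0.0.1", 0), CaptionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Consumer service: a blocking HTTP call — execution waits for the reply.
with urlopen(f"http://127.0.0.1:{port}/caption") as resp:
    payload = json.loads(resp.read())
server.shutdown()

print(payload["caption"])
```

The trade-off is coupling: the caller is blocked while the callee works, which is acceptable here in exchange for a simpler request/response flow.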

Deployment

The Vision Assistant backend is deployed on AWS infrastructure, leveraging a combination of services to ensure scalability, reliability, and accessibility. The microservices are containerized with Docker and orchestrated using AWS ECS with both EC2 and Fargate launch types. Below is an overview of the deployment solution and the AWS services involved:

[Deployment diagram]
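Before a service can run on ECS, it must be packaged as a Docker image. A minimal, hypothetical Dockerfile for one Python microservice might look like the following; the paths, port, and `uvicorn` entry point are assumptions, not the repository's actual build configuration.

```dockerfile
# Illustrative Dockerfile for one microservice (paths/port assumed).
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service source code.
COPY . .

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The resulting image would then be pushed to a container registry and referenced from an ECS task definition for either the EC2 or Fargate launch type.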

Future Work

  • VQA Service Optimization: Our future work will focus on further enhancing the VQA service, with particular emphasis on optimizing the vision-language model.
  • User-Centric Retraining: We are dedicated to building a training platform that harnesses users' feedback effectively, allowing us to iteratively fine-tune the model based on real-world usage scenarios and user-generated questions.

[Future work diagram]

Contributing

As the project is composed of multiple services, please follow the contributing guide for the service you want to contribute to.
