This project is an attempt in this direction: to build a novel smart glass that can extract and recognize text captured from an image and convert it to speech. OpenCV, an open-source computer vision library offering a wide range of image-processing operations, is used here to capture frames from the external camera in real time and prepare them for recognition. The captured and preprocessed image is then processed by Tesseract-OCR together with the Efficient and Accurate Scene Text detector (EAST), a deep-learning-based model, to produce computer-readable text. Translation of the recognized text into other languages is handled by Natural Language Processing (NLP), a branch of machine learning, with a model trained on datasets of parallel translations. The IoT side of the system consists of a Raspberry Pi Zero single-board computer that processes images captured by a Pi camera mounted on the glasses, so all operations and algorithms run on a small computer that can be worn like regular glasses. Finally, the recognized and translated text is passed to Google's Text-to-Speech (gTTS) API to produce an audible signal for the user.
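As a rough illustration of the detection stage, the snippet below runs the EAST detector through OpenCV's DNN module. It is a minimal sketch, assuming a pretrained model file (commonly distributed as frozen_east_text_detection.pb) is available locally; decoding the geometry map into rotated bounding boxes, which the full pipeline would do before handing crops to Tesseract, is omitted for brevity.

```python
import cv2

# Assumed: a pretrained EAST model downloaded separately
# (commonly distributed as frozen_east_text_detection.pb).
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

image = cv2.imread("scene.jpg")

# EAST requires input dimensions that are multiples of 32.
blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                             (123.68, 116.78, 103.94),
                             swapRB=True, crop=False)
net.setInput(blob)

# Two output layers: per-cell text confidence scores and box geometry.
scores, geometry = net.forward([
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3",
])

# Count cells whose confidence exceeds a threshold; a complete
# implementation would decode `geometry` into boxes at these cells.
conf_threshold = 0.5
num_candidates = int((scores[0, 0] > conf_threshold).sum())
print(f"Text regions above threshold: {num_candidates}")
```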
The workflow proceeds in four steps:
Capture of the input image by the camera (here, the Pi Camera).
Removal of noise from the image using OpenCV and conversion of the image to text using pytesseract.
Translation of the text into other languages (here, Spanish) using an NLP model trained on translation datasets.
Conversion of the translated text to speech using gTTS (a sketch of these steps follows below).
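The following is a minimal end-to-end sketch of this workflow in Python, under a few assumptions: the Pi camera is exposed as a standard V4L2 device readable by OpenCV, the Tesseract engine and the pytesseract and gTTS packages are installed, and the translation step is a hypothetical placeholder (translate_to_spanish) standing in for the project's trained NLP model.

```python
import cv2
import pytesseract
from gtts import gTTS


def translate_to_spanish(text: str) -> str:
    """Hypothetical placeholder for the project's NLP translation model.
    A real implementation would load a model trained on parallel
    English-Spanish data; here the input is returned unchanged."""
    return text


# Step 1: grab one frame from the camera (assumes the Pi camera is
# exposed as a V4L2 device, e.g. /dev/video0).
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the camera")

# Step 2: reduce noise and binarize the frame before OCR.
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Step 2 (cont.): extract text with Tesseract via pytesseract.
text = pytesseract.image_to_string(binary)

# Step 3: translate the recognized text (placeholder above).
spanish = translate_to_spanish(text)

# Step 4: synthesize speech with gTTS and save it for playback.
tts = gTTS(text=spanish, lang="es")
tts.save("speech.mp3")
```

A median blur followed by Otsu thresholding is one common way to clean camera noise before OCR; the actual preprocessing chain used on the device may differ.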