daffaarizkyy/STT_GMM-HMM

Speech-to-Text using Gaussian Mixture Model & Hidden Markov Model

Description

Speech-to-Text is a technology that converts voice data into text, which lets computers interpret spoken commands. This project applies two machine learning models to Speech-to-Text, the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM), to map the sounds in a recording to text.

The live demo is currently unavailable.
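To make the GMM and HMM combination more concrete, here is a minimal sketch of one common way such models are used for recognition: each word gets its own HMM, each hidden state emits features through a GMM, and the recognized word is the one whose model assigns the highest likelihood to the audio features. This is not the repository's code. The model parameters, the words "yes" and "no", and the one-dimensional toy features (standing in for real acoustic features such as MFCCs) are all assumptions made for illustration, and the sketch uses only the Python standard library rather than hmmlearn.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def log_gmm(x, mixture):
    """Log density of a GMM given as a list of (weight, mean, var) tuples."""
    return log_sum_exp(
        [safe_log(w) + log_gaussian(x, mean, var) for w, mean, var in mixture]
    )


def log_likelihood(frames, start, trans, emissions):
    """Forward algorithm in the log domain: log P(frames | model)."""
    n = len(start)
    alpha = [safe_log(start[s]) + log_gmm(frames[0], emissions[s]) for s in range(n)]
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + log_gmm(x, emissions[s])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def recognize(frames, models):
    """Return the word whose model gives the frames the highest likelihood."""
    return max(models, key=lambda word: log_likelihood(frames, **models[word]))


# Hypothetical two-state, left-to-right word models with hand-picked values.
MODELS = {
    "yes": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emissions": [
            [(1.0, [0.0], [1.0])],
            [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])],
        ],
    },
    "no": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emissions": [
            [(1.0, [5.0], [1.0])],
            [(0.5, [-1.0], [1.0]), (0.5, [1.0], [1.0])],
        ],
    },
}

if __name__ == "__main__":
    toy_frames = [[0.1], [-0.2], [4.2], [5.8]]
    print(recognize(toy_frames, MODELS))
```

In a real system the parameters would be learned from labelled recordings rather than written by hand, and each frame would be a multi-dimensional feature vector; the scoring step would keep the same shape.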

Technologies Used

  • Flask==2.2.2
  • hmmlearn==0.2.8
  • ipython==8.10.0
  • librosa==0.9.2
  • numpy==1.23.5
  • PySoundFile==0.9.0.post1
  • python_speech_features==0.6
  • scikit_learn==1.2.1
  • scipy==1.10.0
  • soundfile==0.11.0
  • Werkzeug==2.2.2

Features

  • Easy to Use
  • Performs speech recognition and converts speech to text automatically

Limitations

  • Currently supports only the English language and the .wav file format
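Because only .wav input is supported, it can help to check a file before submitting it. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM .wav files. The function names `describe_wav` and `make_silence` are hypothetical helpers for this example and are not part of the project, which may read audio differently (for instance through librosa or soundfile).

```python
import io
import wave


def describe_wav(source):
    """Return basic properties of a PCM .wav file, or raise ValueError.

    `source` may be a file path or a binary file-like object.
    """
    try:
        with wave.open(source, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "sample_rate_hz": rate,
                "duration_s": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable .wav file: {exc}") from exc


def make_silence(seconds=1.0, rate=16000):
    """Build an in-memory mono 16-bit .wav of silence (for demonstration)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(describe_wav(make_silence(0.5, 8000)))
```

Calling `describe_wav` with a path to a non-WAV file (for example an .mp3) raises `ValueError`, which is a convenient signal to convert the file first.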

Setup

The requirements.txt file lists all Python libraries needed for this project. Install them with:

pip install -r requirements.txt
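After installation, one way to confirm that the pinned versions are actually present is to compare the pins against the installed package metadata. This is a sketch under assumptions: it handles only simple `name==version` lines like the ones listed under Technologies Used, and `parse_pins` and `check_pins` are hypothetical helper names, not part of the project.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a {name: version} dict.

    Blank lines, comments, and lines without '==' are ignored.
    """
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for each unsatisfied pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    sample = ["Flask==2.2.2", "hmmlearn==0.2.8", "numpy==1.23.5"]
    for name, (wanted, installed) in check_pins(parse_pins(sample)).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

To check the real file, pass the lines of requirements.txt to `parse_pins` instead of the sample list. An empty result from `check_pins` means every pin matched.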

Usage

Run the following in your CMD or terminal:

  • Clone this repository:

git clone https://github.com/daffaarizkyy/STT_GMM-HMM

  • Change into the directory where you cloned the project, for example:

cd STT_GMM-HMM

  • Install the dependencies: pip install -r requirements.txt

  • Start the application: python app.py

  • Open your browser and go to localhost:5000 or http://127.0.0.1:5000/

Project Status

Project is: complete

Room for Improvement

Room for improvement:

  • The speech recognition pipeline needs to be optimized so that processing is faster

To do for future development:

  • Add support for more languages and file formats

Acknowledgements

  • This project was inspired by YouTube closed captions and subtitled films.

Many thanks to:

  • Irvan Kurniawan: Modelling HMM (Department of Informatics, Faculty of Computer Sciences, Universitas Sriwijaya, Indonesia)

  • Muhammad Daffa Rizky Fatarah: Modelling GMM and UI/UX Designer (Department of Informatics, Faculty of Computer Sciences, Universitas Sriwijaya, Indonesia)

  • Osvari Arsalan, S.Kom., M.T: Lecturer and Researcher (Department of Informatics, Faculty of Computer Sciences, Universitas Sriwijaya, Indonesia)

  • Rizki Kurniati, M.T.: Lecturer and Researcher (Department of Informatics, Faculty of Computer Sciences, Universitas Sriwijaya, Indonesia)

Contact

Created by @Wibu x Nolep - feel free to contact us!
