GitHub - week9-Benkart/Speech-to-text-data-collection-with-Kafka-Airflow-and-Spark: This project is to produce a tool that can be deployed to automate sending and obtaining audio files and text from and into a cloud storage, apply transformation in a distributed manner, and load it into a storage system in a suitable format to train a speech-to-text model.

Speech-to-text data collection with Kafka, Airflow, and Spark

This project produces a deployable tool that automates posting and receiving text and audio files to and from a data lake, applies transformations in a distributed manner, and loads the results into a warehouse in a format suitable for training a speech-to-text model.

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Acknowledgements

About The Project

The purpose of this week's challenge is to build a data engineering pipeline for recording millions of Amharic and Swahili speakers reading digital texts on app and web platforms. We will use a number of large text corpora, but for testing the backend you can use the recently released Amharic news text classification dataset with baseline performance.


Commonly used resources that we find helpful are listed in the acknowledgements.

Built With

Resources used in this project:

Boto3
python kafka
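
As a rough illustration of how a text prompt and its recorded audio might be packaged before being sent through Kafka or stored with Boto3, here is a minimal sketch. The Recording schema, its field names, and the encode_recording and decode_recording helpers are assumptions made for this example and are not taken from the repository's scripts. It uses only the Python standard library, so it runs without a broker or cloud credentials.

```python
import base64
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class Recording:
    """Hypothetical schema: one text prompt paired with a speaker's audio."""
    text_id: str
    text: str
    audio_b64: str  # raw audio bytes, base64-encoded so they fit in JSON
    language: str


def encode_recording(text: str, audio: bytes, language: str) -> bytes:
    """Serialize a prompt/audio pair into UTF-8 JSON bytes."""
    record = Recording(
        text_id=str(uuid.uuid4()),
        text=text,
        audio_b64=base64.b64encode(audio).decode("ascii"),
        language=language,
    )
    # ensure_ascii=False keeps Amharic (Ge'ez script) text readable in the payload
    return json.dumps(asdict(record), ensure_ascii=False).encode("utf-8")


def decode_recording(payload: bytes) -> Tuple[Recording, bytes]:
    """Reverse encode_recording: return the record and the raw audio bytes."""
    record = Recording(**json.loads(payload.decode("utf-8")))
    return record, base64.b64decode(record.audio_b64)
```

Under these assumptions, the resulting bytes could be passed as the value argument of a kafka-python KafkaProducer.send call or as the Body argument of a Boto3 S3 put_object call. Whether the project actually combines text and audio in one message, or uses kafka-python rather than another Kafka client, is not stated in this README.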

Getting Started

To get a local copy up and running, follow these simple steps.

Installation

Clone the repo

git clone https://github.com/week9-Benkart/Speech-to-text-data-collection-with-Kafka-Airflow-and-Spark.git

Install the setup.py

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contributors

Dibora (team lead)
Toyin (deputy team lead)
Elias Andualem
Abreham Gessesse
Euel Fantaye
Yosef Engdawork
Michael Darko Ahwireng
Mubarak Sani

Project Link: https://github.com/week9-Benkart/Speech-to-text-data-collection-with-Kafka-Airflow-and-Spark.git

