Transcribes audio files and stores them as embeddings for fast similarity search.
https://github.com/openai/whisper https://www.openslr.org/51/
Audio-Sifter aims to be a fast service for persisting and querying transcribed audio files. It offers semantic search using sentence embeddings. Under the hood it uses Redis for caching metadata, Cohere for generating sentence embeddings, and Faiss for fast search over millions of sentence vectors.
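To make the idea of embedding-based search concrete, here is a minimal sketch using only the Python standard library. It is not Audio-Sifter's implementation: the `FlatIndex` class and the toy 3-dimensional vectors are hypothetical stand-ins for a Faiss index and Cohere embeddings. It only illustrates the principle of ranking stored sentences by cosine similarity to a query vector.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatIndex:
    """Hypothetical stand-in for a vector index: exhaustive inner-product search."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []
        self.sentences: List[str] = []

    def add(self, sentence: str, embedding: Sequence[float]) -> None:
        """Store a sentence together with its unit-length embedding."""
        if len(embedding) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(embedding)}")
        self.vectors.append(normalize(embedding))
        self.sentences.append(sentence)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored sentences most similar to the query vector."""
        q = normalize(query)
        scored = [
            (sentence, sum(a * b for a, b in zip(q, vec)))
            for sentence, vec in zip(self.sentences, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy vectors chosen by hand; a real embedding model produces these from text.
index = FlatIndex(dim=3)
index.add("the weather is sunny", [0.9, 0.1, 0.0])
index.add("stock prices fell", [0.0, 0.2, 0.9])
index.add("it is raining today", [0.8, 0.3, 0.1])

results = index.search([1.0, 0.2, 0.0], k=2)
for sentence, score in results:
    print(f"{score:.3f}  {sentence}")
```

The exhaustive scan above is linear in the number of stored vectors. A dedicated vector-search library is what makes this kind of lookup practical at the scale of millions of sentences.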
- Install Redis: https://redis.io/docs/getting-started/installation/
- Install other requirements:
  pip install -r requirements.txt
- Register for a Cohere API key: https://dashboard.cohere.ai/welcome/register
- Place the API key between the quotation marks in app/config.py and tests/config.py
- Start the Redis server
- Start Audio-Sifter
  Open a terminal and navigate to the root directory of your semquery clone. Then:
  cd app
  uvicorn main:app
  Your app should now be running at 127.0.0.1:8000. Open http://127.0.0.1:8000/docs in a browser to verify that semquery is running properly.
- Test the application with transcribe_audio_and_add_to_index.ipynb
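The browser check from the "Start Audio-Sifter" step can also be scripted, for example before running the notebook. The sketch below uses only the standard library and assumes the default host and port mentioned above. The helper names `docs_url` and `is_up` are hypothetical and not part of the project.

```python
import urllib.request


def docs_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build the URL of the auto-generated API docs page."""
    return f"http://{host}:{port}/docs"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    url = docs_url()
    print(f"{url} reachable: {is_up(url)}")
```

If the script reports that the page is not reachable, check that the uvicorn process is still running and that nothing else is bound to port 8000.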
The project structure roughly follows the guidelines at https://fastapi.tiangolo.com/.
All routes are located in app/main.py.
Models are defined in app/models.py.
Indexing and search functionality is implemented in app/index.py.
You can explore the project using the interactive, auto-generated API docs at http://127.0.0.1:8000/docs.
The following features might be added in future releases:
- add tests
- validate index schemas with jsonschema
- add durable, on-disk persistence for multiple indices
- add batch requests
- add options for different Faiss index types
- add a CRUD module for Redis