Added Speech Recognition with Hugging Face from Node.js to Python #321

Open: wants to merge 2 commits into main


Waterberry71 commented on Dec 15, 2024

This PR includes work completed with Edumanu82, who helped with the direct translation from Node.js to Python.


What does this PR do?

This PR adds a Python version of the "Speech Recognition with Hugging Face" template, so the template now supports both Python and Node.js.

The main package includes:

  • Setup file (setup.py)

    - Run "python setup.py" to automatically create a database with a collection (and its IDs),
       plus a storage bucket where you can upload .mp3 or .wav audio files for testing
    - The script also checks whether the database or bucket, and the items within them,
       already exist before creating anything
    
  • AppwriteService file

    - Loads environment variables from a .env file
    - Defines the AppwriteService class, which the setup file uses
         - Provides the Databases and Storage services used to interact with the local
           Appwrite server
    - Initializes the AppwriteService class with an API key
    - Sets up the Appwrite client with the endpoint, project ID, and API key
    - Defines a create_Recognition_Entry method
         - Creates a new entry in the database where results are stored
         - In this case, the entry holds the recognized speech text after a successful transcription
    
  • Main file (main.py)

    - The entry point for testing the integration between Hugging Face's automatic speech
      recognition and my local Appwrite server
         - Requires the mock test provided in this PR to actually execute
    - Initializes the necessary services and starts the audio-processing workflow
    - Ensures all required configuration and services are set up before any audio is processed
    
  • Utility file

     - Ensures that all required keys are present in a given dictionary-like object
     - Prevents runtime errors by validating required fields before proceeding
     - Checks whether each key is present in the object
     - Collects any missing keys in a list
     - Provides a simple, reusable way to validate input data
           - main.py uses the throw_if_missing function from utils to validate required fields
             before processing the audio file
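The validation behaviour listed for the utility file can be sketched as follows. This is a minimal illustration written from the description above, not the file in this PR: raising `ValueError` and treating empty values (`""`, `None`) as missing are assumptions of the sketch.

```python
from typing import Any, Iterable, List, Mapping


def throw_if_missing(obj: Mapping[str, Any], keys: Iterable[str]) -> None:
    """Raise ValueError naming every required key that is absent or empty in `obj`."""
    missing: List[str] = []
    for key in keys:
        # Assumption: an empty value ("" or None) counts as missing, not just an absent key.
        if key not in obj or not obj[key]:
            missing.append(key)
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")


# Example: a request body whose bucketId was left empty.
body = {"fileId": "abc123", "bucketId": ""}
try:
    throw_if_missing(body, ["fileId", "bucketId"])
except ValueError as exc:
    print(exc)  # Missing required fields: bucketId
```

Collecting every missing key before raising means one failed request reports all problems at once instead of one per retry.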
    
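The check-before-create step that setup.py performs can be sketched as a small get-or-create helper. This is a hypothetical sketch, not the script itself: `ensure_resource`, `NotFound`, `get_bucket`, and `create_bucket` are illustrative names, and it assumes the lookup call raises an exception whose `code` attribute is 404 when the resource does not exist (the Appwrite Python SDK's `AppwriteException` carries a `code` attribute). The in-memory fakes stand in for a real Appwrite server.

```python
from typing import Any, Callable, Dict, Tuple


def ensure_resource(get: Callable[[], Any], create: Callable[[], Any]) -> Tuple[Any, bool]:
    """Return (resource, created), calling `create` only when `get` reports "not found"."""
    try:
        return get(), False
    except Exception as exc:
        # Assumption: SDK errors expose an HTTP-style status in `exc.code`.
        if getattr(exc, "code", None) != 404:
            raise  # anything other than "not found" is a real failure
    return create(), True


class NotFound(Exception):
    """Hypothetical stand-in for an SDK error with a 404 status."""

    code = 404


# In-memory fake of a storage backend; a real script would call the Appwrite SDK instead.
buckets: Dict[str, Dict[str, str]] = {}


def get_bucket() -> Dict[str, str]:
    if "speech_recognition" not in buckets:
        raise NotFound("bucket not found")
    return buckets["speech_recognition"]


def create_bucket() -> Dict[str, str]:
    buckets["speech_recognition"] = {"$id": "speech_recognition"}
    return buckets["speech_recognition"]


print(ensure_resource(get_bucket, create_bucket))  # first run creates the bucket
print(ensure_resource(get_bucket, create_bucket))  # second run finds and reuses it
```

Against a real server, the same helper could wrap SDK calls such as `lambda: storage.get_bucket(bucket_id)` and `lambda: storage.create_bucket(bucket_id, "Audio")` (the bucket name here is hypothetical), and likewise the database and collection lookups. Re-raising non-404 errors keeps an authentication or network failure from being mistaken for "resource missing".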

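The AppwriteService responsibilities listed above (read connection settings from the environment, then write a recognition result to the database) can be sketched without a live server. This is a hypothetical outline, not the class in this PR: `AppwriteSettings`, `create_recognition_entry`, `FakeDatabases`, and the document attribute `text` are illustrative names. The environment variable names come from the .env list in the Test Plan, and `databases` is expected to behave like the SDK's `Databases` service, whose `create_document(database_id, collection_id, document_id, data)` call this mirrors.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Optional, Tuple

# Field name -> environment variable (names from the .env list in the Test Plan).
ENV_VARS = {
    "endpoint": "APPWRITE_ENDPOINT",
    "project_id": "APPWRITE_PROJECT_ID",
    "api_key": "APPWRITE_API_KEY",
    "database_id": "APPWRITE_DATABASE_ID",
    "collection_id": "APPWRITE_COLLECTION_ID",
}


@dataclass(frozen=True)
class AppwriteSettings:
    """Connection settings an AppwriteService-style class needs (hypothetical)."""

    endpoint: str
    project_id: str
    api_key: str
    database_id: str
    collection_id: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "AppwriteSettings":
        env = os.environ if env is None else env
        missing = [var for var in ENV_VARS.values() if not env.get(var)]
        if missing:
            raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
        return cls(**{field: env[var] for field, var in ENV_VARS.items()})


def create_recognition_entry(databases: Any, settings: AppwriteSettings, text: str) -> Any:
    """Store one transcription as a new document (the "text" attribute is hypothetical)."""
    return databases.create_document(
        settings.database_id,
        settings.collection_id,
        "unique()",  # asks the Appwrite server to generate the document ID
        {"text": text},
    )


class FakeDatabases:
    """Records create_document calls in place of a real Databases service."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str, Dict[str, Any]]] = []

    def create_document(self, database_id: str, collection_id: str,
                        document_id: str, data: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((database_id, collection_id, document_id, data))
        return {"$id": "generated-by-server", **data}


settings = AppwriteSettings.from_env({
    "APPWRITE_ENDPOINT": "http://localhost/v1",
    "APPWRITE_PROJECT_ID": "demo-project",
    "APPWRITE_API_KEY": "demo-key",
    "APPWRITE_DATABASE_ID": "demo-db",
    "APPWRITE_COLLECTION_ID": "demo-collection",
})
databases = FakeDatabases()
print(create_recognition_entry(databases, settings, "hello world"))
```

With the real packages installed, `load_dotenv()` from python-dotenv would populate `os.environ` first, and `databases` would be `Databases(client)` for a client built with `Client().set_endpoint(...).set_project(...).set_key(...)`. Failing fast in `from_env` surfaces an incomplete .env file before any network call is made.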
Outside of the Main Package

  • Added a .gitignore file to exclude unnecessary files
  • Added a README.md documenting usage, configuration, and the required environment variables
  • Added a requirements file
  • Added docker-compose.yml (generated when the local Appwrite server installs and runs without issues)

Test Plan

  1. Install Dependencies
    pip install appwrite
    pip install huggingface_hub
    pip install python-dotenv

  2. Install and run Appwrite with Docker from a terminal, starting in the template's directory
    cd python/speech_recognition_with_huggingface

Source: https://appwrite.io/docs/advanced/self-hosting

docker run -it --rm \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    --volume "$(pwd)"/appwrite:/usr/src/code/appwrite:rw \
    --entrypoint="install" \
    appwrite/appwrite:1.6.0

To replicate my setup, accept the default values when the installer prompts you (for example, port 80, port 443, and localhost).

  3. After installation, open the local Appwrite console (for example, on port 80), sign up, and create an account
    Retrieve only your project ID and API key

  4. Environment Setup
    Objective: ensure all environment variables are set correctly.

    Verify that the .env file contains:
        APPWRITE_ENDPOINT= (see your project's settings in the local Appwrite console)
        APPWRITE_API_KEY= (see your project's settings in the local Appwrite console)
        APPWRITE_PROJECT_ID= (create a project after signing in to your local server)
        HUGGINGFACE_ACCESS_TOKEN= (create a token; see https://huggingface.co/docs/hub/en/security-tokens)
        APPWRITE_DATABASE_ID= (created when running setup.py)
        APPWRITE_COLLECTION_ID= (created when running setup.py)
        APPWRITE_BUCKET_ID= (created when running setup.py)
        APPWRITE_FILE_ID= (after running setup.py, open the local console -> Storage -> your bucket
                          -> Add file -> upload, then copy the file ID)
    
  5. Use the following mock request to execute main.py and get things running

Mock Test for this template (WIP)

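import asyncio
import json

# Assumption: process_audio is the async entry point defined in main.py
from main import process_audio


class MockRequest:
    """Minimal stand-in for the request object passed to process_audio."""

    def __init__(self, method, body_json, headers):
        self.method = method
        self.body_json = body_json
        self.headers = headers


class MockResponse:
    """Prints the JSON payload instead of sending an HTTP response."""

    def json(self, data, status=200):
        data = json.dumps(data, indent=4)
        print(f"Response: {status}, Data: {data}")
        return data


req = MockRequest(
    "POST",
    {"fileId": "Enter your APPWRITE_FILE_ID", "bucketId": "speech_recognition"},
    {"x-appwrite-key": "Put your appwrite secret key here"},
)

res = MockResponse()
log = print
error = print

if __name__ == "__main__":
    # process_audio is a coroutine, so it must be driven by an event loop
    asyncio.run(process_audio(req, res, log, error))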
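The Main File bullet describes an order of operations: validate the request, fetch the audio, run speech recognition, store the text, and respond. A minimal sketch of that order follows, with each external step injected as an async callable so it can run without Appwrite or Hugging Face. `run_workflow`, `download`, `transcribe`, and `store` are hypothetical names, not the signature of `process_audio` in main.py; in a real setup they might wrap Appwrite Storage's `get_file_download`, huggingface_hub's `InferenceClient.automatic_speech_recognition`, and `create_Recognition_Entry`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Mapping

# Field names taken from the mock request body above.
REQUIRED_FIELDS = ("fileId", "bucketId")


async def run_workflow(
    body: Mapping[str, str],
    download: Callable[[str, str], Awaitable[bytes]],
    transcribe: Callable[[bytes], Awaitable[str]],
    store: Callable[[str], Awaitable[None]],
) -> Dict[str, object]:
    """Validate, download, transcribe, store, then return a JSON-ready result."""
    missing = [field for field in REQUIRED_FIELDS if not body.get(field)]
    if missing:
        # Stop before touching storage or the model when the request is incomplete.
        return {"ok": False, "error": f"Missing required fields: {', '.join(missing)}"}
    audio = await download(body["bucketId"], body["fileId"])
    text = await transcribe(audio)
    await store(text)
    return {"ok": True, "text": text}


# Fakes that record the order in which the steps run.
steps: List[str] = []


async def fake_download(bucket_id: str, file_id: str) -> bytes:
    steps.append(f"download {bucket_id}/{file_id}")
    return b"fake-audio-bytes"


async def fake_transcribe(audio: bytes) -> str:
    steps.append(f"transcribe {len(audio)} bytes")
    return "hello world"


async def fake_store(text: str) -> None:
    steps.append(f"store {text!r}")


result = asyncio.run(
    run_workflow({"fileId": "demo-file", "bucketId": "speech_recognition"},
                 fake_download, fake_transcribe, fake_store)
)
print(result)
print(steps)
```

Injecting the three steps keeps the ordering testable on its own and makes it easier to tell whether a failure comes from file retrieval or from recognition.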

Test Result (WIP):

My Hypothesis:
The content of my .wav file may be corrupted, so make sure the test file you use is in a supported format.
There could also be an issue with file retrieval.

I will continue debugging.
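Both hypotheses can be narrowed down by inspecting the bytes that actually reach the speech-recognition step. The sketch below uses only the standard library: it sniffs the container from the file's leading magic bytes, so an error payload returned by a failed download shows up as "unknown", and it parses WAV headers with the `wave` module. The function names are hypothetical, and the magic-byte checks are common heuristics rather than full format validation. Note that `wave` only reads uncompressed PCM data, so a `wave.Error` can also mean an unsupported encoding rather than a corrupted file.

```python
import io
import wave
from typing import Dict


def sniff_audio_format(data: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files usually start with an ID3 tag or an MPEG frame sync (11 set bits).
    if data[:3] == b"ID3" or (len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "mp3"
    return "unknown"


def wav_summary(data: bytes) -> Dict[str, float]:
    """Parse a WAV payload with the stdlib `wave` module; raises wave.Error if unreadable."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


# Build one second of 16 kHz mono silence in memory to exercise the checks.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
sample = buffer.getvalue()

print(sniff_audio_format(sample), wav_summary(sample))
print(sniff_audio_format(b'{"message": "File not found"}'))  # a JSON error body, not audio
```

Running these checks on the bytes returned by the storage download, rather than on the local copy of the file, separates a retrieval problem from a file-format problem.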

Related PRs

The structure of the main operation files in this PR was used as a reference for Waterberry's object detection with Hugging Face template pull request.

This reuse was possible, and saved time, because both templates use the same template API and differ only in the specific task they perform.

Have you read the Contributing Guidelines on issues?

Yes, thoroughly.

Resources

I created a guide, with sources cited, on how to run Appwrite locally, and afterwards received feedback on it from the team:
https://docs.google.com/document/d/1uPj4TdY5sdGFFG8uy-g47OhRXsYx1DoBchHDZu6cM2E/edit?usp=sharing
