A simple keyword spotting (KWS) hardware accelerator designed by generative AI for the Efabless 4th GenAI design contest.

fayizferosh/KWS-ha

MelKWS Engine

A simple, resource-efficient hardware accelerator designed specifically for Keyword Spotting (KWS) applications, using log-mel spectrograms as the audio feature extractor.

Architecture

Description

  1. Input:

    • The input audio stream is sampled at a fixed rate, such as 16 kHz.
    • Each audio frame consists of a fixed number of samples.
  2. Log-Mel Spectrogram Computation:

    • Implement a lightweight log-mel spectrogram computation module to extract features from the input audio stream.
  3. Keyword Detection:

    • The accelerator should detect the presence or absence of a single predefined keyword or command based on the computed log-mel spectrograms.
  4. Output:

    • Provide a mechanism to indicate the presence or absence of the keyword in the input audio stream.
    • Output a binary flag signal indicating the presence or absence of the keyword.
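The log-mel computation described above can be sketched in software as a reference model. This is only an illustrative sketch, not the accelerator's RTL: the frame length (256 samples), the number of mel bands (8), the Hann window, and all function names are assumptions chosen for the example; only the 16 kHz sample rate comes from the description.

```python
import cmath
import math

SAMPLE_RATE = 16_000   # Hz, the example rate from the description
FRAME_LEN = 256        # samples per frame (assumed; power of two for the FFT)
NUM_MELS = 8           # number of mel bands (assumed)

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hann(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mel_filterbank(num_mels, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_pts = [top * i / (num_mels + 1) for i in range(num_mels + 2)]
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_pts]  # fractional FFT bins
    bank = []
    for m in range(1, num_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_bins):
            rising = (k - lo) / (mid - lo)
            falling = (hi - k) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def log_mel_frame(samples, bank):
    """Window one frame, take its power spectrum, and return log mel-band energies."""
    window = hann(len(samples))
    spectrum = fft([s * w for s, w in zip(samples, window)])
    power = [abs(c) ** 2 for c in spectrum[: len(samples) // 2 + 1]]
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    return [math.log(e + 1e-6) for e in energies]  # small offset avoids log(0)

bank = mel_filterbank(NUM_MELS, FRAME_LEN, SAMPLE_RATE)
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
features = log_mel_frame(tone, bank)
loudest_band = max(range(NUM_MELS), key=lambda i: features[i])
print(loudest_band, [round(f, 2) for f in features])
```

Feeding a pure 1 kHz tone through the model is a quick sanity check: the energy should concentrate in the mel band whose triangular filter covers 1 kHz. A hardware implementation would typically replace the floating-point arithmetic with fixed-point operations and a lookup-table logarithm.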

Architecture Choice

  1. Input Interface:
    • Purpose: Handles incoming audio samples, ensuring they are correctly timed and formatted for processing.
    • Components:
      • Sample buffer: Temporarily stores incoming audio samples.
      • Control logic: Manages the flow of samples based on system state and input validity.
  2. Pre-processing:
    • Purpose: Applies necessary pre-processing steps to the audio samples, such as framing and windowing.
    • Components:
      • Frame buffer: Segments the continuous audio stream into overlapping frames.
      • Window function: Applies a windowing function to each frame to minimize spectral leakage.
  3. FFT Module:
    • Purpose: Converts time-domain audio frames into frequency-domain representations using the Fast Fourier Transform (FFT).
    • Components:
      • FFT processor: Computes the FFT of each windowed frame.
  4. Mel Filterbank Processing:
    • Purpose: Applies a set of Mel-scaled filters to the FFT output to extract frequency bands that mimic human auditory perception.
    • Components:
      • Filterbank: A collection of band-pass filters corresponding to the Mel scale.
      • Energy computation: Calculates the energy in each Mel band.
  5. Feature Extraction:
    • Purpose: Optionally extracts additional features from the Mel spectrogram, such as MFCCs (Mel Frequency Cepstral Coefficients), if required by the keyword detection logic.
    • Components:
      • Feature extractor: Calculates MFCCs or other features from the Mel spectrogram.
  6. Dynamic Precision Adjustment:
    • Purpose: Adjusts the precision of the FFT or Mel spectrogram data to optimize for computational efficiency or resource usage.
    • Components:
      • Precision control: Dynamically adjusts data bit-width based on configurable criteria.
  7. Logarithmic Compression:
    • Purpose: Applies logarithmic compression to the Mel spectrogram to better match the non-linear perception of loudness in the human auditory system.
    • Components:
      • Logarithmic function: Computes the logarithm of Mel spectrogram values.
  8. Keyword Detection Logic:
    • Purpose: Analyzes the log-Mel spectrogram (and possibly additional features) to detect the presence of specific keywords.
    • Components:
      • Detection algorithm: Implements a simple thresholding or a more complex pattern matching/machine learning algorithm to identify keywords.
      • Keyword selector: Allows dynamic selection of the keyword(s) to be detected.
  9. Output Interface:
    • Purpose: Indicates the detection result, such as the presence of a keyword.
    • Components:
      • Detection output: Signals when a keyword has been detected.
      • Status indicators: Provide additional information about the detection process, such as confidence levels.
  10. Integration and Control (Top):
    • System Controller: Coordinates the operation of all stages, managing state transitions, processing flow, and synchronization.
    • Clock and Reset Management: Ensures all components operate synchronously and can be reset to a known state.
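As a rough illustration of how the precision-adjustment and keyword-detection stages above might fit together, the sketch below quantizes a feature vector to a configurable signed bit-width and then compares it against a stored template using a simple distance threshold. The function names, the L1 distance metric, the template values, and the threshold are all hypothetical choices for this example; the source only states that detection may use thresholding or a more complex pattern-matching or machine-learning algorithm.

```python
def quantize(values, bits, full_scale):
    """Map values in [-full_scale, full_scale) to signed integers of the given bit-width."""
    levels = 1 << (bits - 1)
    out = []
    for v in values:
        q = int(round(v / full_scale * levels))
        out.append(max(-levels, min(levels - 1, q)))  # saturate instead of wrapping
    return out

def detect(features_q, template_q, threshold):
    """Return (flag, distance); flag is True when the L1 distance is within threshold."""
    distance = sum(abs(f - t) for f, t in zip(features_q, template_q))
    return distance <= threshold, distance

# Hypothetical normalized feature vectors, quantized to 8 bits.
template = quantize([0.50, -0.25, 0.75, 0.10], bits=8, full_scale=1.0)
close = quantize([0.48, -0.20, 0.70, 0.12], bits=8, full_scale=1.0)
far = quantize([-0.60, 0.40, -0.30, 0.90], bits=8, full_scale=1.0)

hit, hit_distance = detect(close, template, threshold=20)
miss, miss_distance = detect(far, template, threshold=20)
print(hit, hit_distance, miss, miss_distance)
```

The returned distance could also drive a status output such as a confidence level, and lowering `bits` trades detection accuracy for narrower datapaths, which is the kind of trade-off the precision-control stage is meant to expose.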
❗ Important Note

Forked from the Caravel User Project

Refer to README for a quickstart of how to use caravel_user_project

Refer to README for this sample project documentation.

Refer to the following readthedocs for how to add cocotb tests to your project.
