
AAD-MovingSpeakers

Overview

This repository supports our research paper titled "Brain-controlled augmented hearing for spatially moving conversations in noisy environments". The main components of this repository are:

  1. Binaural Speech Separation Algorithm: Separates the speech streams of the moving talkers while preserving their locations.
  2. Auditory Attention Decoding (AAD): Decodes which talker the listener is attending to by analyzing their brain signals.

🚨 Notice: The research paper has not yet been made public. This code is currently intended for paper review purposes only.

1. Separating Moving Speakers in a Sound Mixture

This section provides data, code for training separation models, pre-trained models, and a demo for inference.

Prerequisites

  • Ensure you have installed all the dependencies listed in the requirements.txt file.
  • This codebase has been tested with Python 3.9.16.
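
For example, the dependencies can be installed with pip:

    pip install -r requirements.txt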

Datasets

Training

We train three models separately: a separation model, a post-enhancement model, and a localization model.

Separation Model

  • After downloading the pre-generated moving speaker audio and noise audio, set up the dataset:
    python create_separation_dataset.py
  • Train the separation model:
    python train_separation_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
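
For reference, inference with a trained separation model might look like the minimal sketch below. This is not the repository's demo: the file names, the assumption that the checkpoint stores the full model object, and the output shape are all hypothetical.

    import torch
    import torchaudio

    # Load a binaural (2-channel) mixture; shape: (channels, samples)
    mixture, sr = torchaudio.load("mixture_binaural.wav")  # hypothetical file

    # Hypothetical: the checkpoint is assumed to store the whole model object
    model = torch.load("separation_checkpoint.pt", map_location="cpu")
    model.eval()

    with torch.no_grad():
        # Assumed output shape: (batch, speakers, channels, samples),
        # i.e. one binaural stream per separated talker
        estimates = model(mixture.unsqueeze(0)).squeeze(0)

    # Write out each separated talker as a stereo file
    for i, est in enumerate(estimates):
        torchaudio.save(f"speaker_{i}.wav", est, sr)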
    

Post-enhancement model

  • After training the separation model, use it to separate the speakers and create the dataset for training the enhancement model.
  • Train the enhancement model:
    python train_enhancement_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
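
The dataset-creation step above could be scripted roughly as in the sketch below; the directory layout, checkpoint name, and model interface are assumptions, not taken from this repository.

    import torch
    import torchaudio
    from pathlib import Path

    # Hypothetical checkpoint from the separation training step above
    sep_model = torch.load("separation_checkpoint.pt", map_location="cpu")
    sep_model.eval()

    out_dir = Path("enhancement_dataset")
    out_dir.mkdir(exist_ok=True)

    # Hypothetical layout: one binaural mixture per file under mixtures/
    for mix_path in sorted(Path("mixtures").glob("*.wav")):
        mixture, sr = torchaudio.load(str(mix_path))
        with torch.no_grad():
            estimates = sep_model(mixture.unsqueeze(0)).squeeze(0)
        # Save each separated stream; pairing these files with the clean
        # reference signals yields the enhancement training examples
        for i, est in enumerate(estimates):
            torchaudio.save(str(out_dir / f"{mix_path.stem}_spk{i}.wav"), est, sr)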
    

Trajectory prediction model

  • The localization model predicts the locations (moving trajectory) of each separated speaker.
  • After training the enhancement model, use it to generate the enhanced separated speech and create the dataset for training the localization model.
  • Train the localization model:
    python train_localization_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
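
As an illustration of what a frame-wise trajectory is, the sketch below tracks the interaural time difference (ITD) per frame by cross-correlation. This is a classical baseline, not the repository's learned localization model; the file name and frame length are arbitrary choices.

    import numpy as np
    import torchaudio

    # Enhanced binaural speech of one separated talker; shape: (2, samples)
    speech, sr = torchaudio.load("speaker_0_enhanced.wav")  # hypothetical file
    left, right = speech.numpy()

    frame = int(0.1 * sr)  # 100 ms analysis frames
    itds = []
    for k in range(len(left) // frame):
        l = left[k * frame:(k + 1) * frame]
        r = right[k * frame:(k + 1) * frame]
        # Lag of the cross-correlation peak = interaural time difference
        xcorr = np.correlate(l, r, mode="full")
        lag = int(np.argmax(xcorr)) - (len(l) - 1)
        itds.append(lag / sr)

    # itds rises and falls over frames as the talker moves
    print(itds[:10])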
    

2. Auditory Attention Decoding (AAD)

This section contains resources and code for conducting AAD and relevant analyses.

Training CCA models:

Evaluating the CCA models:

  • The script Step_15_Spec_SS_WinByWin_PCA_CCA_FINAL.m evaluates the CCA models at various window sizes on a window-by-window basis and generates the correlations of the brain waves with the attended and unattended stimuli.

We use the CCA implementation from the NoiseTools package developed by Dr. Alain de Cheveigné:

de Cheveigné, A., Wong, D. D. E., Di Liberto, G. M., Hjortkjaer, J., Slaney, M., & Lalor, E. (2018). Decoding the auditory brain with canonical correlation analysis. NeuroImage, 172, 206–216. https://doi.org/10.1016/j.neuroimage.2018.01.033
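
For illustration only, the same window-by-window scheme can be sketched in Python with scikit-learn's CCA. This is not the NoiseTools code used in the paper, and the preprocessing it applies (filtering, time lags, PCA) is omitted here.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    def decode_windows(eeg, att_env, unatt_env, win):
        # eeg:       (samples, channels) preprocessed EEG
        # att_env:   (samples, 1) attended-speech envelope
        # unatt_env: (samples, 1) unattended-speech envelope
        # win:       window length in samples
        cca = CCA(n_components=1)
        # In practice, fit on training trials and evaluate held-out windows
        cca.fit(eeg, att_env)

        correct = 0
        n_win = eeg.shape[0] // win
        for k in range(n_win):
            s = slice(k * win, (k + 1) * win)
            ex, ay = cca.transform(eeg[s], att_env[s])
            _, uy = cca.transform(eeg[s], unatt_env[s])
            r_att = np.corrcoef(ex[:, 0], ay[:, 0])[0, 1]
            r_unatt = np.corrcoef(ex[:, 0], uy[:, 0])[0, 1]
            correct += int(r_att > r_unatt)  # decode the attended talker
        return correct / n_win  # decoding accuracy at this window size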
