Check out our deployed project at http://pulseaudio.duckdns.org!
Whether you're on a call with business partners, family members, or friends, background noise can be distracting, unprofessional, and make speech harder to understand. With remote work becoming more common, it's critical that people can work efficiently and productively from wherever they choose.
What if we could remove the background noise, enabling workers to focus on and understand exactly what their teammates are sharing or explaining? There would be fewer misunderstandings and mistakes, along with faster, more efficient communication. As a result, everyone gets time back in their schedule and less mental fatigue from all-day video-conference meetings.
Our project aims to tackle this problem by focusing on using real-time speech enhancement to improve the quality of noisy virtual calls.
Data Sets:
- Valentini
Metrics:
- Perceptual Evaluation of Speech Quality (PESQ)
- Short-Time Objective Intelligibility (STOI)
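For reference, both metrics are available as Python packages. Below is a minimal sketch of scoring an enhanced clip against its clean reference, assuming the third-party `pesq`, `pystoi`, and `soundfile` packages (not necessarily the exact tooling used in this repo), with placeholder file names:

```python
# Hypothetical example: score an enhanced clip against its clean reference
# using the third-party `pesq` and `pystoi` packages. File names are placeholders.
import soundfile as sf
from pesq import pesq      # pip install pesq
from pystoi import stoi    # pip install pystoi

clean, sr = sf.read("clean.wav")        # reference signal
enhanced, _ = sf.read("enhanced.wav")   # model output, same sample rate

# PESQ expects 8 kHz ('nb') or 16 kHz ('wb') audio.
pesq_score = pesq(sr, clean, enhanced, "wb")
# STOI returns a value in [0, 1]; multiply by 100 for the percentages reported below.
stoi_score = stoi(clean, enhanced, sr, extended=False) * 100

print(f"PESQ: {pesq_score:.2f}  STOI: {stoi_score:.2f}%")
```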
Use the package manager pip to install the dependencies.
```
pip install -r requirements.txt
```
```
cd development
python -m denoiser.enhance --file_location="PATH_TO_WAV"
```
The enhanced audio clip will be saved in `denoiser/static`.
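If you'd rather call the model from Python, the sketch below shows roughly how that could look. It assumes the repo follows the upstream facebookresearch/denoiser API (`denoiser.pretrained`, `denoiser.dsp.convert_audio`); the exact module names and pretrained weights in this project may differ.

```python
# Hypothetical sketch of programmatic enhancement, assuming the upstream
# facebookresearch/denoiser API is available; adjust imports to match the
# actual package layout in this repo.
import torch
import torchaudio
from denoiser import pretrained
from denoiser.dsp import convert_audio

model = pretrained.dns64()                   # pretrained DEMUCS-based denoiser
wav, sr = torchaudio.load("PATH_TO_WAV")     # noisy input clip
wav = convert_audio(wav, sr, model.sample_rate, model.chin)

with torch.no_grad():
    enhanced = model(wav[None])[0]           # add batch dim, then drop it

torchaudio.save("enhanced.wav", enhanced.cpu(), model.sample_rate)
```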
Results on the Valentini dataset.
Model | PESQ | STOI (%) |
---|---|---|
Wiener* | 2.22 | 93 |
SEGAN** | 2.19 | 93.12 |
SASEGAN** | 2.36 | 93.32 |
Wave U-Net* | 2.40 | - |
DEMUCS | 2.96 | 94.21 |
* Results from Table 1 of Denoiser paper
** Results from Table 1 of SASEGAN paper
(GitHub Markdown doesn't support embedding audio, so this will have to do.)
Here we have a few side-by-side spectrogram comparisons of our model's results. In general, the model tends to behave much like a traditional noise gate: in sections devoid of speech, it effectively removes extraneous sounds. However, it also seems to overcompensate in the higher frequency ranges during sections of speech.
Compare the 'clean' and 'enhanced' spectrograms for each example and you'll see noticeably more activity in the upper regions of speech segments (corresponding to higher frequency sounds).
In audio clips, these differences manifest as a form of digital white noise and sharpness in the higher registers.
Something else to keep an eye on is the sharpness of the spectrogram images. The "grainy" quality of the spectrogram images is indicative of broad-spectrum noise across the signal. Think: cars idling or a consistent breeze.
Looking at the final two examples in particular, you can see a clear contrast in the sharpness of the 'clean' spectrogram compared with the 'enhanced' one, indicating that the model was unable to fully eliminate broad-spectrum noise.
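To reproduce comparisons like these yourself, the spectrograms can be plotted with a few lines of `librosa` and `matplotlib` (a rough sketch with an assumed plotting stack; the file names are placeholders):

```python
# Rough sketch of a side-by-side spectrogram comparison using librosa and
# matplotlib (assumed plotting stack; file names below are placeholders).
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

clips = {"noisy": "noisy.wav", "clean": "clean.wav", "enhanced": "enhanced.wav"}

fig, axes = plt.subplots(1, len(clips), figsize=(15, 4), sharey=True)
for ax, (title, path) in zip(axes, clips.items()):
    y, sr = librosa.load(path, sr=None)
    # Log-magnitude STFT: broad-spectrum noise shows up as the "grainy" texture
    # described above, high-frequency artifacts as extra energy near the top.
    db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    librosa.display.specshow(db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```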
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request