
Voice Assistant

Piotr Skowronek edited this page Jul 4, 2020 · 7 revisions

Concept

The idea was to use open-source voice recognition software that can run offline. PocketSphinx looked very promising. The Raspberry Pi Zero's CPU is not very powerful and, AFAIK, has only one core, so I decided to use a very simple acoustic model (HMM) and to generate a custom LM & dict with the online lmtool for only a limited number of words.

Thanks to all that, voice recognition and pulseaudio together use <20% of the RPi CPU (and 10% of the 512M RAM). The recognition isn't perfect:

  • there might be lots of false positives for the wake word (some tweaking of the thresholds in invocation.list is required)
  • I'm physically incapable of saying 'Hey Cybill' in a way it could understand

An installation script is provided to automate the setup: it downloads the acoustic model and builds the LM & DICT files.

Adjusting assistant

You can adjust the words and commands by editing the following files:

  • config.yaml
    • this is the main configuration of the assistant - it contains paths to the acoustic model, dicts and lm, and a definition of the sentences and the commands it should execute. Multiple invocation words/sentences can be defined, and each can answer with a different tone (male, female etc., thanks to the espeak command)
  • invocation.list - the list of invocation words/sentences *)
  • keyphrase.list - the list of sentences (different variants) *)

*) Both files are concatenated and sent to lmtool to generate the lm & dict. Make sure to write those sentences in UPPER case (don't ask me why, I dunno).
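The concatenation step can be sketched in shell as below. This is only an illustration, not the installation script itself: the sample phrases and the corpus.txt file name are made up, and the tr call simply forces upper case in case a sentence was typed in lower case.

```shell
#!/bin/sh
# Sketch only: the sample phrases and the corpus.txt name are hypothetical.
set -eu

workdir=$(mktemp -d)

# Stand-ins for the real invocation.list and keyphrase.list.
printf '%s\n' 'hey cybill' > "$workdir/invocation.list"
printf '%s\n' 'turn on the light' 'What time is it' > "$workdir/keyphrase.list"

# Concatenate both lists and force upper case before sending them to lmtool.
cat "$workdir/invocation.list" "$workdir/keyphrase.list" \
  | tr '[:lower:]' '[:upper:]' > "$workdir/corpus.txt"

# Show the text that would be uploaded.
cat "$workdir/corpus.txt"
```

Looking at the resulting file before uploading it is a quick way to catch a stray lower-case word.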

Writing scripts for a command

Take a look in this folder - pick one of the cmd_*.sh scripts as an example and write your own. Mind that those commands are executed in blocking mode: the assistant waits until they finish, which lets the script decide when to hand control back to the assistant. Use & to go into the background if you need to and know what you're doing.
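A minimal sketch of such a script, showing the difference between the blocking part and a backgrounded job. The script name, the messages and the temporary log file are hypothetical and not taken from the repository; the real cmd_*.sh files remain the reference.

```shell
#!/bin/sh
# Hypothetical cmd_example.sh; names, messages and the log path are
# illustrative only.
set -eu

log=$(mktemp)

# Blocking part: the assistant waits while this runs.
echo "command started" >> "$log"

# Long-running part: sent to the background with & so the script can
# return right away. The job's own output is redirected so that it does
# not keep the script's stdout open.
( sleep 1; echo "background job finished" >> "$log" ) >/dev/null 2>&1 &
bg_pid=$!

echo "handing control back (background pid $bg_pid)"
```

Anything placed before the & line delays the assistant, so keep that part short if the assistant should start listening again quickly.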

Hardware

You must get a USB soundcard that has a microphone input and an audio output. You may also try a tiny USB microphone and a separate USB audio output (this can be a good idea btw - read on). The thing is that some of the cheapest USB soundcards introduce a lot of noise, especially when the mike is on - and even upgrading them with capacitors won't help much (at least it didn't work for me: the noise can be heard in the speaker and on the recording when the mike is on).

So I ordered another one with good reviews, and it is much, much better - no noise, and the mike's gain is great. It is something like this:

For output I used a tiny 0.1W 8ohm speaker (MG15) - but since the stereo output from the soundcard is too weak, you need to use an amp module (like this one; take 5V from either USB or the RPi, ideally regulated and filtered).

When you have them connected to the RPi, open the sound preferences UI (or a console one - for example alsamixer) and adjust the volume and mike gain. The mike's gain must not be too high: otherwise louder sounds will clip and PocketSphinx will have problems understanding words. The volume settings should persist after reboots.
