This package offers a series of ROS services that help the robot to record an audio, convert it to text and make some basic questions and answers for the other tools. We are part of SinfonIA Uniandes
Table of Contents
- Linux Ubuntu 20.04
- ROS Melodic/Noetic
- Python >= 3.8
First of all, you must run these commands on the terminal.
sudo apt update && sudo apt install ffmpeg
sudo apt-get install portaudio19-dev
sudo apt install ffmpeg
Next, to install everything use pip install -r requirements.txt. That will install the following libraries:
- ffmpeg==1.4
- git+https://github.com/openai/whisper.git
- nltk==3.8.1
- openai==1.2.4
- openpyxl==3.1.2
- pandas==2.0.2
- sounddevice==0.4.6
- soundfile==0.12.1
- spacy==3.5.2
- SpeechRecognition==3.10.0
- vosk==0.3.45
These packages should be on the src of the workspace.
- audio_common_msgs audio_common_msgs
git clone https://github.com/ros-drivers/audio_common.git
- naoqi_bridge_msgs naoqi_bridge_msgs
git clone https://github.com/ros-naoqi/naoqi_bridge_msgs.git
- robot_toolkit_msgs robot_toolkit_msgs
git clone https://github.com/SinfonIAUniandes/robot_toolkit_msgs.git
- Clone the repositories (speech_utilities and speech_msgs) to the src folder of the workspace (in the same folder that the other ROS Packages).
git clone https://github.com/SinfonIAUniandes/speech_utilities.git
git clone https://github.com/SinfonIAUniandes/speech_msgs.git
- Move to the root of the workspace and build the workspace.
cd ..
catkin_make
source devel/setup.bash
When roscore is available run:
rosrun speech_utilities speech_utilities.py
Speech_unite offers the following services:
- Description: This service allows the robot to say the input of the service.
- Service file: talk_srv.srv
- Request:
- key (string): Indicates the phrase that the robot must say.
- language (string): Indicates the language which robot will speak. Could be 'English' or 'Spanish'.
- wait (bool): Indicates if the robot should wait to shoot down the service.
- animated (bool): Indicates if the robot should make gestures while talking.
- talk_speed (string): Indicates the speech speed the robot will talk between 50-400 (default: 100).
- Response:
- result (string): Indicates what the robot is talking.
- Request:
- Call service example:
rosservice call /speech_utilities/talk_srv "key: 'Hello my name is Nova.' language: 'English' wait: false animated: false talk_speed: '85'"
- Description This service allows the robot to say some questions pre established, start recording the audio throw the save_audio_srv and return an answer with the whisper and data.xlsx loaded in data folder.
- Service file: q_a_speech_srv.srv
- Request:
- tag (string): Indicates the key word for the question that the robot will say. For example: 'birth' if for 'When is your birthday?'. Allowed keys: name, age, drink, gender. Must be in lowercase.
- Response:
- answer (string): Indicates what Pepper ask for (the question).
- Request:
- Call service example:
rosservice call /speech_utilities/q_a_speech_srv "tag: 'age'"
- Description This service allows the robot to returns the transcription of the audio from the microphone.
- Service file: speech2text_srv.srv
- Request:
- duration (int32): Duration of the recording in seconds. If 0, the recording will be stopped when the person stops talking.
- Response:
- transcription (string): Transcription of the audio.
- Request:
- Call service example:
rosservice call /speech_utilities/speech2text_srv "duration: 0"
- Description Returns the silence threshold of the audio from the microphone.
- Service file: calibrate_srv.srv
- Request:
- duration (int32): Duration of the recording in seconds.
- Response:
- threshold (float64): Silence threshold.
- Request:
- Call service example:
rosservice call /speech_utilities/calibrate_srv "duration: 5"
- Description This service allows the robot to answer a question using a OpenAI model.
- Service file: answer_srv.srv
- Request:
- question (string): Indicates the question to solve.
- save_conversation (bool): If true, the conversation will be saved and the model will answer regarding previous questions.
- temperature (float64): (0-1) the higher the temperature, the more random the answer.
- system_msg (string): Message to be added to the content of system in the conversation.
- Response:
- answer (string): Indicates the answer of the question.
- Request:
- Call service example:
rosservice call /speech_utilities/answer_srv "question: 'Who discover America?' language: 'en'"
- Description This service allows the robot to answer a question using a Google API or a own model.
- Service file: hot_word_srv.srv
- Request:
- hot_words (list[String]): List of hot words to detect.
- eyes (bool): If true, the eyes will be activated.
- sound (bool): If true, the sound will be activated.
- threshold (float64): Threshold to detect the hot words.
- Response:
- response (bool): If true, the hot word service started publishing the hot words. If false, the service was turned off or there is no Toolkit.
- Request:
- Call service example:
rosservice call /speech_utilities/hot_word_srv "hot_words: ['palabra1', 'palabra2', 'palabra3'] noise: false eyes: true threshold: 0.5"