This project allows users to install OpenVoice, an AI speech model that can clone voices, on the RunPod serverless platform.
Docker image available at: Docker Hub
To run this application on RunPod serverless, you need to set the following environment variables:
- BUCKET_ENDPOINT_URL: The endpoint URL of your S3-compatible storage.
- BUCKET_ACCESS_KEY_ID: The access key ID for your S3-compatible storage.
- BUCKET_SECRET_ACCESS_KEY: The secret access key for your S3-compatible storage.
These variables are required to store and host the generated WAV files.
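As a minimal sketch of how a worker might check these variables at startup, the snippet below reports any that are unset or empty. The function name `missing_bucket_vars` is hypothetical and is not taken from this project's handler.py.

```python
import os

# The three variable names listed above.
REQUIRED_BUCKET_VARS = (
    "BUCKET_ENDPOINT_URL",
    "BUCKET_ACCESS_KEY_ID",
    "BUCKET_SECRET_ACCESS_KEY",
)


def missing_bucket_vars(env=None):
    """Return the required bucket variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BUCKET_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_bucket_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All bucket variables are set.")
```

Failing early on a missing variable is usually cheaper than discovering the problem only when the upload of the generated WAV file fails.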
git clone https://github.com/drvpn/runpod_serverless_openvoice_worker.git
cd runpod_serverless_openvoice_worker
- Build and Push Docker Image
  - Follow RunPod's documentation to build and push your Docker image to a container registry.
- Deploy on RunPod
  - Go to RunPod's dashboard and create a new serverless function.
  - Use the Docker image you pushed to your container registry.
  - Set the environment variables: BUCKET_ENDPOINT_URL, BUCKET_ACCESS_KEY_ID, BUCKET_SECRET_ACCESS_KEY.
- Invoke the Function
You can invoke the function with a JSON payload specifying the text, language, and voice URL. Here is an example:
{
  "input": {
    "text": "Hello, world!",
    "voice_url": "https://example.com/path/to/voice.mp3",
    "language": "EN",
    "speed": 1.0
  }
}
Use RunPod's interface or an HTTP client to send this payload to the deployed function.
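For the HTTP client route, the following is a rough sketch using only the Python standard library. It assumes RunPod's `https://api.runpod.ai/v2/<endpoint_id>/runsync` route with a Bearer API key; `YOUR_ENDPOINT_ID` and `YOUR_API_KEY` are placeholders, and the helper name `build_runsync_request` is hypothetical. Check RunPod's documentation for the current route and authentication details.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id, api_key, payload):
    """Build (but do not send) a POST request for a RunPod serverless endpoint."""
    # Assumed URL pattern; verify against RunPod's API documentation.
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "input": {
        "text": "Hello, world!",
        "voice_url": "https://example.com/path/to/voice.mp3",
        "language": "EN",
        "speed": 1.0,
    }
}

# Placeholders: substitute your own endpoint ID and API key.
request = build_runsync_request("YOUR_ENDPOINT_ID", "YOUR_API_KEY", payload)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```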
- text: The text the AI will speak.
- voice_url: A URL to a WAV file. This file should contain spoken words recorded in a quiet environment. This will become the voice of the speaker.
- language: The language the speaker will use when speaking your text. Choose one of the following: ['EN', 'EN-AU', 'EN-BR', 'EN-INDIA', 'EN-US', 'EN-DEFAULT', 'ES', 'FR', 'ZH', 'JP', 'KR']
- speed: The pace the speaker will use when speaking.
- text: required, no default
- voice_url: required, no default
- language: default value is EN
- speed: default value is 1.0
To override default values, you can set the following (optional) environment variables:
- DEFAULT_TEXT: Sets new default for text
- DEFAULT_LANGUAGE: Sets new default for language
- DEFAULT_VOICE_URL: Sets new default for voice_url
- DEFAULT_SPEED: Sets new default for speed
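One way the defaults and overrides could fit together is sketched below. It assumes a precedence of request value first, then the DEFAULT_* environment variable, then the built-in default; that ordering and the names `resolve_input` and `_pick` are assumptions for illustration and may differ from what handler.py actually does.

```python
import os

# Language codes listed in the parameter description above.
SUPPORTED_LANGUAGES = {
    "EN", "EN-AU", "EN-BR", "EN-INDIA", "EN-US", "EN-DEFAULT",
    "ES", "FR", "ZH", "JP", "KR",
}


def _pick(job_input, key, env, env_name, fallback=None):
    """Prefer the request value, then the DEFAULT_* variable, then the fallback."""
    value = job_input.get(key)
    if value is None:
        value = env.get(env_name, fallback)
    return value


def resolve_input(job_input, env=None):
    """Merge request values with DEFAULT_* overrides and built-in defaults."""
    env = os.environ if env is None else env
    resolved = {
        "text": _pick(job_input, "text", env, "DEFAULT_TEXT"),
        "voice_url": _pick(job_input, "voice_url", env, "DEFAULT_VOICE_URL"),
        "language": _pick(job_input, "language", env, "DEFAULT_LANGUAGE", "EN"),
        "speed": float(_pick(job_input, "speed", env, "DEFAULT_SPEED", "1.0")),
    }
    # text and voice_url have no built-in default, so they must come from
    # the request or from an environment override.
    for key in ("text", "voice_url"):
        if not resolved[key]:
            raise ValueError(f"'{key}' is required and has no default")
    if resolved["language"] not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {resolved['language']!r}")
    return resolved
```

With no overrides set, `resolve_input({"text": "hi", "voice_url": "https://example.com/v.wav"}, env={})` fills in `language` as `EN` and `speed` as `1.0`.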
{
  "delayTime": 789,
  "executionTime": 16608,
  "id": "your-unique-id-will-be-here",
  "output": {
    "output_audio_url": "https://mybucket.nyc3.digitaloceanspaces.com/OpenVoice/OpenVoice_20240613_213640_i7bzrf_32f210.wav"
  },
  "status": "COMPLETED"
}
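A client could pull the audio URL out of a response shaped like the one above roughly as follows. The helper name `extract_audio_url` is hypothetical, and the sketch only handles the fields shown in this example response.

```python
import json


def extract_audio_url(response):
    """Return output_audio_url from a completed job response."""
    status = response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"Job not completed: status={status!r}")
    return response["output"]["output_audio_url"]


# The example response from this README, parsed from JSON text.
sample = json.loads("""
{
  "delayTime": 789,
  "executionTime": 16608,
  "id": "your-unique-id-will-be-here",
  "output": {
    "output_audio_url": "https://mybucket.nyc3.digitaloceanspaces.com/OpenVoice/OpenVoice_20240613_213640_i7bzrf_32f210.wav"
  },
  "status": "COMPLETED"
}
""")

audio_url = extract_audio_url(sample)
print(audio_url)
```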
The handler.py script orchestrates the following tasks:
- Maps a network volume to store checkpoints (if available).
- Downloads and caches model checkpoints if not already present.
- Converts text to speech with the supplied (cloned) voice.
- Uploads the generated audio file to S3-compatible storage and returns the public URL.
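The first two tasks above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the checkpoint file names are whatever the model needs, and the commented usage assumes the network volume is mounted at `/runpod-volume`.

```python
from pathlib import Path


def choose_checkpoint_dir(volume_root, fallback_root):
    """Use the network volume when it is mounted, else a local directory."""
    volume = Path(volume_root)
    base = volume if volume.is_dir() else Path(fallback_root)
    return base / "checkpoints"


def missing_checkpoints(checkpoint_dir, required_files):
    """List the required checkpoint files that are not cached yet."""
    directory = Path(checkpoint_dir)
    return [name for name in required_files if not (directory / name).is_file()]


# Example usage (paths and file names are assumptions):
# checkpoint_dir = choose_checkpoint_dir("/runpod-volume", "/tmp/openvoice")
# to_download = missing_checkpoints(checkpoint_dir, ["checkpoint.pth"])
```

Only the files returned by `missing_checkpoints` would need downloading, so a warm worker or one with a populated network volume can skip the download step.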
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License.