voicebox
Python text-to-speech library with built-in voice effects and support for multiple TTS engines.
| GitHub | Documentation | Audio Samples |
# Example: Use gTTS with a vocoder effect to speak in a robotic voice
from voicebox import SimpleVoicebox
from voicebox.tts import gTTS
from voicebox.effects import Vocoder, Normalize
voicebox = SimpleVoicebox(
    tts=gTTS(),
    effects=[Vocoder.build(), Normalize()],
)
voicebox.say('Hello, world! How are you today?')
Setup
pip install voicebox-tts
Install the PortAudio library for audio playback. On Debian/Ubuntu:
sudo apt install libportaudio2
Install dependencies for whichever TTS engine(s) you want to use (see section below).
Supported Text-to-Speech Engines
Classes for supported TTS engines are located in the
voicebox.tts package.
Amazon Polly
Online TTS engine from AWS.
Class: voicebox.tts.AmazonPolly
Setup:
pip install "voicebox-tts[amazon-polly]"
ElevenLabs
Online TTS engine with realistic voices and support for voice cloning.
Class: voicebox.tts.ElevenLabsTTS
Setup:
pip install "voicebox-tts[elevenlabs]"
Get an API key.
Minimal example:
from voicebox import SimpleVoicebox
from voicebox.tts import ElevenLabsTTS
vb = SimpleVoicebox(tts=ElevenLabsTTS(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    api_key="...",
))
eSpeak NG
Offline TTS engine with a good number of options.
Class: voicebox.tts.ESpeakNG
Setup:
On Debian/Ubuntu:
sudo apt install espeak-ng
Google Cloud Text-to-Speech
Powerful online TTS engine offered by Google Cloud.
Class: voicebox.tts.GoogleCloudTTS
Setup:
pip install "voicebox-tts[google-cloud-tts]"
gTTS
Online TTS engine used by Google Translate.
Class: voicebox.tts.gTTS
Setup:
pip install "voicebox-tts[gtts]"
Install ffmpeg for audio decoding.
🤗 Parler TTS
Offline TTS engine released by Hugging Face that uses a promptable deep learning model to generate speech.
Class: voicebox.tts.ParlerTTS
Setup:
pip install git+https://github.com/huggingface/parler-tts.git
Pico TTS
Very basic offline TTS engine.
Class: voicebox.tts.PicoTTS
Setup:
On Debian/Ubuntu:
sudo apt install libttspico-utils
pyttsx3
Offline TTS engine wrapper with support for the built-in TTS engines on Windows (SAPI5) and macOS (NSSpeechSynthesizer), as well as espeak on Linux. By default, it will use the most appropriate engine for your platform.
Class: voicebox.tts.Pyttsx3TTS
Setup:
pip install "voicebox-tts[pyttsx3]"
On Debian/Ubuntu:
sudo apt install espeak
Voice.AI
Online TTS engine with realistic voices and support for voice cloning.
Class: voicebox.tts.VoiceAiTTS
Setup:
pip install "voicebox-tts[voice-ai]"
Get an API key: https://voice.ai/app/dashboard/developers
Minimal example:
from voicebox import SimpleVoicebox
from voicebox.tts import VoiceAiTTS
vb = SimpleVoicebox(tts=VoiceAiTTS(api_key="..."))
Effects
Built-in effect classes are located in the
voicebox.effects package,
and can be imported like:
from voicebox.effects import CoolEffect
Here is a non-exhaustive list of fun effects:
Glitch creates a glitchy sound by randomly repeating small chunks of audio.
RingMod can be used to create choppy, Doctor Who Dalek-like effects.
Vocoder is useful for making monotone, robotic voices.
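To give a feel for what the first two effects do to a signal, here is a minimal pure-Python sketch. It is not voicebox's implementation: the function names, parameters, and the 30 Hz carrier are assumptions chosen for illustration, and audio is represented as a plain list of floats rather than the library's own audio type.

```python
import math
import random


def ring_mod(samples, sample_rate, carrier_hz=30.0):
    """Multiply each sample by a sine carrier (basic ring modulation).

    carrier_hz is a hypothetical default; a low carrier gives a choppy sound.
    """
    return [
        s * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]


def glitch(samples, chunk_size=4, repeat_prob=0.5, max_repeats=3, seed=None):
    """Split audio into small chunks and randomly repeat some of them.

    max_repeats must be at least 2. All parameter names are illustrative.
    """
    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        # A "glitched" chunk is played 2..max_repeats times in a row
        repeats = rng.randint(2, max_repeats) if rng.random() < repeat_prob else 1
        out.extend(chunk * repeats)
    return out


# One second of a 440 Hz tone as stand-in "speech"
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]

modulated = ring_mod(tone, sample_rate)          # same length, amplitude pulsing
glitched = glitch(tone, chunk_size=400, seed=1)  # same or longer, with stutters
```

Ring modulation keeps the signal length unchanged, while the glitch sketch can only lengthen it, since chunks are repeated and never dropped.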
There is also support for all the awesome audio plugins in Spotify's pedalboard library using the special PedalboardEffect wrapper, e.g.:
from voicebox import SimpleVoicebox
from voicebox.effects import PedalboardEffect
import pedalboard
voicebox = SimpleVoicebox(
    effects=[
        PedalboardEffect(pedalboard.Reverb()),
        ...,
    ]
)
Examples
Minimal
# PicoTTS is used to say "Hello, world!"
from voicebox import SimpleVoicebox
voicebox = SimpleVoicebox()
voicebox.say('Hello, world!')
Pre-built
Some pre-built voiceboxes are available in the
voicebox.examples package.
They can be imported into your own code, or run directly as demos:
# Voice of GLaDOS from the Portal video game series
python -m voicebox.examples.glados "optional message"
# Voice of the OOM-9 command battle droid from Star Wars: Episode I
python -m voicebox.examples.battle_droid "optional message"
Advanced
# Use eSpeak NG at 120 WPM and en-us voice as the TTS engine
from voicebox import reliable_tts
from voicebox.tts import ESpeakConfig, ESpeakNG, gTTS
# Wrap multiple TTSs in retries and caches
tts = reliable_tts(
    ttss=[
        # Prefer using online TTS first
        gTTS(),
        # Fall back to offline TTS if online TTS fails
        ESpeakNG(ESpeakConfig(speed=120, voice='en-us')),
    ],
)
# Add some voice effects
from voicebox.effects import Vocoder, Glitch, Normalize
effects = [
    Vocoder.build(),  # Make a robotic, monotone voice
    Glitch(),  # Randomly repeat small sections of audio
    Normalize(),  # Remove DC and make volume consistent
]
# Build audio sink
from voicebox.sinks import Distributor, SoundDevice, WaveFile
sink = Distributor([
    SoundDevice(),  # Send audio to playback device
    WaveFile('speech.wav'),  # Save audio to speech.wav file
])
# Build the voicebox
from voicebox import ParallelVoicebox
from voicebox.voiceboxes.splitter import SimpleSentenceSplitter
# Parallel voicebox doesn't block the main thread
voicebox = ParallelVoicebox(
    tts,
    effects,
    sink,
    # Split text into sentences to reduce time to first speech
    text_splitter=SimpleSentenceSplitter(),
)
# Speak!
voicebox.say('Hello, world!')
# Wait for all audio to finish playing before exiting
voicebox.wait_until_done()
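The two ideas the advanced example leans on, splitting text into sentences and falling back from one TTS engine to another, can be sketched without the library. This is an illustration under stated assumptions, not how reliable_tts or SimpleSentenceSplitter are implemented: the helper names are hypothetical, "engines" are plain callables, the splitting rule is a simple regex, and caching is left out.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def synthesize_with_fallback(engines, text, retries=2):
    """Try each engine in order, retrying before moving on to the next one."""
    last_error = None
    for engine in engines:
        for _ in range(retries):
            try:
                return engine(text)
            except Exception as error:  # broad on purpose for a sketch
                last_error = error
    raise RuntimeError('all engines failed') from last_error


def flaky_online_engine(text):
    """Hypothetical stand-in for an online engine with no network."""
    raise ConnectionError('network unavailable')


def offline_engine(text):
    """Hypothetical stand-in for an offline engine that always succeeds."""
    return f'<audio for {text!r}>'


engines = [flaky_online_engine, offline_engine]
results = [
    synthesize_with_fallback(engines, sentence)
    for sentence in split_sentences('Hello, world! How are you today?')
]
```

Working sentence by sentence means the first result is ready after only the first sentence has been synthesized, which is the reason the example gives for passing a text splitter.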
Command Line Demo
python -m voicebox -h # Print command help
python -m voicebox "Hello, world!" # Basic usage