voicebox package

Subpackages

Submodules

voicebox.audio module

class voicebox.audio.Audio(signal: ndarray, sample_rate: int)

Bases: object

Represents an audio signal.

Parameters:

signal – Audio signal represented as a 1D array of samples, each in the range [-1, 1].
sample_rate – Number of samples per second.

check() → None

Raises ValueError if the audio is invalid.

For the audio to be valid, it must satisfy all of the following conditions:

Must have at least one sample.
All samples must be in the range [-1, 1].
The sample rate must be greater than 0.
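The three validity conditions above can be expressed as plain NumPy checks. This is a minimal sketch under stated assumptions: `check_audio` is a hypothetical standalone function, not voicebox's code, and the order in which the conditions are tested and the error messages are illustrative only.

```python
import numpy as np


def check_audio(signal: np.ndarray, sample_rate: int) -> None:
    """Hypothetical stand-in for Audio.check(): raise ValueError on invalid audio."""
    # Condition 1: at least one sample.
    if signal.size == 0:
        raise ValueError("Audio must have at least one sample")
    # Condition 2: every sample lies in [-1, 1].
    if np.any(signal < -1) or np.any(signal > 1):
        raise ValueError("All samples must be in the range [-1, 1]")
    # Condition 3: positive sample rate.
    if sample_rate <= 0:
        raise ValueError("Sample rate must be greater than 0")


# A 0.1-second 440 Hz sine at half amplitude, sampled at 16 kHz, passes all checks.
sample_rate = 16_000
t = np.arange(int(0.1 * sample_rate)) / sample_rate
check_audio(0.5 * np.sin(2 * np.pi * 440 * t), sample_rate)

# A sample outside [-1, 1] is rejected.
try:
    check_audio(np.array([0.0, 1.5]), sample_rate)
except ValueError as e:
    print(e)  # All samples must be in the range [-1, 1]
```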

copy(signal: ndarray = None, sample_rate: int = None) → Audio[source]: Returns a deep copy of self, with optional new property values.
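To illustrate the copy-with-overrides pattern, here is a minimal sketch built on a hypothetical `AudioSketch` dataclass (an illustrative name, not voicebox's class). It assumes that overriding values are used as passed, while everything else is deep-copied so the copy never shares its signal array with the original.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class AudioSketch:
    """Hypothetical stand-in for voicebox.audio.Audio, for illustration only."""
    signal: np.ndarray
    sample_rate: int

    def copy(
        self,
        signal: Optional[np.ndarray] = None,
        sample_rate: Optional[int] = None,
    ) -> "AudioSketch":
        # Deep-copy so the new object does not share its array with self.
        new = deepcopy(self)
        # Replace only the properties that were explicitly passed in.
        if signal is not None:
            new.signal = signal
        if sample_rate is not None:
            new.sample_rate = sample_rate
        return new


original = AudioSketch(np.zeros(4), 16_000)
resampled = original.copy(sample_rate=8_000)
resampled.signal[0] = 0.5  # modifies only the copy's array
print(original.signal[0], resampled.sample_rate)  # 0.0 8000
```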

property len_bytes: int: Length of audio signal in bytes.

property len_seconds: float: Length of audio signal in seconds.

property sample_period: float: Sample period in seconds.

sample_rate: int

signal: ndarray
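The length properties follow from the two attributes. Below is a small worked example using illustrative values (a 44,100-sample float32 signal at 22,050 Hz). The `len_bytes` line is an assumption: it uses NumPy's `ndarray.nbytes`, which counts the bytes of the array as stored, and voicebox may compute this property differently (for example, after converting to another sample format).

```python
import numpy as np

# Illustrative values, not taken from voicebox.
sample_rate = 22_050                          # samples per second
signal = np.zeros(44_100, dtype=np.float32)  # 44,100 samples

len_seconds = len(signal) / sample_rate  # 44_100 / 22_050 = 2.0 seconds
sample_period = 1 / sample_rate          # time between consecutive samples, in seconds
len_bytes = signal.nbytes                # ASSUMPTION: bytes of the stored array (44_100 * 4)

print(len_seconds, len_bytes)  # 2.0 176400
```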

voicebox.ssml module

class voicebox.ssml.SSML

Bases: str

A Speech Synthesis Markup Language (SSML) string.

Wrapping a string in this class marks it as SSML, so TTS engines that support SSML interpret it as markup rather than as plain text.

Example

>>> from voicebox.tts import ESpeakNG
>>> from voicebox import SSML
>>> tts = ESpeakNG()
>>> text = SSML('<speak>Hello world</speak>')
>>> audio = tts.get_speech(text)

classmethod auto(text: str) → SSML | str

Returns the text wrapped as SSML if it starts with '<speak>'; otherwise, returns the text unaltered.

Example

>>> from voicebox import SSML
>>> SSML.auto('<speak>Hello world</speak>')
SSML('<speak>Hello world</speak>')
>>> SSML.auto('Hello world')
'Hello world'
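Because SSML subclasses str, an SSML value can be passed anywhere a string is expected while still letting an engine detect it with isinstance. The following is a hypothetical re-creation of that pattern that mirrors the '<speak>' prefix rule of auto(); SSMLSketch and describe are illustrative names, not part of voicebox.

```python
from typing import Union


class SSMLSketch(str):
    """Hypothetical stand-in for voicebox.ssml.SSML: a str subclass used as a marker."""

    def __repr__(self) -> str:
        return f"SSMLSketch({str.__repr__(self)})"

    @classmethod
    def auto(cls, text: str) -> Union["SSMLSketch", str]:
        # Wrap only text that starts like an SSML document; pass everything else through.
        return cls(text) if text.startswith("<speak>") else text


def describe(text: str) -> str:
    # An engine can branch on the type instead of re-parsing the string.
    return "ssml" if isinstance(text, SSMLSketch) else "plain"


print(repr(SSMLSketch.auto("<speak>Hello world</speak>")))  # SSMLSketch('<speak>Hello world</speak>')
print(describe(SSMLSketch.auto("Hello world")))            # plain
```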

voicebox.types module

voicebox.utils module

voicebox.utils.reliable_tts(ttss: TTS | Iterable[TTS] = None, retry_max_attempts: int = 3, cache_max_size: int | float = 60, cache_size_func: Literal['bytes', 'count', 'seconds'] | Callable[[Any], int | float] = 'seconds') → TTS

Takes zero or more TTS instances and returns a single TTS that tries each TTS in the order given, up to retry_max_attempts times each, until one succeeds. Outputs are also cached to speed up retrieval of repeated phrases.

This is useful if, for example, you have an online TTS that is subject to network failures, which retries may alleviate, and you want to fall back to an offline TTS if the online TTS fails all of its attempts.

If no TTS instance is provided, then a default TTS instance will be used.
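The fallback, retry, and caching behaviour described above can be sketched in a few lines. This is a hypothetical illustration, not voicebox's implementation: each engine is modelled as a plain callable from text to audio bytes, results are cached by text in an unbounded dict (voicebox instead bounds its cache via cache_max_size and cache_size_func), and AllAttemptsFailed and reliable are invented names.

```python
from typing import Callable, Dict, Iterable, List

Engine = Callable[[str], bytes]  # hypothetical: an engine maps text to audio bytes


class AllAttemptsFailed(Exception):
    """Hypothetical error raised when every engine has exhausted its retries."""


def reliable(engines: Iterable[Engine], retry_max_attempts: int = 3) -> Engine:
    engines = list(engines)
    cache: Dict[str, bytes] = {}  # unbounded here, unlike voicebox's size-limited cache

    def get_speech(text: str) -> bytes:
        if text in cache:  # repeated phrase: skip synthesis entirely
            return cache[text]
        errors: List[Exception] = []
        for engine in engines:  # try engines in the order given
            for _ in range(retry_max_attempts):  # up to retry_max_attempts tries each
                try:
                    audio = engine(text)
                except Exception as e:  # remember the failure and retry
                    errors.append(e)
                    continue
                cache[text] = audio
                return audio
        raise AllAttemptsFailed(errors)

    return get_speech


# Usage: an always-failing "online" engine with an offline fallback.
calls = {"online": 0}


def flaky_online(text: str) -> bytes:
    calls["online"] += 1
    raise ConnectionError("network down")


def offline(text: str) -> bytes:
    return text.encode()


tts = reliable([flaky_online, offline], retry_max_attempts=3)
print(tts("hello"), calls["online"])  # b'hello' 3
print(tts("hello"), calls["online"])  # b'hello' 3 (cached; no new attempts)
```

In the usage example the online engine fails all three attempts, the offline engine answers, and the repeated phrase is then served from the cache without calling either engine again.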