Gradio

Upload a speech clip of 3 to 10 seconds as the audio prompt and type in the text you'd like to synthesize.
The model will synthesize the given text in the voice of your audio prompt.
The model also tends to preserve the emotion and acoustic environment of the prompt.
For faster inference, use "Make prompt" to save the encoded audio prompt as a .npz file, then use it in "Infer from prompt".
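
The recommended prompt length of 3 to 10 seconds can be checked before uploading. The following is a minimal sketch using only Python's standard `wave` module; it assumes the prompt is an uncompressed PCM WAV file, and the helper names (`wav_duration_seconds`, `is_valid_prompt_length`, `make_silent_wav`) are hypothetical and not part of the app.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file given a path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_prompt_length(seconds: float, lo: float = 3.0, hi: float = 10.0) -> bool:
    """True if the clip falls inside the recommended 3 to 10 second range."""
    return lo <= seconds <= hi


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


# Usage: a 5-second clip passes, a 2-second clip does not.
duration = wav_duration_seconds(make_silent_wav(5.0))
print(duration, is_valid_prompt_length(duration))
print(is_valid_prompt_length(wav_duration_seconds(make_silent_wav(2.0))))
```

For a file on disk, pass its path to `wav_duration_seconds` instead of the in-memory buffer.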

Text

language

accent

Transcript

uploaded audio prompt

recorded audio prompt

Message

Output Audio

Prompt name

File

Examples

Text	language	accent	uploaded audio prompt	Transcript

Upload a speech clip of 3 to 10 seconds as the audio prompt.
You will get a .npz file containing the encoded audio prompt. Use it in "Infer from prompt".
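
A .npz file is NumPy's archive format for storing several named arrays in one file. The sketch below shows how an encoded prompt could be written and read back with `numpy.savez` and `numpy.load`. The array names (`audio_tokens`, `text_tokens`, `lang_code`) and their shapes are assumptions made for illustration; the app's actual file layout may differ.

```python
import io

import numpy as np


def save_prompt(dest, audio_tokens, text_tokens, lang_code):
    """Write the prompt arrays into one .npz archive (key names are assumed)."""
    np.savez(dest, audio_tokens=audio_tokens, text_tokens=text_tokens, lang_code=lang_code)


def load_prompt(src):
    """Read every named array from a .npz archive into a plain dict."""
    with np.load(src) as data:
        return {name: data[name] for name in data.files}


# Usage with an in-memory buffer; a file path such as "my_voice.npz" also works.
buf = io.BytesIO()
save_prompt(
    buf,
    audio_tokens=np.zeros((1, 75, 8), dtype=np.int64),  # placeholder shape
    text_tokens=np.arange(12),                          # placeholder token ids
    lang_code="en",
)
buf.seek(0)
loaded = load_prompt(buf)
print(sorted(loaded))
```

Inspecting the keys this way is a quick check that a .npz file really is an encoded prompt before uploading it.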

Prompt name

Transcript

uploaded audio prompt

recorded audio prompt

Message

File

Examples

Prompt name	uploaded audio prompt	Transcript

Faster than "Infer from audio".
You need to "Make prompt" first, then upload the encoded prompt (a .npz file).

Text

language

accent

Voice preset

File

Message

Output Audio

Examples

Text	language	accent	Voice preset

Very long text is split into sentences, and each sentence is synthesized separately.
To synthesize long text, make a prompt or use a preset prompt.
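
The sentence-by-sentence approach can be sketched as follows. This is an illustration, not the app's own code: the splitting rule is a simple punctuation-based regular expression (so it would wrongly split abbreviations such as "e.g. this"), and `fake_synthesize` is a hypothetical stand-in for the real model call.

```python
import re


def split_into_sentences(text: str) -> list[str]:
    """Split after Latin end punctuation followed by whitespace, or after CJK end punctuation."""
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    return [part for part in parts if part]


def synthesize_long_text(text: str, synthesize_sentence) -> list[float]:
    """Synthesize each sentence separately and concatenate the audio samples."""
    audio: list[float] = []
    for sentence in split_into_sentences(text):
        audio.extend(synthesize_sentence(sentence))
    return audio


def fake_synthesize(sentence: str) -> list[float]:
    """Hypothetical placeholder: returns one silent sample per character."""
    return [0.0] * len(sentence)


# Usage: three sentences are synthesized one at a time and joined.
sentences = split_into_sentences("Hello world. How are you? Fine.")
print(sentences)
samples = synthesize_long_text("Hello world. How are you? Fine.", fake_synthesize)
print(len(samples))
```

Using the same prompt for every sentence is what keeps the voice consistent across chunks, which is why a saved or preset prompt is needed here.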

Text

language

accent

Voice preset

File

Message

Output Audio

VALL-E X