Text to Speech

Eleven v3 offers Dialogue mode, allowing you to generate dynamic multi-speaker conversations with natural pacing that handle...

Eleven v3 supports audio tags, giving unprecedented control over your generated audio: Emotions: [curious] [crying] [mischiev...

The cost of generating with Eleven v3 is 1 credit per character on the website. API generations are discounted; see API pric...
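As a rough illustration of the per-character pricing above, the sketch below estimates the credits for a website generation. It assumes that every character, including spaces and punctuation, is billed at the same rate; the function name `estimate_website_credits` is hypothetical and not part of any ElevenLabs tooling, and the discounted API rate is deliberately left as a parameter because it is not stated here.

```python
def estimate_website_credits(text: str, credits_per_character: float = 1.0) -> float:
    """Estimate credits for one generation.

    Assumption: every character (letters, spaces, punctuation) counts equally.
    The default of 1 credit per character is the website rate quoted above.
    """
    return len(text) * credits_per_character


if __name__ == "__main__":
    script = "Hello, world!"
    # 13 characters at 1 credit each
    print(estimate_website_credits(script))
```

Passing a different `credits_per_character` lets you plug in whatever rate applies to your plan or to API usage.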

Eleven v3 is our latest and most expressive Text to Speech model, offering: More human-like generations with higher quality o...

Files that you have generated using Text to Speech or Voice Changer can be downloaded as MP3, WAV, M4A or FLAC files. WAV, M4...

You can download the generated files in two ways: You can download a generated file immediately by clicking the download butto...

In Speech Synthesis, using the website, you can generate up to 5,000 characters in a single generation on any paid plan and ...
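For scripts longer than the per-generation limit, one workaround is to split the text into chunks before generating. The following is a minimal sketch, assuming the 5,000-character figure quoted above and a simple sentence-boundary split; the helper name `split_for_generation` is hypothetical, and the `limit` parameter should be adjusted to whatever your plan allows.

```python
import re


def split_for_generation(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries (., !, ?) where possible.
    A single sentence longer than `limit` is cut at the limit as a fallback.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: hard-split a sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_script = "This is a sentence. " * 400  # about 8,000 characters
    for index, chunk in enumerate(split_for_generation(long_script), start=1):
        print(index, len(chunk))
```

Splitting at sentence boundaries rather than at a fixed offset keeps each chunk readable on its own, which tends to matter because the model reads the surrounding context when deciding on delivery.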

Language when generating via the website

When you generate audio on the ElevenLabs website, our AI automatically detects the ...

Any voice can speak any language currently supported by the AI; however, if you do not use a voice that is native to the lang...

If you want to force a certain pronunciation, you can use SSML phoneme tags. We support both IPA and CMU. However, we have fo...
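To make the phoneme-tag approach concrete, here is a small sketch that builds an SSML `<phoneme>` element as a string. The element and its `alphabet` and `ph` attributes come from the SSML standard; the attribute value `cmu-arpabet` for CMU pronunciations is an assumption to verify against the current documentation, and the helper name `phoneme_tag` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(word: str, pronunciation: str, alphabet: str = "ipa") -> str:
    """Wrap `word` in an SSML phoneme element.

    `alphabet` is "ipa" by default; "cmu-arpabet" is assumed for CMU input.
    Attribute values and the word itself are XML-escaped.
    """
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} "
        f"ph={quoteattr(pronunciation)}>{escape(word)}</phoneme>"
    )


if __name__ == "__main__":
    # CMU Arpabet example: digits mark stress on the vowels.
    tag = phoneme_tag("Madison", "M AE1 D IH0 S AH0 N", alphabet="cmu-arpabet")
    print(f"I visited {tag} last year.")
```

The resulting tag is pasted inline in the text in place of the word whose pronunciation you want to control.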

Mispronunciations can happen for a few different reasons. The most common one is that the word is just misspelled. The AI wil...

There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent w...

Numbers, dates, symbols and acronyms can present a challenge to the AI, as there are often multiple ways that they could be d...

Unfortunately, at this time, we do not offer download-based deduction as an alternative to generation-based deduction. There ...

Our Flash and Turbo models have been specially developed for low-latency applications. Flash v2 and Flash v2.5 are our ultra-l...

All pre-made voices and generated voices are English. This means that they might not have the correct accent or pronunciation...

Any voice can speak any of the supported languages. The way the current model works is that you don't select a specific langu...

This is currently not possible.

The model is sensitive to the wider situation surrounding each utterance - it assesses whether something makes sense by how i...

We plan on introducing features allowing emotions such as laughter in the future.

Voice speed control is available for Text to Speech via Speech Synthesis, Studio, ElevenAgents and our API. You can control th...