What is Eleven v3?

Eleven v3 is our latest and most expressive Text to Speech model, offering:

- More human-like generations with higher quality overall
- Support for audio tags
  - emotions: [sad] [angry] [happily]
  - delivery direction: [whispers] [shouts]
  - non-verbal reactions: [laughs] [clears throat] [sighs]
- Dialogue mode to support natural-sounding audio with multiple speakers
- Support for 70+ languages

It can produce breathtaking output, but its results are less consistent and its latency is higher than our other models, so it’s not suitable for real-time or conversational use cases. For those, we recommend the v2/v2.5 Turbo or Flash models. We’re working on a real-time version of Eleven v3.

You can generate speech with v3 through the API using our Create speech and Stream speech endpoints by specifying the model ID eleven_v3.
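
As a rough illustration, a Create speech or Stream speech request that selects v3 might be built like this. This is a minimal sketch, not official client code: the base URL, the /text-to-speech/{voice_id} path, the xi-api-key header, and the text and model_id field names are assumptions based on the public API reference, and the helper names are hypothetical. Check the API reference before relying on any of them.

```python
import json
import urllib.request

# Assumed public API base URL; confirm against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_speech_request(api_key, voice_id, text, stream=False):
    """Build (but do not send) a speech request that selects Eleven v3."""
    # Assumed paths: Create speech, or Stream speech when stream=True.
    path = f"/text-to-speech/{voice_id}"
    if stream:
        path += "/stream"
    # The model ID eleven_v3 is what selects the v3 model.
    body = json.dumps({"text": text, "model_id": "eleven_v3"}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_speech(api_key, voice_id, text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to a file."""
    request = build_speech_request(api_key, voice_id, text)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Audio tags go directly in the text, for example save_speech(key, voice_id, "[whispers] Did you hear that? [laughs]"), where key and voice_id are your own API key and a voice from your library.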

You can also use our Create dialogue and Stream dialogue endpoints to create natural-sounding dialogue with multiple speakers.
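
A dialogue request could be sketched in a similar way, with one entry per speaker turn. The /text-to-dialogue path, the inputs list, and its text and voice_id fields are assumptions based on the public API reference, and build_dialogue_request is a hypothetical helper, so treat this as a starting point rather than a definitive implementation.

```python
import json
import urllib.request

# Assumed public API base URL; confirm against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_dialogue_request(api_key, turns, stream=False):
    """Build (but do not send) a multi-speaker dialogue request.

    turns is a list of (voice_id, text) pairs in speaking order.
    """
    # Assumed paths: Create dialogue, or Stream dialogue when stream=True.
    path = "/text-to-dialogue" + ("/stream" if stream else "")
    # Each turn pairs a line of text with the voice that should speak it.
    inputs = [{"text": text, "voice_id": voice_id} for voice_id, text in turns]
    body = json.dumps({"inputs": inputs, "model_id": "eleven_v3"}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

For example, build_dialogue_request(key, [(voice_a, "[happily] You made it!"), (voice_b, "[sighs] Barely.")]) describes a two-speaker exchange; passing the result to urllib.request.urlopen would send it and return the audio bytes.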

Visit the following resources for more information:

Eleven v3 overview
Eleven v3 prompting guide