Eleven v3 is our latest and most expressive Text to Speech model, offering:
- More human-like generations with higher quality overall
- Support for audio tags
  - emotions: [sad] [angry] [happily]
  - delivery direction: [whispers] [shouts]
  - non-verbal reactions: [laughs] [clears throat] [sighs]
- Dialogue mode to support natural-sounding audio with multiple speakers
- Support for 70+ languages
It can produce breathtaking output, but its generations are less consistent and its latency is higher, so it’s not suitable for real-time or conversational use cases. For those, we recommend the v2/v2.5 Turbo or Flash models. We’re working on a real-time version of Eleven v3.
You can generate speech with v3 through the API using our Create speech and Stream speech endpoints by setting the model ID to eleven_v3.
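As a rough illustration, here is a minimal Python sketch of a Create speech or Stream speech request that selects eleven_v3 and uses audio tags in the text. The base URL, the `/text-to-speech/{voice_id}` path, the `xi-api-key` header, and the `text` and `model_id` field names are assumptions based on the public API and should be checked against the API reference. The helper names `build_speech_request` and `save_audio` are hypothetical.

```python
import json
import urllib.request

# Assumed base URL; confirm against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_speech_request(api_key, voice_id, text, stream=False):
    """Build (but do not send) a speech request that uses Eleven v3.

    Hypothetical helper: the path, header, and field names are assumptions.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    if stream:
        # Assumed path for the Stream speech endpoint.
        url += "/stream"
    body = json.dumps({"text": text, "model_id": "eleven_v3"}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_audio(request, path):
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

To try it, build a request with your own API key and voice ID, for example `build_speech_request(key, voice_id, "[whispers] I have a secret. [laughs] Just kidding!")`, and pass the result to `save_audio`. Because v3 has higher latency, a generous timeout on the client side may be worth adding.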
You can also use our Create dialogue and Stream dialogue endpoints to create a natural-sounding dialogue with multiple speakers.
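A multi-speaker request might look like the following sketch, where each line of dialogue is paired with a voice. The `/text-to-dialogue` path and the `inputs`, `text`, `voice_id`, and `model_id` field names are assumptions and may differ from the actual API, so verify them in the API reference. `build_dialogue_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed base URL; confirm against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_dialogue_request(api_key, lines, stream=False):
    """Build (but do not send) a dialogue request that uses Eleven v3.

    `lines` is a list of (voice_id, text) pairs in speaking order.
    Hypothetical helper: the path and field names are assumptions.
    """
    url = f"{API_BASE}/text-to-dialogue"
    if stream:
        # Assumed path for the Stream dialogue endpoint.
        url += "/stream"
    inputs = [{"text": text, "voice_id": voice_id} for voice_id, text in lines]
    body = json.dumps({"inputs": inputs, "model_id": "eleven_v3"}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The returned request can be sent with `urllib.request.urlopen`, and the response body is expected to be the generated audio. Audio tags such as [laughs] can be placed inside each speaker's text in the same way as for single-speaker generation.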
Visit the following resources for more information: