What is Eleven v3?

Eleven v3 is our latest and most expressive Text to Speech model, offering:

  • More human-like generations with higher quality overall
  • Support for audio tags
    • emotions: [sad] [angry] [happily]
    • delivery direction: [whispers] [shouts]
    • non-verbal reactions: [laughs] [clears throat] [sighs]
  • Dialogue mode to support natural-sounding audio with multiple speakers
  • Support for 70+ languages
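As a rough illustration of audio tags, the sketch below builds input text with bracketed tags placed in front of a line. The helper name with_tags is hypothetical and not part of any SDK; it only shows that tags are plain bracketed text inside the string you send.

```python
def with_tags(text, *tags):
    """Prefix text with bracketed audio tags, e.g. [whispers].

    Hypothetical helper for illustration only: audio tags are ordinary
    bracketed words inside the text, so simple string formatting is enough.
    """
    prefix = " ".join(f"[{tag}]" for tag in tags)
    return f"{prefix} {text}" if prefix else text


line = with_tags("I can't believe you did that.", "sighs", "sad")
print(line)  # [sighs] [sad] I can't believe you did that.
```

Tags can also appear mid-sentence; this helper covers only the leading-tag case.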

It can produce breathtaking output, but its more variable consistency and higher latency mean it’s not suitable for real-time or conversational use cases. For those, we recommend the v2/v2.5 Turbo or Flash models. We’re working on a real-time version of Eleven v3.

You can generate speech with v3 through the API by calling our Create speech or Stream speech endpoints and specifying the model ID eleven_v3.
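A minimal sketch of a Create speech call with the model ID set to eleven_v3, using only the Python standard library. The base URL, the /text-to-speech/{voice_id} path, the xi-api-key header, and the text and model_id body fields are assumptions here and should be checked against the API reference; the function names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL; confirm against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_speech_request(api_key, voice_id, text, model_id="eleven_v3"):
    """Build (but do not send) a Create speech request for the given voice."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"  # assumed endpoint path
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def create_speech(api_key, voice_id, text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(api_key, voice_id, text)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Calling create_speech("YOUR_API_KEY", "YOUR_VOICE_ID", "[whispers] Hello there.") would save the audio to speech.mp3, assuming the key and voice ID are valid.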

You can also use our Create dialogue and Stream dialogue endpoints to create natural-sounding dialogue with multiple speakers.
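A comparable sketch for a multi-speaker request. The /text-to-dialogue path and the inputs list of voice_id and text pairs are assumptions about the Create dialogue endpoint, and build_dialogue_request is a hypothetical helper; verify the exact request shape in the API reference before relying on it.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm against the API reference.
DIALOGUE_URL = "https://api.elevenlabs.io/v1/text-to-dialogue"


def build_dialogue_request(api_key, turns, model_id="eleven_v3"):
    """Build (but do not send) a dialogue request.

    turns is a list of (voice_id, text) pairs, one per spoken line,
    in the order they should be heard.
    """
    inputs = [{"voice_id": voice_id, "text": text} for voice_id, text in turns]
    body = json.dumps({"inputs": inputs, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        DIALOGUE_URL,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


request = build_dialogue_request(
    "YOUR_API_KEY",
    [
        ("VOICE_ID_A", "[happily] Did you hear the news?"),
        ("VOICE_ID_B", "[sighs] Not yet. Go on."),
    ],
)
# Send with urllib.request.urlopen(request) and write the response bytes to a file.
```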

Visit the following resources for more information: