What is Speech-to-speech?

Speech-to-speech (STS), or voice conversion, allows you to convert one voice (source voice) into another (cloned voice) while preserving the tone and delivery of the original voice.

STS has many uses. It can complement our text-to-speech (TTS) features by fixing pronunciations or adding a specific performance that is hard to get from text alone. It can also extend the range of voice actors by giving them access to a palette of different voices and tones. We offer an end-to-end solution for dubbing, but if you prefer to dub in the traditional way, you can use speech-to-speech to get the right voice for your project.

STS costs 1,000 credits per minute of audio. 

On the website, the maximum length of audio that can be converted is 5 minutes. Via the API, you can convert audio up to 10 minutes long using the Multilingual V2 model.
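As a rough illustration of the figures above, the following Python sketch estimates the credit cost of a clip and checks it against the length limits. The function names and the `channel` labels are hypothetical and not part of any official library, and the sketch assumes credits scale linearly with duration; how partial minutes are actually billed is not stated here.

```python
# Figures taken from this article.
CREDITS_PER_MINUTE = 1_000

# Maximum convertible audio length in seconds.
# The API figure applies when using the Multilingual V2 model.
MAX_SECONDS = {
    "website": 5 * 60,
    "api": 10 * 60,
}


def estimate_sts_credits(duration_seconds: float) -> float:
    """Estimate credits for a clip, assuming linear proration (an assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * CREDITS_PER_MINUTE / 60


def fits_limit(duration_seconds: float, channel: str) -> bool:
    """Return True if the clip is within the length limit for the channel."""
    return duration_seconds <= MAX_SECONDS[channel]


if __name__ == "__main__":
    clip = 7 * 60  # a 7-minute clip
    print(estimate_sts_credits(clip))
    print(fits_limit(clip, "website"), fits_limit(clip, "api"))
```

Under these assumptions, a 7-minute clip is too long for the website but within the API limit.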

For more information, please see our guide to Speech-to-speech.