The bottom line is: good consistent input = good consistent output.
- Instant Voice Cloning: 1 - 2 minutes of good audio
- Professional Voice Cloning: 30 - 180 minutes of good audio
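As a rough way to check your clips against the ranges above, the sketch below adds up WAV durations with Python's standard `wave` module. The names `RECOMMENDED_SECONDS`, `wav_duration_seconds`, and `check_total_duration` are made up for this illustration and are not part of any product API. It also assumes your clips are uncompressed WAV files.

```python
import wave

# Ranges taken from the list above, converted to seconds.
# The dictionary keys are hypothetical labels for this sketch.
RECOMMENDED_SECONDS = {
    "instant": (1 * 60, 2 * 60),
    "professional": (30 * 60, 180 * 60),
}


def wav_duration_seconds(path):
    """Return the duration of a WAV file (path or file object) in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_total_duration(total_seconds, mode):
    """Compare a total runtime with the range listed for the given mode."""
    low, high = RECOMMENDED_SECONDS[mode]
    if total_seconds < low:
        return "below range"
    if total_seconds > high:
        return "above range"
    return "ok"
```

For example, `check_total_duration(sum(wav_duration_seconds(p) for p in paths), "instant")` reports where a set of clips falls. Runtime is only one factor, so a result of "ok" says nothing about audio quality.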
Use the best and clearest audio clips that you can find. There should be only one speaker, with no background noise or interference, and their voice should be loud and clear.
Rather than adding clips of varying quality just to increase the total runtime, prioritize clips where the microphone quality is obviously very high and where the quality and tone are consistent throughout.
Ensure that most of the dialogue in your clips matches the speaking style and intonation you prefer. You don't want too many stretches of dialogue where the speaker deviates from the speech patterns you want to hear.
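One rough way to check "loud and clear" before uploading is to measure each clip's peak and average (RMS) level. The sketch below does this for 16-bit PCM WAV files using only the Python standard library. The function name `level_stats` is hypothetical, and the thresholds in the usage note are illustrative assumptions, not official recommendations.

```python
import array
import math
import sys
import wave


def level_stats(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file.

    Levels are relative to full scale, so 0 dBFS is the loudest
    possible sample. All channels are measured together.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio samples")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(value):
        return 20 * math.log10(value) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

A peak very close to 0 dBFS may indicate clipping, and a very low RMS (for instance below about -35 dBFS, an arbitrary figure chosen for this example) suggests the voice is too quiet. These numbers only help you sort clips before listening to them.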
If necessary, use a noise remover to reduce any background noise.
You can find more information in our documentation here.