Currently, we offer two choices for cloning:
1. Instant Voice Cloning (IVC): IVC is less resource-intensive and produces a clone you can use immediately. It needs only about 1 to 3 minutes of audio input for a high-quality clone and suits most general uses, but it might have trouble with unique voices or accents.
2. Professional Voice Cloning (PVC): PVC demands significantly more resources, and you must provide the AI with a substantial amount of data (at least 30 minutes, and closer to 3 hours for optimal results). This process fine-tunes the model on the provided dataset to create a customized model. The estimated training time is roughly 3 to 8 hours, but the process may take longer depending on how many other voices are queued for fine-tuning.
If you have a rather unique voice with a less common accent, instant voice cloning might not replicate your voice perfectly. In that case, professional voice cloning might be the only way to achieve a faithful replication. Instant voice cloning is generally very accurate, but in circumstances like these you might have to resort to professional voice cloning to obtain the most accurate clone.
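The guidance above can be condensed into a small decision helper. This is only an illustrative sketch: the function name, the constants, and the returned notes are assumptions made for this example and are not part of any product API. The thresholds are taken from the durations quoted above (about 1 to 3 minutes for IVC, at least 30 minutes and closer to 3 hours for PVC).

```python
from typing import NamedTuple

# Thresholds taken from the guidance above; the names are hypothetical.
IVC_MIN_MINUTES = 1.0       # lower end of "about 1 to 3 minutes"
PVC_MIN_MINUTES = 30.0      # stated minimum for professional cloning
PVC_OPTIMAL_MINUTES = 180.0 # "closer to 3 hours for optimal results"


class Recommendation(NamedTuple):
    method: str  # "IVC", "PVC", or "none"
    note: str


def recommend_cloning_method(audio_minutes: float,
                             unusual_voice_or_accent: bool = False) -> Recommendation:
    """Suggest a cloning method from the amount of audio and the voice type."""
    if audio_minutes < IVC_MIN_MINUTES:
        return Recommendation("none", "Record at least about 1 minute of audio.")

    if unusual_voice_or_accent:
        if audio_minutes >= PVC_OPTIMAL_MINUTES:
            return Recommendation("PVC", "Enough audio for optimal results.")
        if audio_minutes >= PVC_MIN_MINUTES:
            return Recommendation("PVC", "Meets the minimum; more audio may help.")
        # Not enough data for PVC yet, so IVC is the only option for now.
        return Recommendation(
            "IVC", "The clone may be imperfect; collect 30+ minutes for PVC.")

    # Ordinary voices are usually served well by the instant method.
    return Recommendation("IVC", "Instant cloning suits most general uses.")


if __name__ == "__main__":
    print(recommend_cloning_method(2))
    print(recommend_cloning_method(45, unusual_voice_or_accent=True))
```

The helper only encodes the rule of thumb from the text; whether a given voice counts as unusual is still a judgment call you make by listening to the instant clone first.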
Unfortunately, there is no way to influence the accent or tone of a clone after it has been created; the only way to change it is to change the samples you use for cloning. Even small changes to the samples can make a big difference.