Clone a Voice

A step-by-step tutorial for cloning any voice in Voice Creator Pro, from recording a reference sample to generating speech.

In this tutorial you will clone a voice from scratch and use it to generate speech. By the end, you will have a saved voice in your library and a generated audio clip that sounds like the original speaker.

The entire process takes under two minutes.


Prerequisites

  • Voice Creator Pro installed and running on your machine
  • A microphone (if recording your own voice) or an audio file / YouTube link with the voice you want to clone

Step-by-Step Walkthrough

Step 1: Open the Clone Tab

Open Voice Creator Pro and click Lab in the left sidebar, then select the Clone tab at the top. You will see three panels:

  • Reference Voice (left) - where you load or record the source voice
  • Generate Speech (center) - where you enter text and pick a model
  • Output (right) - where you listen to and download results

Step 2: Add a Reference Voice

You have four options. Pick whichever fits your situation:

Option A: Record from microphone
Click the microphone icon in the Reference Voice panel and speak for 3 to 10 seconds. Keep it clean and natural. A quiet room makes a big difference.

Option B: Upload an audio file
Click the upload icon and select a WAV or MP3 file from your computer.

Option C: Import from YouTube
Click the YouTube icon, paste a URL, then click Fetch. Preview the extracted clip and click Use this clip to load it.

Option D: Browse Voice Search
Open Voice Search from the sidebar, browse the community library, and click Import to Clone Library on any voice you like. It will appear in your Clone Library dropdown instantly.

Keep your clip under 15 seconds. We recommend 3 to 10 seconds of clean speech. Longer audio does not produce a better clone: it can harm quality and takes much longer to process, and extremely long clips can cause the application to malfunction. For details, see the guide "How Many Minutes of Audio Do You Need for Voice Cloning?".
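If you prepare reference clips outside the app, you can check their length before uploading. The following is a minimal sketch, not part of Voice Creator Pro: it uses Python's standard wave module, so it only reads uncompressed WAV files (not MP3), and the function names and category labels are hypothetical. The thresholds come from the guidance above.

```python
import wave

# Thresholds taken from the guidance above (3 to 10 s recommended, under 15 s).
RECOMMENDED_MIN_S = 3.0
RECOMMENDED_MAX_S = 10.0
HARD_MAX_S = 15.0


def wav_duration_seconds(path_or_file):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_clip_length(seconds):
    """Label a duration against the recommended range (labels are illustrative)."""
    if seconds < RECOMMENDED_MIN_S:
        return "shorter than recommended"
    if seconds <= RECOMMENDED_MAX_S:
        return "recommended"
    if seconds < HARD_MAX_S:
        return "acceptable"
    return "too long"
```

For example, classify_clip_length(wav_duration_seconds("sarah.wav")) labels a hypothetical file named sarah.wav. A clip labeled "too long" is best trimmed in an audio editor before loading.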

Step 3: Verify the Transcript

After you load a reference voice, the built-in speech-to-text model fills in the Transcript field automatically.

This step is critical. Read the transcript and compare it word-for-word with what the speaker actually says. Fix any mistakes before moving on. Even small mismatches between the transcript and the audio will hurt clone quality.
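If you already have the spoken text written down (for example, a script the speaker read from), a word-level diff can help spot mismatches that are easy to miss by eye. This is a rough sketch using Python's standard difflib module, not a feature of the app. The function names are hypothetical, and it assumes you copy the auto-generated transcript out of the Transcript field yourself.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def transcript_mismatches(auto_transcript, known_text):
    """Return (kind, transcript_words, known_words) for each differing span."""
    a = normalize(auto_transcript)
    b = normalize(known_text)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

An empty result means the two texts contain the same words in the same order. Any other result lists the words to correct in the Transcript field. The diff ignores case and punctuation, so it does not replace listening to the clip.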

Step 4: Save the Voice

Click Save Voice and give it a descriptive name (for example, "Sarah - warm narration"). The voice is now stored in your Library, and you can reload it anytime from the Library dropdown without re-uploading audio.

Step 5: Pick a Model

Click the model badge (shown in purple) in the Generate Speech panel to choose a TTS model. Here is a quick guide:

Model | Best for
OmniVoice | Widest language support (600+ languages); expression tags such as [laughter] and [sigh]
Chatterbox Multilingual | High-quality conversational speech in 23 languages
Chatterbox Turbo | Fast English-only generation. To use it, select 'Chatterbox' and then select 'Lower' quality.
Qwen3 | 10 languages with different voice characteristics
NeuTTS | Lightweight English-only option for weaker hardware

If you are unsure, start with OmniVoice. It covers the most languages and supports expression tags.

Step 6: Enter Your Text

Type or dictate the text you want the cloned voice to say in the Text to speak field.

Keep it to a paragraph or two for quick testing. For longer scripts, use Projects instead.

Want to add personality? If you chose OmniVoice or Chatterbox, click the smiley icon to insert expression tags like [laughter] or [surprise-oh] directly into your text.
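If you draft scripts outside the app, it can help to see which expression tags a script contains and to produce a tag-free copy for models that do not support them. The sketch below only illustrates the bracketed tag syntax shown above. The function names are hypothetical, the pattern is an assumption based on the example tags, and it does not check a tag against the app's actual tag list. How the app treats tags with other models is not covered here.

```python
import re

# Per the text above, expression tags work with OmniVoice and Chatterbox.
# Matching on the model-name prefix is an assumption of this sketch.
TAG_CAPABLE_PREFIXES = ("OmniVoice", "Chatterbox")

# Matches tags shaped like [laughter] or [surprise-oh].
TAG_PATTERN = re.compile(r"\[([a-z][a-z-]*)\]")


def find_expression_tags(text):
    """Return the tag names found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_expression_tags(text):
    """Remove tags together with the whitespace directly before them."""
    return re.sub(r"\s*\[[a-z][a-z-]*\]", "", text).strip()


def prepare_text(text, model):
    """Keep tags for tag-capable models, strip them otherwise."""
    if model.startswith(TAG_CAPABLE_PREFIXES):
        return text
    return strip_expression_tags(text)
```

Calling prepare_text(script, "NeuTTS") returns the script without tags, while prepare_text(script, "OmniVoice") returns it unchanged.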

Step 7: Generate

Click the Generate button. The Output panel on the right will play the result once processing finishes.

Not happy with the output? Before generating again, review the Tips for Best Results section below.

Step 8: Download or Keep Iterating

Click the download button in the Output panel to save the audio file. Every generation is also stored in the History section below the panels, so you can revisit and compare earlier attempts.


Tips for Best Results

  • Keep reference audio short. 3 to 10 seconds of clean speech is the sweet spot. Longer clips do not improve quality and can make it worse. For details, see "How Many Minutes of Audio Do You Need for Voice Cloning?".
  • Minimize background noise. Record in a quiet room or use a well-isolated clip. Background music, echo, and ambient noise all degrade the clone.
  • Match the transcript exactly. This is the single most common cause of poor results. Double-check it every time.
  • Experiment with models. Each model handles tone, pacing, and accents differently. Try two or three on the same text to find your favorite.
  • Use advanced settings sparingly. The defaults work well for most cases. If you do tweak them, change one setting at a time so you can hear the difference.

Next Steps