A step-by-step tutorial for cloning any voice in Voice Creator Pro, from recording a reference sample to generating speech.

Clone a Voice

In this tutorial you will clone a voice from scratch and use it to generate speech. By the end, you will have a saved voice in your library and a generated audio clip that sounds like the original speaker.

The entire process takes under two minutes.

Prerequisites

Voice Creator Pro installed and running on your machine
A microphone (if recording your own voice) or an audio file / YouTube link with the voice you want to clone

Step-by-Step Walkthrough

Step 1: Open the Clone Tab

Open Voice Creator Pro and click Lab in the left sidebar, then select the Clone tab at the top. You will see three panels:

Reference Voice (left) - where you load or record the source voice
Generate Speech (center) - where you enter text and pick a model
Output (right) - where you listen to and download results

Step 2: Add a Reference Voice

You have four options. Pick whichever fits your situation:

Option A: Record from microphone. Click the microphone icon in the Reference Voice panel and speak for 3 to 10 seconds. Keep it clean and natural. A quiet room makes a big difference.

Option B: Upload an audio file. Click the upload icon and select a WAV or MP3 file from your computer.

Option C: Import from YouTube. Click the YouTube icon, paste a URL, then click Fetch. Preview the extracted clip and click Use this clip to load it.

Option D: Browse Voice Search. Open Voice Search from the sidebar, browse the community library, and click Import to Clone Library on any voice you like. It will appear in your Clone Library dropdown instantly.

Keep your clip under 15 seconds. We recommend 3 to 10 seconds of clean speech. Longer audio does not produce a better clone: it can harm quality and takes much longer to process, and extremely long clips can cause the application to malfunction. For details, see the guide How Many Minutes of Audio Do You Need for Voice Cloning?
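If you want to check a clip's length before loading it, the following is a minimal sketch that runs outside Voice Creator Pro and is not part of the app. It uses Python's standard wave module, so it only reads uncompressed WAV files (not MP3). The function names and verdict labels are hypothetical; the thresholds are the ones given above.

```python
import wave

# Thresholds from this tutorial: 3 to 10 seconds recommended, under 15 seconds required.
RECOMMENDED_MIN_S = 3.0
RECOMMENDED_MAX_S = 10.0
HARD_MAX_S = 15.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_clip(duration_s):
    """Map a duration to a verdict (hypothetical labels, not app output)."""
    if duration_s >= HARD_MAX_S:
        return "too long"
    if duration_s < RECOMMENDED_MIN_S:
        return "shorter than recommended"
    if duration_s <= RECOMMENDED_MAX_S:
        return "recommended"
    return "longer than recommended"
```

For example, classify_clip(wav_duration_seconds("reference.wav")) tells you whether to trim the file first, where reference.wav stands in for your own clip.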

Step 3: Verify the Transcript

After you load a reference voice, the built-in speech-to-text model fills in the Transcript field automatically.

This step is critical. Read the transcript and compare it word-for-word with what the speaker actually says. Fix any mistakes before moving on. Even small mismatches between the transcript and the audio will hurt clone quality.
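If you have the script the speaker read, or you have typed out what you hear, you can let a script do the word-for-word comparison. This is a rough sketch using Python's standard difflib module. It runs outside the app, the function names are hypothetical, and it ignores case and punctuation, so check those by eye.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def transcript_mismatches(auto_transcript, actual_speech):
    """Return (transcript_words, spoken_words) pairs wherever the two differ."""
    auto_words = normalize(auto_transcript)
    actual_words = normalize(actual_speech)
    matcher = difflib.SequenceMatcher(a=auto_words, b=actual_words, autojunk=False)
    return [
        (" ".join(auto_words[i1:i2]), " ".join(actual_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Each pair shows what the transcript says and what was actually spoken.
print(transcript_mismatches("The whether is nice today", "The weather is nice today."))
```

An empty list means the two texts agree word for word. Any pair it returns points at a spot to fix in the Transcript field.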

Step 4: Save the Voice (Optional but Recommended)

Click Save Voice and give it a descriptive name (for example, "Sarah - warm narration"). The voice is now stored in your Library and you can reload it anytime from the Library dropdown without re-uploading audio.

Step 5: Pick a Model

Click the model badge (shown in purple) in the Generate Speech panel to choose a TTS model. Here is a quick guide:

Model	Best for
OmniVoice	Widest language support (600+); expression tags such as [laughter] and [sigh]
Chatterbox Multilingual	High-quality conversational speech in 23 languages
Chatterbox Turbo	Fast English-only generation. To select this model, select 'Chatterbox' and then select 'Lower' quality.
Qwen3	10 languages with different voice characteristics
NeuTTS	Lightweight English-only option for weaker hardware

If you are unsure, start with OmniVoice. It covers the most languages and supports expression tags.

Step 6: Enter Your Text

Type or dictate the text you want the cloned voice to say in the Text to speak field.

Keep it to a paragraph or two for quick testing. For longer scripts, use Projects instead.

Want to add personality? If you chose OmniVoice or Chatterbox, click the smiley icon to insert expression tags like [laughter] or [surprise-oh] directly into your text.
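If you type tags by hand instead of using the smiley icon, a typo in a tag name is easy to miss. As a small sketch, assuming tags are always written as a name in square brackets, the script below lists the tags in a piece of text and flags any that are not among the examples in this tutorial. The set of known tags here is only those three examples, not the app's full list, and the function names are hypothetical.

```python
import re

# Only the tags this tutorial mentions; the app's full tag list is not reproduced here.
TAGS_SEEN_IN_TUTORIAL = {"laughter", "sigh", "surprise-oh"}

# Matches a name enclosed in square brackets, such as [sigh].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_tags(text):
    """Return every bracketed tag name in the order it appears."""
    return TAG_PATTERN.findall(text)


def unfamiliar_tags(text):
    """Return tags that are not among the tutorial's examples, for a manual check."""
    return [tag for tag in find_tags(text) if tag not in TAGS_SEEN_IN_TUTORIAL]
```

A tag flagged by unfamiliar_tags is not necessarily wrong. It only means you should confirm it against the tag picker in the app.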

Step 7: Generate

Click the Generate button. The Output panel on the right will play the result once processing finishes.

Not happy with the output? Try these before generating again:

Switch to a different model
Re-record or trim your reference audio to a cleaner 3 to 10 second clip. See How to Pick the Right Reference Audio for Voice Cloning for guidance on choosing a good clip.
Adjust the advanced settings (click the sliders icon next to the language selector)

Step 8: Download or Keep Iterating

Click the download button in the Output panel to save the audio file. Every generation is also stored in the History section below the panels, so you can revisit and compare earlier attempts.

Tips for Best Results

Keep reference audio short. 3 to 10 seconds of clean speech is the sweet spot. Longer clips do not improve quality and can make it worse. See How Many Minutes of Audio Do You Need for Voice Cloning? for the details.
Minimize background noise. Record in a quiet room or use a well-isolated clip. Background music, echo, and ambient noise all degrade the clone.
Match the transcript exactly. This is the single most common cause of poor results. Double-check it every time.
Experiment with models. Each model handles tone, pacing, and accents differently. Try two or three on the same text to find your favorite.
Use advanced settings sparingly. The defaults work well for most cases. If you do tweak them, change one setting at a time so you can hear the difference.

Next Steps

Voice Cloning reference - Full details on every setting and model
Voice Search - Browse and import community voices
Projects - Scale up to long-form content with consistent voices
How to Pick the Right Reference Audio - Deep dive on choosing reference clips
