How to Generate Long-Form Narration

This tutorial walks you through producing long-form audio in Voice Creator Pro using Projects: podcast episodes, article narrations, video and social media voiceovers, and e-learning content. You will prepare voices in the Lab, import a script into Projects, assign voices, fine-tune individual segments, and export a finished audio file.

Looking to create an audiobook with chapter markers instead? See the audiobook tutorial.

What You Will Need

- Voice Creator Pro, running either as the desktop app on your machine or in your browser with VCP Cloud
- A script or document for your content (an EPUB, PDF, DOCX, or TXT file, or text in your clipboard)
- One or more voices ready in your library, or a reference audio clip (3 to 10 seconds) for cloning

Step 1: Prepare Your Voices in the Lab

Before starting a project, set up the voices you plan to use. Open the Lab and choose your approach:

- Clone tab - Clone a real voice from a recording, uploaded file, or YouTube link. Keep reference audio between 3 and 10 seconds for best results. See Voice Cloning for details.
- Design tab - Describe the voice you want (tone, age, style) and the model generates it. See Voice Design.
- TTS tab - Browse built-in model voices if you do not need a custom voice. See Text to Speech.

Click Save Voice after you are happy with each voice so it is available in your project later. If your content has multiple speakers (for example, a podcast host and guest), save a voice for each one.

Tip: Test each voice with a short sample of your actual script in the Lab before moving to Projects. This helps you catch issues early.
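
If your reference clip is an uncompressed WAV file, you can check its length before uploading it to the Clone tab. The sketch below is not part of Voice Creator Pro; it uses only Python's standard `wave` module, and the function names are illustrative. The 3 to 10 second range comes from the guidance above.

```python
import wave

MIN_SECONDS = 3.0   # lower bound recommended for reference audio
MAX_SECONDS = 10.0  # upper bound recommended for reference audio


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(seconds: float) -> bool:
    """Check a duration against the recommended 3 to 10 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_good_reference_length(clip_duration_seconds("host.wav"))` tells you whether a hypothetical `host.wav` needs trimming first. Compressed formats such as MP3 need a different tool.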

Step 2: Create a Project and Import Your Script

1. Open Projects and create a new project.
2. Import your content using one of these methods:
   - Drag and drop an EPUB, PDF, DOCX, or TXT file
   - Use the import button to browse for a file
   - Paste text directly with Ctrl+V (Cmd+V on macOS)

Voice Creator Pro splits the text into segments at paragraph boundaries, and automatically splits paragraphs longer than 500 characters at sentence boundaries. To change this limit, adjust the Max chars per segment setting in Project Settings, then click Re-parse.
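
To preview roughly how a script will be divided, the segmentation rule described above can be approximated in a few lines of Python. This is a minimal sketch, not the application's actual parser: the function name, the blank-line paragraph detection, and the simple punctuation-based sentence split are all assumptions.

```python
import re


def split_into_segments(text: str, max_chars: int = 500) -> list[str]:
    """Split text at paragraph boundaries, then split long paragraphs
    at sentence boundaries so each segment stays within max_chars."""
    segments = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            segments.append(paragraph)
            continue
        # Naive sentence split: break after ., !, or ? followed by whitespace.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            segments.append(current)
    return segments
```

In this sketch a single sentence longer than `max_chars` is kept whole. Running it on a draft script is a quick way to spot paragraphs that will break in awkward places before you import.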

See Importing & Exporting for more on supported formats and text cleanup options.

Step 3: Assign Voices

With your script imported, assign voices to your content:

- Set a default voice. Pick the primary voice for the entire project. All segments across all sections will use this voice unless you override them. For a single-narrator article, this may be the only voice you need.
- Override per segment. Click any individual segment and pick a different voice from the dropdown to change just that segment. This is useful for a guest quote in an otherwise single-voice article.
- Use inline voice spans. Highlight text within a segment and assign a different voice to just that selection. This is ideal for podcast scripts where host and guest lines appear in the same paragraph.

You can use any combination of built-in, cloned, and designed voices. See Voice Assignment for the full details.

Example: Podcast With Two Speakers

For a podcast script with a host and a guest:

- Set the default voice to your host voice.
- Highlight each guest line and assign your guest voice to that selection.
- Alternatively, if your script is structured with one speaker per paragraph, override the voice at the segment level for guest paragraphs.
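
If your script marks speakers with labels such as `HOST:` and `GUEST:`, a short script can list which paragraphs need a segment-level override before you start clicking through them. This is a planning aid only, sketched under assumptions: the label format, the voice names, and the function name are hypothetical, and nothing here implies that Voice Creator Pro reads such labels itself.

```python
import re

# Hypothetical mapping from script labels to the voices you saved in the Lab.
VOICE_FOR_SPEAKER = {"HOST": "Host Voice", "GUEST": "Guest Voice"}
DEFAULT_SPEAKER = "HOST"  # matches the project's default voice

# An uppercase label followed by a colon at the start of a paragraph.
LABEL = re.compile(r"^([A-Z]+):\s*(.*)$", re.DOTALL)


def plan_voice_overrides(paragraphs: list[str]) -> list[dict]:
    """Return one entry per paragraph with its voice and whether it
    differs from the default voice (and so needs an override)."""
    plan = []
    for index, paragraph in enumerate(paragraphs):
        stripped = paragraph.strip()
        match = LABEL.match(stripped)
        if match and match.group(1) in VOICE_FOR_SPEAKER:
            speaker, text = match.group(1), match.group(2)
        else:
            # Unlabelled paragraphs fall back to the default speaker.
            speaker, text = DEFAULT_SPEAKER, stripped
        plan.append({
            "segment": index,
            "voice": VOICE_FOR_SPEAKER[speaker],
            "override": speaker != DEFAULT_SPEAKER,
            "text": text,
        })
    return plan
```

The `text` field has the label removed, which is what you would want narrated; strip the labels from the script before importing so they are not read aloud.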

Step 4: Set Up the Lexicon

If your content includes names, technical terms, acronyms, or brand names that the model might mispronounce, add them to the Lexicon before generating.

1. Open the Lexicon from Project Settings.
2. Add each word and its correct pronunciation.
3. Test a single segment containing that word to verify it sounds right.

Defining pronunciations up front saves you from regenerating segments later. You can also add entries after generating if you hear a mispronounced word, then regenerate all segments that contain it.
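
When you add a Lexicon entry after generating, you need to know which segments contain the word. The sketch below shows one way to find them from a plain list of segment texts. It assumes whole-word, case-insensitive matching, which may differ from how the Lexicon itself matches, and the function name is illustrative.

```python
import re


def segments_to_regenerate(segments: list[str], lexicon_words: list[str]) -> list[int]:
    """Return the indexes of segments that contain any lexicon word."""
    # Lookarounds instead of \b so that terms like "C++" still match.
    patterns = [
        re.compile(rf"(?<!\w){re.escape(word)}(?!\w)", re.IGNORECASE)
        for word in lexicon_words
    ]
    return [
        index
        for index, text in enumerate(segments)
        if any(pattern.search(text) for pattern in patterns)
    ]
```

Whole-word matching avoids false hits such as a short entry appearing inside a longer, unrelated word.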

Step 5: Check Project Settings

Before generating audio, review your project settings. Open Project Settings and check:

- Language - Set the output language, or leave it on Auto to detect the language from the text.
- Takes per generation - Set above 1 if you want multiple versions of each segment to choose from.
- Paragraph gap - Silence between paragraphs (default: 300 ms). Podcasts often work better with shorter gaps, while e-learning may benefit from longer pauses.
- Segment gap - Silence between segments (default: 1000 ms).
- Format - Set your preferred export format (MP3, WAV, FLAC, or M4B).

See Project Settings for the full list of options.

Step 6: Fine-Tune Segments

Click any segment to access its controls:

- Edit text inline to fix typos, adjust phrasing, or rewrite for better spoken delivery.
- Split or merge segments if the automatic segmentation breaks in an awkward place. See Segments.
- Adjust generation parameters per segment (speed, guidance scale, etc.) to control pacing and emphasis for specific lines.
- Add expression tags like [laughter] or [surprise-oh] for natural vocal reactions. Expression tags are supported by OmniVoice and Chatterbox models.
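
Because expression tags only work with some models, it can help to list every tag a script uses before generating. The sketch below extracts bracketed tags from text. The pattern is an assumption based on the two examples above (lowercase words, optionally hyphenated, in square brackets); the full tag syntax may be broader.

```python
import re

# Assumed tag shape, inferred from [laughter] and [surprise-oh].
TAG = re.compile(r"\[([a-z]+(?:-[a-z]+)*)\]")


def find_expression_tags(text: str) -> list[str]:
    """Return the expression tag names found in text, in order."""
    return TAG.findall(text)
```

Running this over each segment shows where tags appear, so you can confirm those segments use a voice from a model that supports them.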

Use Takes to Find the Best Delivery

Set Takes per generation above 1 in Project Settings to generate multiple versions of each segment. Audition each take and pick the one that sounds best. This is especially helpful for segments where tone and delivery matter, like an intro hook or a key quote.

Step 7: Generate Audio

Generate your project. You can generate the entire project at once or work through it section by section. Preview segments as you go and regenerate any that need improvement.

Tip: Generate a handful of segments first to confirm your voice assignments and settings sound right before generating everything.

Step 8: Export

When you are satisfied with the result, export your finished audio:

Format	Best For
MP3	Podcasts, social media, general distribution
WAV	Video editing, professional post-production workflows
FLAC	Lossless archival
M4B	Audiobooks with chapter markers

Before exporting, check the Output settings in Project Settings:

- Paragraph gap controls silence between paragraphs (default: 300 ms)
- Segment gap controls silence between segments (default: 1000 ms)

Adjust these to match the pacing you want. Podcasts often sound better with shorter gaps, while e-learning narration may benefit from longer pauses. See Importing & Exporting for more on export options.
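
To get a feel for how much these settings change the runtime, you can estimate the total silence they add. The sketch below is a rough calculation under a stated assumption: one paragraph gap is inserted between consecutive paragraphs and one segment gap between consecutive segments inside the same paragraph. The exporter may apply the gaps differently, so treat the result as an estimate only.

```python
def total_gap_seconds(
    segments_per_paragraph: list[int],
    paragraph_gap_ms: int = 300,
    segment_gap_ms: int = 1000,
) -> float:
    """Estimate the total silence added by gap settings, in seconds.

    segments_per_paragraph lists how many segments each paragraph has.
    """
    paragraphs = [count for count in segments_per_paragraph if count > 0]
    if not paragraphs:
        return 0.0
    # Assumption: gaps appear only between items, not before or after.
    paragraph_boundaries = len(paragraphs) - 1
    segment_boundaries = sum(count - 1 for count in paragraphs)
    total_ms = (
        paragraph_boundaries * paragraph_gap_ms
        + segment_boundaries * segment_gap_ms
    )
    return total_ms / 1000
```

With the defaults, three paragraphs containing 1, 3, and 2 segments have two paragraph boundaries (600 ms) and three segment boundaries (3000 ms), so the estimate is 3.6 seconds of added silence.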

Next Steps

- Projects overview for a high-level look at the Projects workflow
- Voice Cloning for detailed guidance on creating voice clones
- Voice Design for creating voices from text descriptions
- Segments for granular control over individual segments
- Project Settings for all available project configuration options
