
Generate Long-Form Audio

Walk through producing podcasts, article narrations, video voiceovers, and e-learning content in Voice Creator Pro using Projects.

This tutorial walks you through producing long-form audio such as podcast episodes, article narrations, social media voiceovers, and e-learning content. You will prepare voices in the Lab, import a script into Projects, assign voices, fine-tune individual segments, and export a finished audio file.

Looking to create an audiobook with chapter markers instead? See the audiobook tutorial.

What You Will Need

  • Voice Creator Pro installed and running
  • A script or document for your content (EPUB, PDF, DOCX, or TXT file, or text in your clipboard)
  • One or more voices ready in your library, or a reference audio clip (3 to 10 seconds) for cloning

Step 1: Prepare Your Voices in the Lab

Before starting a project, set up the voices you plan to use. Open the Lab and choose your approach:

  • Clone tab - Clone a real voice from a recording, uploaded file, or YouTube link. Keep reference audio between 3 and 10 seconds for best results. See Voice Cloning for details.
  • Design tab - Describe the voice you want (tone, age, style) and the model generates it. See Voice Design.
  • TTS tab - Browse built-in model voices if you do not need a custom voice. See Text to Speech.

Click Save Voice after you are happy with each voice so it is available in your project later. If your content has multiple speakers (for example, a podcast host and guest), save a voice for each one.

Tip: Test each voice with a short sample of your actual script in the Lab before moving to Projects. This helps you catch mispronunciations and pacing problems before you generate the full project.
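
The 3 to 10 second guideline for reference audio can be checked before you open the Lab. The following is a minimal sketch, not part of Voice Creator Pro: it uses Python's standard wave module, so it only reads uncompressed WAV files, and the function names are hypothetical.

```python
import wave

MIN_SECONDS = 3.0   # lower bound of the recommended reference length
MAX_SECONDS = 10.0  # upper bound of the recommended reference length


def clip_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(seconds):
    """True if the clip length falls inside the recommended range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A clip outside the range can be trimmed in any audio editor before you load it into the Clone tab.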

Step 2: Create a Project and Import Your Script

  1. Open Projects and create a new project.
  2. Import your content using one of these methods:
    • Drag and drop an EPUB, PDF, DOCX, or TXT file
    • Use the import button to browse for a file
    • Paste text directly with Ctrl+V (Cmd+V on macOS)

Voice Creator Pro splits the text into segments at paragraph boundaries. Paragraphs longer than 500 characters are automatically split further at sentence boundaries. To change this limit, adjust the Max chars per segment setting in Project Settings, then click Re-parse.
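
To get a feel for how a script will be divided before you import it, the segmentation rule can be approximated in a few lines. This is a rough sketch under stated assumptions, not the application's actual parser: paragraphs are assumed to be separated by blank lines, sentences are detected with a naive punctuation rule, and split_into_segments is a hypothetical name.

```python
import re


def split_into_segments(text, max_chars=500):
    """Approximate paragraph-then-sentence segmentation."""
    segments = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            segments.append(paragraph)
            continue
        # Naive sentence split: break after ., !, or ? followed by whitespace.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            segments.append(current)
    return segments
```

In this sketch a single sentence longer than max_chars is kept whole instead of being cut mid-sentence; the real parser may behave differently.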

See Importing & Exporting for more on supported formats and text cleanup options.

Step 3: Assign Voices

With your script imported, assign voices to your content:

  1. Set a default voice. Pick the primary voice for the entire project. All segments across all sections will use this voice unless you override them. For a single-narrator article, this may be the only voice you need.
  2. Override per segment. Click any individual segment and pick a different voice from the dropdown to change just that segment. This is useful for a guest quote in an otherwise single-voice article.
  3. Use inline voice spans. Highlight text within a segment and assign a different voice to just that selection. This is ideal for podcast scripts where host and guest lines appear in the same paragraph.

You can use any combination of built-in, cloned, and designed voices. See Voice Assignment for the full details.
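
The three assignment levels act as a precedence chain: an inline span overrides the segment voice, which overrides the project default. The sketch below illustrates that chain with a hypothetical data model (Span, Segment, and resolve_voices are illustrative names, not the application's internals) and assumes spans do not overlap.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    start: int  # index of the first character in the span
    end: int    # index one past the last character
    voice: str


@dataclass
class Segment:
    text: str
    voice: Optional[str] = None  # per-segment override, if any
    spans: List[Span] = field(default_factory=list)


def resolve_voices(segment, default_voice):
    """Return (voice, text) runs for one segment, in reading order."""
    base = segment.voice or default_voice
    runs = []
    cursor = 0
    # Assumption: spans never overlap one another.
    for span in sorted(segment.spans, key=lambda s: s.start):
        if span.start > cursor:
            runs.append((base, segment.text[cursor:span.start]))
        runs.append((span.voice, segment.text[span.start:span.end]))
        cursor = span.end
    if cursor < len(segment.text):
        runs.append((base, segment.text[cursor:]))
    return runs
```

For a paragraph where the host speaks first and the guest replies, a single span over the guest's sentence yields two runs: the host voice for the opening text and the guest voice for the reply.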

Example: Podcast With Two Speakers

For a podcast script with a host and a guest:

  1. Set the default voice to your host voice.
  2. Highlight each guest line and assign your guest voice for that selection.
  3. Alternatively, if your script is structured with one speaker per paragraph, override the voice at the segment level for guest paragraphs.

Step 4: Set Up the Lexicon

If your content includes names, technical terms, acronyms, or brand names that the model might mispronounce, add them to the Lexicon before generating.

  1. Open the Lexicon from Project Settings.
  2. Add each word and its correct pronunciation.
  3. Test a single segment containing that word to verify it sounds right.

Defining pronunciations up front saves you from regenerating segments later. You can also add entries after generating if you hear a mispronounced word, then regenerate all segments that contain it.
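
If you keep a plain-text copy of your script, you can list which segments mention a term before deciding what to regenerate. This is a small sketch with a hypothetical helper name; it assumes whole-word, case-insensitive matching, which may not be how the application itself matches Lexicon entries.

```python
import re


def segments_containing(segments, word):
    """Return the indexes of segments that contain `word` as a whole word."""
    # Assumption: matching is case-insensitive and ignores partial words.
    pattern = re.compile(rf"(?<!\w){re.escape(word)}(?!\w)", re.IGNORECASE)
    return [i for i, text in enumerate(segments) if pattern.search(text)]
```

The returned indexes point to the segments worth previewing again after the new entry is added.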

Step 5: Check Project Settings

Before generating audio, review your project settings. Open Project Settings and check:

  • Language - Set the output language, or leave on Auto to detect from the text.
  • Takes per generation - Set above 1 if you want multiple versions of each segment to choose from.
  • Paragraph gap - Silence between paragraphs (default: 300 ms). Podcasts often work better with shorter gaps, while e-learning may benefit from longer pauses.
  • Segment gap - Silence between segments (default: 1000 ms).
  • Format - Set your preferred export format (MP3, WAV, FLAC, or M4B).

See Project Settings for the full list of options.

Step 6: Fine-Tune Segments

Click any segment to access its controls:

  • Edit text inline to fix typos, adjust phrasing, or rewrite for better spoken delivery.
  • Split or merge segments if the automatic segmentation breaks in an awkward place. See Segments.
  • Adjust generation parameters per segment (speed, guidance scale, etc.) to control pacing and emphasis for specific lines.
  • Add expression tags like [laughter] or [surprise-oh] for natural vocal reactions. Expression tags are supported by OmniVoice and Chatterbox models.

Use Takes to Find the Best Delivery

Set Takes per generation above 1 in Project Settings to generate multiple versions of each segment. Audition each take and pick the one that sounds best. This is especially helpful for segments where tone and delivery matter, like an intro hook or a key quote.

Step 7: Generate Audio

Generate the entire project at once or work through it section by section. Preview segments as you go and regenerate any that need improvement.

Tip: Generate a handful of segments first to confirm your voice assignments and settings sound right before generating everything.

Step 8: Export

When you are satisfied with the result, export your finished audio:

Format   Best For
MP3      Podcasts, social media, general distribution
WAV      Video editing, professional post-production workflows
FLAC     Lossless archival
M4B      Audiobooks with chapter markers

Before exporting, check the Output settings in Project Settings:

  • Paragraph gap controls silence between paragraphs (default: 300 ms)
  • Segment gap controls silence between segments (default: 1000 ms)

Adjust these to match the pacing you want. Podcasts often sound better with shorter gaps, while e-learning narration may benefit from longer pauses. See Importing & Exporting for more on export options.
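
To see how much the gap settings affect total running time, the arithmetic can be sketched as follows. The defaults mirror the values listed above; the function names and the 44,100 Hz sample rate are assumptions for illustration, and the number of each kind of break is passed in as an input because it depends on your project.

```python
def total_silence_ms(paragraph_breaks, segment_breaks,
                     paragraph_gap_ms=300, segment_gap_ms=1000):
    """Total silence added by gaps, in milliseconds."""
    return paragraph_breaks * paragraph_gap_ms + segment_breaks * segment_gap_ms


def gap_samples(gap_ms, sample_rate=44100):
    """Number of silent samples needed for a gap of `gap_ms` milliseconds."""
    # Assumption: 44,100 Hz output; substitute your export's sample rate.
    return round(sample_rate * gap_ms / 1000)
```

For example, 9 paragraph breaks and 40 segment breaks at the default settings add 42,700 ms of silence, or just under 43 seconds.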

Use Case Tips

Article and Blog Narration

A single narrator voice usually works best. Focus on pacing and paragraph gaps to keep the audio easy to follow. Use the lexicon for any technical terms or proper nouns.

Social Media Voiceovers

Keep segments short and punchy. Export as MP3 or WAV depending on your video editor. Tighten paragraph and segment gaps for a faster pace.

E-Learning Content

Use a clear, measured voice. Increase segment gaps slightly to give learners time to absorb information. Add lexicon entries for domain-specific terminology.

Video Narration Scripts

Export as WAV for the best compatibility with video editing software. Match the segment gap settings to your video's pacing needs.

Next Steps