Generate an Audiobook

A step-by-step walkthrough for turning a book file into a finished M4B audiobook with chapter markers, multiple narrator voices, and consistent pronunciations.

This tutorial walks you through converting a book into a complete audiobook using Voice Creator Pro's Projects feature. By the end you will have an M4B file with chapter markers, cover art, and polished narration ready for Apple Books, Audiobookshelf, or any audiobook player.

What You Will Need

A book file in EPUB, PDF, DOCX, or TXT format. EPUB works best because Voice Creator Pro automatically preserves its chapter structure.
Voices ready in your library. Clone or design the voices you want to use before starting the project. Visit Voice Cloning or Voice Design to set these up, or browse Voice Search for community voices.

Step 1: Import Your Book

Open Projects from the left sidebar and click New Project.
Import your book file (EPUB, PDF, DOCX, or TXT). You can also paste text directly with Ctrl+V (Cmd+V on macOS) anywhere in the Projects window to start a new project.
If your text contains decorative symbols or formatting artifacts, enable Clean up imported text in Project Settings to strip them automatically.

Voice Creator Pro splits the imported text into segments at paragraph boundaries. Paragraphs longer than the Max chars per segment setting (default 500) are split at sentence boundaries.
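The splitting rule above can be illustrated with a short Python sketch. This is not Voice Creator Pro's actual implementation; the function name `split_into_segments`, the blank-line paragraph detection, and the simple sentence-ending regex are all assumptions made for illustration.

```python
import re

def split_into_segments(text: str, max_chars: int = 500) -> list[str]:
    """Split at paragraph boundaries; split long paragraphs at sentence ends."""
    segments = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            segments.append(paragraph)
            continue
        # Greedily pack whole sentences into chunks of at most max_chars.
        # A single sentence longer than max_chars is kept whole in this sketch.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            segments.append(current)
    return segments

text = "Short paragraph.\n\n" + "This is a sentence. " * 40
print(len(split_into_segments(text)))  # 3: one short paragraph, one long paragraph in two parts
```

Lowering the limit yields more, shorter segments, which makes selective regeneration finer-grained at the cost of more segment boundaries.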

For full details on supported formats and cleanup options, see Importing & Exporting.

Step 2: Review Chapters and Segments

If you imported an EPUB, your chapters should already be detected and labeled. For other formats, check that each section starts and ends where a chapter does.

Rename sections by clicking a chapter title and editing it in place.
Split any segment that was cut at an awkward point, or merge adjacent segments that belong together.
Edit text inline to fix OCR errors, typos, or formatting issues from the import.

See Segments for the full set of segment controls.

Step 3: Set Up the Lexicon

Before generating any audio, add pronunciations for words the model might get wrong. Open the Lexicon from your project and add entries for:

Character names
Place names and foreign words
Made-up terms (fantasy, sci-fi)
Acronyms that should be spoken as words (or spelled out)

Define each entry once and it applies to every segment in the project. Adding entries before generation means you will not have to regenerate segments later.

You can also add lexicon entries after generating audio. If you hear a mispronounced word during review, add a lexicon entry for it, then regenerate the segments that contain that word. The regenerated audio uses the corrected pronunciation.
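To see which segments a late lexicon entry affects, the idea is a whole-word search across the project text. The sketch below is hypothetical: `segments_to_regenerate` and the ID-to-text dictionary are illustrative names, and case-insensitive matching is an assumption rather than documented behavior.

```python
import re

def segments_to_regenerate(segments: dict[int, str], word: str) -> list[int]:
    """Return IDs of segments whose text contains `word` as a whole word."""
    # Assumption: matching ignores case and respects word boundaries.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [seg_id for seg_id, text in segments.items() if pattern.search(text)]

segments = {
    1: "Aelthir rode north.",
    2: "The road was empty.",
    3: "No one trusted aelthir's word.",
}
print(segments_to_regenerate(segments, "Aelthir"))  # [1, 3]
```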

See Lexicon for details.

Step 4: Assign Voices

Voice assignment works at three levels:

Default voice - Set a default voice for the entire project. This voice is used for all segments across all sections unless you override it.
Per-segment override - Select any individual segment and assign a different voice to just that segment. Useful for dialogue or sections with a different narrator.
Inline voice spans - Highlight a portion of text within a segment and assign a different voice to just that selection. The rest of the segment keeps its current voice. This is ideal for quoted dialogue embedded within narration.

For a single-narrator audiobook, the default voice may be all you need. For multi-voice books, set the narrator as the default and override individual segments or highlighted spans for character dialogue.
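The three levels behave like a most-specific-wins lookup. The Python sketch below models that precedence. The `Segment` and `VoiceSpan` types and the `resolve_voices` function are hypothetical, and the sketch assumes inline spans do not overlap.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VoiceSpan:
    start: int   # inclusive character offset within the segment text
    end: int     # exclusive character offset
    voice: str

@dataclass
class Segment:
    text: str
    voice_override: Optional[str] = None
    spans: list[VoiceSpan] = field(default_factory=list)

def resolve_voices(segment: Segment, default_voice: str) -> list[tuple[str, str]]:
    """Return (voice, text) runs: inline span > segment override > project default."""
    base = segment.voice_override or default_voice
    runs = []
    cursor = 0
    # Assumption: spans do not overlap, so sorting by start is enough.
    for span in sorted(segment.spans, key=lambda s: s.start):
        if span.start > cursor:
            runs.append((base, segment.text[cursor:span.start]))
        runs.append((span.voice, segment.text[span.start:span.end]))
        cursor = span.end
    if cursor < len(segment.text):
        runs.append((base, segment.text[cursor:]))
    return runs

text = 'She whispered, "Run!" and vanished.'
quote = VoiceSpan(text.index('"'), text.rindex('"') + 1, "hero")
for voice, chunk in resolve_voices(Segment(text, spans=[quote]), "narrator"):
    print(voice, repr(chunk))
```

In this example the narrator voice reads the text on either side of the quotation, and only the quoted words go to the character voice.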

See Voice Assignment for the full workflow.

Step 5: Check Project Settings

Before generating audio, review your project settings. Open Project Settings and check:

Language - Set the output language, or leave on Auto to detect from the text.
Takes per generation - Set this above 1 to generate multiple versions of each segment to choose from.
Paragraph gap - Silence between paragraphs (default: 300 ms). Fiction often benefits from longer gaps (400-500 ms).
Segment gap - Silence between segments (default: 1000 ms).
Format - Set your preferred export format. For audiobooks, choose M4B.
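Gap settings add up over a long book. As a rough worked example, the Python sketch below estimates how much runtime a paragraph gap change adds. The function name is hypothetical, and it assumes the gap is inserted exactly once per paragraph break.

```python
def added_silence_minutes(paragraph_breaks: int, old_gap_ms: int, new_gap_ms: int) -> float:
    """Extra runtime, in minutes, from changing the paragraph gap."""
    # Assumption: one gap is inserted per paragraph break.
    return paragraph_breaks * (new_gap_ms - old_gap_ms) / 60_000

# Raising the gap from the 300 ms default to 500 ms across 3,000 paragraph breaks:
print(added_silence_minutes(3000, 300, 500))  # 10.0
```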

See Project Settings for the full list of options.

Step 6: Test a Few Segments

Do not generate the entire book yet. Pick two or three segments from different chapters and generate them individually. Listen for:

Pronunciation issues you missed. Add any corrections to the Lexicon.
Voice fit. Make sure each voice sounds natural for its role. Swap voices now if something is off.
Pacing. Adjust the paragraph gap and segment gap in Project Settings if pauses feel too short or too long. The defaults are 300 ms (paragraph) and 1000 ms (segment).
Generation parameters. If a voice sounds too fast, too slow, or lacks the right tone, tweak speed and other parameters at the segment or project level.

This test pass saves significant time compared to generating everything and finding problems afterward.

Step 7: Generate All Audio

Once your test segments sound good, generate audio for the full project. Voice Creator Pro processes every segment using the voices and settings you configured.

Tips for Bulk Generation

Takes per generation. If you want multiple options for each segment, increase the Takes per generation setting in Project Settings before starting. You can then audition each take and pick the best one.
Regenerate selectively. After bulk generation, listen through and regenerate only the segments that need improvement. You do not have to redo the entire project.

Step 8: Export as M4B

When you are satisfied with the narration:

Click Export.
Choose M4B as the format.
Add cover art and metadata (title, author) if desired.
Export the file.

The M4B file includes chapter markers derived from your project's section structure, so listeners can skip between chapters in their audiobook player.

For other use cases, you can also export to MP3, WAV, or FLAC. See Importing & Exporting for format details.

Tips for Best Results

Prepare voices first. Clone and test voices in the Lab before starting a project. This keeps your workflow smooth and avoids switching contexts mid-project.
EPUB gives the best structure. If you have a choice of source format, use EPUB. It preserves chapter boundaries automatically, saving you manual section setup.
Use the Lexicon early. Adding pronunciation entries before generation is far more efficient than fixing them after hundreds of segments are already generated.
Listen in passes. After bulk generation, do a full listen-through. Flag segments that need tweaking and regenerate only those.
Adjust gaps for genre. Fiction often benefits from longer pauses between paragraphs (400-500 ms) for dramatic effect. Non-fiction can use shorter gaps for a brisker pace.

Reference Docs

Projects Overview - Full feature overview
Segments - Segment editing, splitting, merging, and per-segment controls
Voice Assignment - Multi-voice setup and inline voice spans
Lexicon - Custom pronunciation entries
Importing & Exporting - Supported formats and export settings
Project Settings - Language, generation parameters, and output configuration
