Tutorial · June 15, 2026 · 8 min read

AI Voice Design Guide: 3 Ways to Create a Voice From Scratch (2026)

Most AI voice tools assume you already have a voice to copy. But what if you need a voice that does not exist yet? A narrator with a specific accent, a game character, a brand voice that is not based on any real person? That is what voice design is for: you describe the voice you want, and the model builds it from scratch. No recording required.

Three of the best open-source models, all available in Voice Creator Pro, can do this, and each takes a completely different approach to the input. This guide breaks down all three so you can pick the right one for the voice in your head.

First: Voice Design Is Not Voice Cloning

These get mixed up constantly, so to be clear:

  • Voice cloning copies a specific real voice from a short reference clip (3 to 10 seconds of audio). Use it when you want a particular person's voice.
  • Voice design builds a new voice from a description, with no recording at all. Use it when you want a voice that does not exist yet.

This guide is about design. If you want to copy a real voice instead, see our voice cloning comparison.

The Three Approaches at a Glance

|  | OmniVoice | Qwen3-TTS | DramaBox |
|---|---|---|---|
| How you design | Structured attributes (pick from fixed options) | Free-form text description | Speaker phrase inside a prompt pattern |
| Reference audio | Not used | Not used | Optional: clone a voice, then direct it |
| Control style | Predictable and repeatable | Descriptive and nuanced | Tied to performance and emotion |
| Accents | Ten English accents as direct options | Described in free text (less reliable) | General, via the speaker phrase |
| Best for | Accents and consistent results | Specific, nuanced characters | Expressive, acted character voices |

The rest of this guide covers each one in detail.

1. OmniVoice: Structured Attributes

OmniVoice gives you a fixed set of attributes to dial in, rather than a text box. You choose from preset options and the model assembles the voice. Every attribute also has an Auto setting that lets the model decide, so you only set what you care about.

  • Gender: Auto, male, female
  • Age: Auto, child, teenager, young adult, middle-aged, elderly
  • Pitch: Auto, very low through very high
  • Style: Auto, whisper
  • English accent: Auto, plus ten options (American, British, Australian, Chinese, Canadian, Indian, Korean, Portuguese, Russian, Japanese)

The standout is accents. OmniVoice is the most reliable of the three for accent work. If you need an Indian-accented narrator or a British storyteller, this is the model to reach for.

Use OmniVoice when you want a specific accent, or when you prefer a straightforward path to voice design with no prompt writing. The trade-off is less creative freedom than a free-text description gives you.
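To make the "fixed options plus Auto" idea concrete, here is a minimal Python sketch of how such an attribute set could be modelled and validated. It is an illustration only, not OmniVoice's or Voice Creator Pro's actual API: the names `ALLOWED`, `VoiceAttributes`, and `explicit` are hypothetical. Pitch is left out because this guide lists only its endpoints (very low through very high), not the steps in between.

```python
from dataclasses import dataclass, fields

# Options copied from the attribute list in this guide.
# Pitch is omitted: only its endpoints are named here.
ALLOWED = {
    "gender": {"auto", "male", "female"},
    "age": {"auto", "child", "teenager", "young adult", "middle-aged", "elderly"},
    "style": {"auto", "whisper"},
    "accent": {
        "auto", "american", "british", "australian", "chinese", "canadian",
        "indian", "korean", "portuguese", "russian", "japanese",
    },
}


@dataclass(frozen=True)
class VoiceAttributes:
    """Hypothetical container: every attribute defaults to 'auto'."""
    gender: str = "auto"
    age: str = "auto"
    style: str = "auto"
    accent: str = "auto"

    def __post_init__(self):
        # Reject anything that is not one of the fixed options.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in ALLOWED[f.name]:
                raise ValueError(f"{f.name}={value!r} is not a listed option")

    def explicit(self) -> dict:
        """Return only the attributes that were set to something other than 'auto'."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != "auto"
        }


narrator = VoiceAttributes(age="middle-aged", accent="indian")
print(narrator.explicit())
```

Because every value comes from a closed set, a typo such as `accent="indain"` fails immediately instead of silently producing a different voice, which is the practical meaning of "predictable and repeatable".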

For a full walkthrough of every attribute and a set of ready-made recipes, see the dedicated OmniVoice Voice Design Guide.

2. Qwen3-TTS: Free-Form Description

Qwen3-TTS takes the opposite approach. Instead of fixed options, you describe the voice you want in plain English and the model interprets it. That trades some predictability for a lot more nuance.

Descriptions can be as simple or as detailed as you like:

a middle-aged female professor with a slight British accent
a gravelly, world-weary man in his sixties with a slow, deliberate delivery
a cheerful young woman with a bright, high-energy voice

Because the input is free text, you can express character that does not fit a preset, like "world-weary" or "bright, high-energy." The trade-off is that the model has to interpret your words, so results vary more between generations, and accents are less reliable than OmniVoice's fixed options.
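Since results vary between generations, it can help to keep the parts of a description separate so you change one thing at a time while iterating. The Python sketch below shows one way to do that. The helper `describe_voice` is hypothetical and only builds a string; it does not call Qwen3-TTS or any real API.

```python
def describe_voice(character: str, *details: str) -> str:
    """Join a character phrase and optional delivery details into one description."""
    character = character.strip()
    extras = [d.strip() for d in details if d.strip()]
    if not extras:
        return character
    return f"{character} with " + " and ".join(extras)


# Rebuild two of the example descriptions from their parts.
print(describe_voice("a middle-aged female professor", "a slight British accent"))
print(describe_voice("a gravelly, world-weary man in his sixties",
                     "a slow, deliberate delivery"))
```

With the parts separated, you can swap only the accent or only the delivery between attempts and hear what each change does.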

Use Qwen3-TTS when you have a specific, nuanced character in mind and want to describe it in your own words rather than pick from a list.

For the seven controllable dimensions, example prompts, and an iterative workflow, see the Qwen3-TTS Voice Design Prompting Guide.

3. DramaBox: The Prompt Pattern

DramaBox designs the voice right inside the prompt you use to generate speech. There is no separate description field. Instead, you write the voice and its performance together using a simple repeating pattern:

A <speaker> <verb>, "<dialogue>" <pronoun> <verb>, "<dialogue>"

For example:

A man speaks calmly, "I told you this would happen." He sighs heavily, "But nobody ever listens to me."

Two rules drive the whole thing:

  • Quoted text is spoken literally. Everything inside the quotes comes out of the speaker's mouth, including sounds like "Hahaha" or "Mmmm-mmm."
  • Unquoted text is stage direction. The speaker phrase and the verbs that follow shape who is talking and how they deliver each line, but the model does not read them aloud.
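The pattern and the two rules above can be sketched in a few lines of Python. This is a hypothetical helper for assembling and inspecting prompt text, not part of DramaBox itself: `Segment`, `build_prompt`, and `split_prompt` are names invented for illustration, and the sketch assumes dialogue contains no double quotes.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    subject: str   # speaker phrase in the first segment, a pronoun afterwards
    action: str    # verb plus any adverb, e.g. "speaks calmly"
    dialogue: str  # spoken literally by the model


def build_prompt(segments: list[Segment]) -> str:
    """Chain segments into the repeating pattern: subject action, "dialogue"."""
    parts = []
    for s in segments:
        if '"' in s.dialogue:
            raise ValueError("dialogue must not contain double quotes")
        parts.append(f'{s.subject} {s.action}, "{s.dialogue}"')
    return " ".join(parts)


def split_prompt(prompt: str) -> tuple[list[str], list[str]]:
    """Separate quoted text (spoken) from unquoted text (stage direction)."""
    spoken = re.findall(r'"([^"]*)"', prompt)
    unquoted = re.split(r'"[^"]*"', prompt)
    directions = [d.strip(" ,") for d in unquoted if d.strip(" ,")]
    return spoken, directions


prompt = build_prompt([
    Segment("A man", "speaks calmly", "I told you this would happen."),
    Segment("He", "sighs heavily", "But nobody ever listens to me."),
])
spoken, directions = split_prompt(prompt)
print(prompt)
print(spoken)
print(directions)
```

Running `split_prompt` on a draft is a quick way to check that nothing you meant as direction has ended up inside the quotes, where it would be read aloud.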

So the same prompt both defines the voice (the speaker phrase) and directs the performance (the verbs and emotion), which is why design and direction happen in one place. Chaining segments with a pronoun and a fresh verb lets the voice shift tone mid-thought, which is where DramaBox shines.

You can also hand DramaBox an existing voice as reference audio. It clones that voice and then performs your prompt in it, so you keep a specific person's identity while still directing the emotion and delivery through the stage directions. That makes it a bridge between design and cloning: start from a voice you already have, then act it.

Use DramaBox when the voice and the performance are inseparable, like expressive character work, dialogue, and emotional delivery, where you are designing and directing in one go.

For the full prompt structure, vocal effects, and emotion patterns, see the DramaBox Prompting Guide.

How to Choose

| If you need... | Use |
|---|---|
| A specific accent, or more repeatable results | OmniVoice |
| A nuanced character described in your own words | Qwen3-TTS |
| An expressive voice you will also direct and emote | DramaBox |

You are not locked into one. A common workflow is to design a clean base voice with OmniVoice or Qwen3-TTS, then move to DramaBox when you need expressive, acted delivery. Voice design defines who the voice is; prompting defines how it performs.
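The selection guidance above is effectively a lookup, which the following Python sketch writes out. The need labels (`accent`, `repeatable`, `nuanced_description`, `acted_delivery`) and the function `suggest_models` are hypothetical names chosen for this example. It returns every matching model rather than a single winner, because you are not locked into one.

```python
# Mapping taken from the "How to Choose" guidance; the keys are made-up labels.
RECOMMENDATIONS = {
    "accent": "OmniVoice",
    "repeatable": "OmniVoice",
    "nuanced_description": "Qwen3-TTS",
    "acted_delivery": "DramaBox",
}


def suggest_models(needs: list[str]) -> list[str]:
    """Return the recommended models for the given needs, in order, without repeats."""
    unknown = [n for n in needs if n not in RECOMMENDATIONS]
    if unknown:
        raise ValueError(f"unknown needs: {unknown}")
    suggestions = []
    for need in needs:
        model = RECOMMENDATIONS[need]
        if model not in suggestions:
            suggestions.append(model)
    return suggestions


# A base voice with a fixed accent, later performed with acted delivery.
print(suggest_models(["accent", "acted_delivery"]))
```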

Design Voices in Voice Creator Pro

Voice Creator Pro includes OmniVoice, Qwen3-TTS, and DramaBox, so all three design approaches live in one interface with no setup or prompt engineering required to get started.

The desktop app runs every model locally and offline, with unlimited generations and no subscription, on Windows and Mac. Or try VCP Cloud in your browser on a generous free tier, with no GPU or install.

Either way you get the same models and the same quality. Pick an approach, describe the voice you want, and generate one that did not exist a moment ago.

Frequently Asked Questions

What is AI voice design?

Voice design is creating a brand new voice from a description instead of cloning an existing one from a recording. You specify what the voice should sound like (gender, age, accent, tone) and the model generates a matching voice from scratch. No reference audio is required.

What is the difference between voice design and voice cloning?

Voice cloning copies a specific real voice from a short reference clip (3 to 10 seconds). Voice design builds a new voice from a description, with no recording at all. Use cloning when you want a particular person's voice, and design when you want a voice that does not exist yet.

Which model is best for accents?

OmniVoice. It offers ten English accents as direct, selectable options (American, British, Australian, Chinese, Canadian, Indian, Korean, Portuguese, Russian, and Japanese), which makes accents far more reliable than describing them in free text.

Which model lets me describe a voice in my own words?

Qwen3-TTS. It takes a free-form text description, so you can write something like 'a gravelly, world-weary man in his sixties' and the model interprets it. OmniVoice uses fixed attribute options instead, and DramaBox uses a short speaker phrase inside its prompt.

Can I use all three models in one app?

Yes. Voice Creator Pro includes OmniVoice, Qwen3-TTS, and DramaBox, so all three voice design approaches are available in one app on Windows and Mac, with VCP Cloud in the browser.
