AI Voice Design Guide: 3 Ways to Create a Voice From Scratch (2026)
Most AI voice tools assume you already have a voice to copy. But what if you need a voice that does not exist yet? A narrator with a specific accent, a game character, a brand voice that is not based on any real person? That is what voice design is for: you describe the voice you want, and the model builds it from scratch. No recording required.
Three of the best open-source models, all available in Voice Creator Pro, can do this, and each takes a completely different approach to the input. This guide breaks down all three so you can pick the right one for the voice in your head.
First: Voice Design Is Not Voice Cloning
These get mixed up constantly, so to be clear:
- Voice cloning copies a specific real voice from a short reference clip (3 to 10 seconds of audio). Use it when you want a particular person's voice.
- Voice design builds a new voice from a description, with no recording at all. Use it when you want a voice that does not exist yet.
This guide is about design. If you want to copy a real voice instead, see our voice cloning comparison.
The Three Approaches at a Glance
| | OmniVoice | Qwen3-TTS | DramaBox |
|---|---|---|---|
| How you design | Structured attributes (pick from fixed options) | Free-form text description | Speaker phrase inside a prompt pattern |
| Reference audio | Not used | Not used | Optional: clone a voice, then direct it |
| Control style | Predictable and repeatable | Descriptive and nuanced | Tied to performance and emotion |
| Accents | Ten English accents as direct options | Described in free text (less reliable) | General, via the speaker phrase |
| Best for | Accents and consistent results | Specific, nuanced characters | Expressive, acted character voices |
The rest of this guide covers each one in detail.
1. OmniVoice: Structured Attributes
OmniVoice gives you a fixed set of attributes to dial in, rather than a text box. You choose from preset options and the model assembles the voice. Every attribute also has an Auto setting that lets the model decide, so you only set what you care about.
- Gender: Auto, male, female
- Age: Auto, child, teenager, young adult, middle-aged, elderly
- Pitch: Auto, very low through very high
- Style: Auto, whisper
- English accent: Auto, plus ten options (American, British, Australian, Chinese, Canadian, Indian, Korean, Portuguese, Russian, Japanese)
The standout is accents. OmniVoice is the most reliable of the three for accent work. If you need an Indian-accented narrator or a British storyteller, this is the model to reach for.
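One way to picture the attribute set above is as a small record in which every field defaults to Auto. The sketch below is illustrative only: `VoiceSpec`, `OPTIONS`, and the lowercase value strings are hypothetical names, not OmniVoice's actual API. Pitch is left unvalidated because this guide names only its endpoints (very low and very high), not the steps in between.

```python
from dataclasses import dataclass, asdict

AUTO = "auto"

# Allowed values, copied from the attribute list in this guide.
# Pitch is deliberately absent: only its endpoints are named here.
OPTIONS = {
    "gender": {AUTO, "male", "female"},
    "age": {AUTO, "child", "teenager", "young adult", "middle-aged", "elderly"},
    "style": {AUTO, "whisper"},
    "accent": {AUTO, "american", "british", "australian", "chinese", "canadian",
               "indian", "korean", "portuguese", "russian", "japanese"},
}


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical container for OmniVoice-style attributes (not a real API)."""
    gender: str = AUTO
    age: str = AUTO
    pitch: str = AUTO  # not validated, see note above
    style: str = AUTO
    accent: str = AUTO

    def __post_init__(self):
        # Reject any value that is not in the fixed option list.
        for name, allowed in OPTIONS.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")

    def explicit(self) -> dict:
        """Return only the attributes that were set to something other than Auto."""
        return {k: v for k, v in asdict(self).items() if v != AUTO}


narrator = VoiceSpec(age="elderly", accent="indian")
print(narrator.explicit())
```

Setting only `age` and `accent` leaves gender, pitch, and style on Auto, which mirrors the idea of setting only what you care about. Because every value comes from a fixed list, a typo such as `accent="martian"` fails immediately instead of producing an unexpected voice.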
Use OmniVoice when you want a specific accent, or when you prefer a straightforward path to voice design without writing prompts, at the cost of some creative freedom.
For a full walkthrough of every attribute and a set of ready-made recipes, see the dedicated OmniVoice Voice Design Guide.
2. Qwen3-TTS: Free-Form Description
Qwen3-TTS takes the opposite approach. Instead of fixed options, you describe the voice you want in plain English and the model interprets it. That trades some predictability for a lot more nuance.
Descriptions can be as simple or as detailed as you like:
- a middle-aged female professor with a slight British accent
- a gravelly, world-weary man in his sixties with a slow, deliberate delivery
- a cheerful young woman with a bright, high-energy voice
Because the input is free text, you can express character that does not fit a preset, like "world-weary" or "bright, high-energy." The trade-off is that the model has to interpret your words, so results vary more between generations, and accents are less reliable than OmniVoice's fixed options.
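Since results vary between generations, it can help to change one trait at a time while keeping the rest of the description fixed. The following is a minimal sketch under that assumption: `describe_voice` is a hypothetical helper, not part of Qwen3-TTS, and it only assembles a plain-English string that you would then paste into the description field.

```python
def describe_voice(who: str, *traits: str) -> str:
    """Join a subject and optional traits into one plain-English description."""
    who = who.strip()
    cleaned = [t.strip() for t in traits if t.strip()]
    if not cleaned:
        return who
    if len(cleaned) == 1:
        return f"{who} with {cleaned[0]}"
    # Several traits: comma-separate all but the last, then join with "and".
    return f"{who} with {', '.join(cleaned[:-1])} and {cleaned[-1]}"


base = "a middle-aged female professor"
# Vary only the accent trait between attempts; everything else stays fixed.
for accent in ("a slight British accent", "a slight Australian accent"):
    print(describe_voice(base, accent))
```

Keeping the subject in one variable makes it easier to tell which wording change caused a difference in the generated voice.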
Use Qwen3-TTS when you have a specific, nuanced character in mind and want to describe it in your own words rather than pick from a list.
For the seven controllable dimensions, example prompts, and an iterative workflow, see the Qwen3-TTS Voice Design Prompting Guide.
3. DramaBox: The Prompt Pattern
DramaBox designs the voice right inside the prompt you use to generate speech. There is no separate description field. Instead, you write the voice and its performance together using a simple repeating pattern:
A <speaker> <verb>, "<dialogue>" <pronoun> <verb>, "<dialogue>"
For example:
A man speaks calmly, "I told you this would happen." He sighs heavily, "But nobody ever listens to me."
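If you generate prompts programmatically, the pattern can be assembled from a speaker phrase, a pronoun, and a list of (direction, dialogue) pairs. This is a rough sketch, not DramaBox's own tooling: `build_prompt` and its parameters are hypothetical, and it assumes straight double quotes around dialogue.

```python
def build_prompt(speaker: str, pronoun: str, segments: list[tuple[str, str]]) -> str:
    """Assemble: <speaker> <verb>, "<dialogue>" <pronoun> <verb>, "<dialogue>" ..."""
    if not segments:
        raise ValueError("at least one (direction, dialogue) segment is required")
    parts = []
    for i, (direction, dialogue) in enumerate(segments):
        if '"' in dialogue:
            # A stray quote would end the spoken text early.
            raise ValueError("dialogue must not contain double quotes")
        # First segment names the speaker; later segments use the pronoun.
        subject = speaker if i == 0 else pronoun.capitalize()
        parts.append(f'{subject} {direction}, "{dialogue}"')
    return " ".join(parts)


prompt = build_prompt(
    "A man",
    "he",
    [
        ("speaks calmly", "I told you this would happen."),
        ("sighs heavily", "But nobody ever listens to me."),
    ],
)
print(prompt)
```

With these inputs the function reproduces the example prompt above. Adding a third pair extends the chain with another pronoun and verb.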
Two rules drive the whole thing:
- Quoted text is spoken literally. Everything inside the quotes comes out of the speaker's mouth, including sounds like "Hahaha" or "Mmmm-mmm."
- Unquoted text is stage direction. The speaker phrase and the verbs that follow shape who is talking and how they deliver each line, but the model does not read them aloud.
So the same prompt both defines the voice (the speaker phrase) and directs the performance (the verbs and emotion), which is why design and direction happen in one place. Chaining segments with a pronoun and a fresh verb lets the voice shift tone mid-thought, which is where DramaBox shines.
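To check a prompt before generating, you can separate what will be spoken from what is stage direction. The sketch below simply mirrors the two rules as stated in this guide; `split_prompt` is a hypothetical helper, not DramaBox's parser, and it assumes straight double quotes with no nested quotes.

```python
import re


def split_prompt(prompt: str) -> list[tuple[str, str]]:
    """Return ("direction" | "spoken", text) pairs in the order they appear."""
    parts = []
    # re.split with a capture group alternates: outside, quoted, outside, ...
    for i, chunk in enumerate(re.split(r'"([^"]*)"', prompt)):
        if i % 2:  # odd indexes are the quoted captures
            if chunk:
                parts.append(("spoken", chunk))
        else:
            # Drop the comma that introduces the quote.
            text = chunk.strip().rstrip(",").strip()
            if text:
                parts.append(("direction", text))
    return parts


example = ('A man speaks calmly, "I told you this would happen." '
           'He sighs heavily, "But nobody ever listens to me."')
for kind, text in split_prompt(example):
    print(f"{kind:9} | {text}")
```

Reading the `spoken` rows on their own is a quick way to confirm that no stage direction has slipped inside the quotes, where it would be read aloud.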
You can also hand DramaBox an existing voice as reference audio. It clones that voice and then performs your prompt in it, so you keep a specific person's identity while still directing the emotion and delivery through the stage directions. That makes it a bridge between design and cloning: start from a voice you already have, then act it.
Use DramaBox when the voice and the performance are inseparable, as in expressive character work, dialogue, and emotional delivery, where you design and direct in one go.
For the full prompt structure, vocal effects, and emotion patterns, see the DramaBox Prompting Guide.
How to Choose
| If you need... | Use |
|---|---|
| A specific accent, or more repeatable results | OmniVoice |
| A nuanced character described in your own words | Qwen3-TTS |
| An expressive voice whose delivery and emotion you will also direct | DramaBox |
You are not locked into one. A common workflow is to design a clean base voice with OmniVoice or Qwen3-TTS, then move to DramaBox when you need expressive, acted delivery. Voice design defines who the voice is; prompting defines how it performs.
Design Voices in Voice Creator Pro
Voice Creator Pro includes OmniVoice, Qwen3-TTS, and DramaBox, so all three design approaches live in one interface with no setup or prompt engineering required to get started.
The desktop app runs every model locally and offline on Windows and macOS, with unlimited generations for a one-time purchase and no subscription. Or try VCP Cloud in your browser on a generous free tier, with no GPU or installation required.
Either way you get the same models and the same quality. Pick an approach, describe the voice you want, and generate one that did not exist a moment ago.