What is voice design in OmniVoice?

Voice design is creating a brand new voice from a set of attributes instead of cloning an existing one from a recording. With OmniVoice you choose gender, age, pitch, style, and an English accent, and the model generates a matching voice from scratch. No reference audio is required.

How is OmniVoice voice design different from Qwen3-TTS?

OmniVoice uses structured attributes: you pick from fixed options such as Female, Middle-aged, or a British accent. Qwen3-TTS takes a free-form text description of the voice you want. OmniVoice is more predictable and is the better choice when you specifically need accents, since it offers ten English accents as direct options.

What accents does OmniVoice support?

OmniVoice voice design offers ten English accents: American, British, Australian, Chinese, Canadian, Indian, Korean, Portuguese, Russian, and Japanese, plus an Auto option that lets the model decide. This is the standout feature of OmniVoice voice design.

What tags does OmniVoice support?

OmniVoice supports thirteen paralinguistic tags that you type inline in your text: `[laughter]`, `[sigh]`, `[confirmation-en]`, `[question-en]`, `[question-ah]`, `[question-oh]`, `[question-ei]`, `[question-yi]`, `[surprise-ah]`, `[surprise-oh]`, `[surprise-wa]`, `[surprise-yo]`, and `[dissatisfaction-hnn]`. The suffix after the hyphen is the sound the model voices, not a language code. In Voice Creator Pro they are available from the smiley icon in the text input.

Does OmniVoice have emotion tags or emotion control?

No. OmniVoice has no emotion control, no emotion slider, and no emotion tags. Its tags produce non-verbal sounds like laughter and sighs, and the Style attribute offers only Auto and Whisper. Emotion in OmniVoice comes from the reference audio you clone: the feeling in the source clip carries into the generated speech. If you want to select an emotion directly, use Qwen3-TTS, which has thirteen emotions at five intensity levels.

How do I make an OmniVoice voice sound emotional?

Clone a reference clip that already has the emotion you want, because OmniVoice carries the feeling of the source audio into its output. A shaky, upset reference produces shaky, upset speech. For explicit control instead, switch to Qwen3-TTS in Voice Creator Pro and pick the emotion and intensity directly, as covered in the voice prompting guide.

Do I need reference audio to design a voice?

No. Voice design generates a voice from attributes alone, so you do not need any recording. If you want to copy a specific real voice instead, that is voice cloning, which uses a short 3 to 10 second reference clip.

Can I use OmniVoice voice design in Voice Creator Pro?

Yes. Voice Creator Pro includes OmniVoice voice design with all of its attributes (gender, age, pitch, style, and accent) in a simple interface on Windows and Mac, with VCP Cloud available in the browser.

OmniVoice Voice Design Guide: Accents, Attributes, and Tags

Voice cloning copies a voice that already exists. Voice design creates one that does not. Instead of feeding the model a recording, you describe the voice you want and it builds a match from scratch. That is useful whenever you need a specific character voice but have no reference audio to clone from: narrators, game characters, brand voices, and localized content.

OmniVoice takes a structured approach to voice design. Rather than a free-form text box, it gives you a fixed set of attributes to dial in. This guide covers every attribute, how to combine them, the feature that makes OmniVoice stand out (accents), and the full list of its paralinguistic tags, along with a straight answer on where emotion in OmniVoice actually comes from.

Hear the Accents

Accents are what OmniVoice does best, so they are the quickest way to hear what voice design gives you. Each voice below was designed from scratch, with no reference audio, by setting OmniVoice's attributes. Every one speaks English; the accent is the attribute doing the work. Press play to compare them.

Australian: English spoken with an Australian accent, set straight from the accent menu.

American: A clean, standard American-accented read.

Chinese: English delivered with a Chinese accent, one of several non-native English accents OmniVoice offers.

British, Whispered: A British accent paired with the Whisper style, soft and breathy, showing how two attributes combine.

Same language, four different speakers, and not one is a real or cloned voice. That is the accent attribute alone, and the last sample layers Whisper on top. The rest of this guide covers all five attributes and how to combine them.

OmniVoice vs Qwen3-TTS: Structured vs Free-Form

Both models can design a voice from scratch, but they take opposite approaches:

Aspect	OmniVoice	Qwen3-TTS
Input style	Structured attributes (pick from fixed options)	Free-form text description
Accents	Ten English accents as direct options	Described in free text, less reliable
Best when	You need accents or repeatable results	You want fine, descriptive nuance

The short version: if you want a specific accent or a result you can reproduce exactly, OmniVoice's structured attributes are the better tool. If you want to describe something nuanced in your own words ("a gravelly, world-weary detective"), Qwen3-TTS free-form design gives you more room. Both are available in Voice Creator Pro, so you can switch based on the job.

A simple way to choose: if you are designing voices for English or Chinese speech and you need a specific accent, OmniVoice excels. If you need something more dramatic and characterful, like a game character, a villain, or an over-the-top narrator, Qwen3-TTS gives you the descriptive range to get there. Our Qwen3-TTS prompting guide covers how to write those descriptions with examples.

The Attributes

OmniVoice voice design is built from five attributes. Every one has an Auto option, which hands that decision to the model. Set the attributes you care about and leave the rest on Auto.

Attribute	Options
Gender	Auto, Male, Female
Age	Auto, Child, Teenager, Young adult, Middle-aged, Elderly
Pitch	Auto, Very low, Low, Moderate, High, Very high
Style	Auto, Whisper
Accent	Auto, plus ten English accents (listed below)

A few of these reward a closer look.

Pitch is independent of gender and age. You can push a young adult voice lower or lift an elderly voice higher to fine-tune the character.

Style is currently focused. Use Auto for normal delivery, or Whisper for a soft, breathy read. Whisper is well suited to ASMR, intimate narration, suspense, and meditation content.

English Accents

This is the standout. OmniVoice offers ten English accents, plus Auto: American, British, Australian, Canadian, Indian, Chinese, Korean, Japanese, Portuguese, and Russian.

These generate English speech delivered with the chosen accent, so you can voice an Indian-accented narrator, a British storyteller, or a Russian-accented character without finding and cloning a real speaker. You can hear the Australian, American, Chinese, and British accents in the samples at the top of this guide.
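For readers who keep voice settings in a script or spreadsheet, the five attributes and their options can be written down as a small data structure. This is only a sketch for note-keeping: `ALLOWED` and `VoiceDesign` are hypothetical names, not part of any OmniVoice or Voice Creator Pro API, and the option lists are copied from the tables in this guide.

```python
from dataclasses import dataclass, fields

# Allowed values copied from the attribute table and accent list in this guide.
# These names are illustrative only; they are not an OmniVoice API.
ALLOWED = {
    "gender": ["Auto", "Male", "Female"],
    "age": ["Auto", "Child", "Teenager", "Young adult", "Middle-aged", "Elderly"],
    "pitch": ["Auto", "Very low", "Low", "Moderate", "High", "Very high"],
    "style": ["Auto", "Whisper"],
    "accent": ["Auto", "American", "British", "Australian", "Canadian", "Indian",
               "Chinese", "Korean", "Japanese", "Portuguese", "Russian"],
}


@dataclass(frozen=True)
class VoiceDesign:
    """Hypothetical record of one designed voice; every attribute defaults to Auto."""
    gender: str = "Auto"
    age: str = "Auto"
    pitch: str = "Auto"
    style: str = "Auto"
    accent: str = "Auto"

    def __post_init__(self):
        # Reject any value that is not one of the fixed options.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in ALLOWED[f.name]:
                raise ValueError(f"{f.name}={value!r} is not one of {ALLOWED[f.name]}")


# Set only the attributes you care about; the rest stay on Auto.
narrator = VoiceDesign(gender="Female", age="Middle-aged", accent="British")
print(narrator)
```

Because every field defaults to Auto, the record mirrors the advice to set only the attributes that matter, and a misspelled option fails early instead of being silently accepted.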

Designing a Voice: Recipes

Attributes combine, so a handful of choices defines a distinct voice. Set the ones that matter and leave the rest on Auto. A few starting points:

Voice you want	Gender	Age	Pitch	Style	Accent
Warm British audiobook narrator	Female	Middle-aged	Moderate	Auto	British
Energetic American podcaster	Male	Young adult	Moderate	Auto	American
Gentle ASMR reader	Female	Young adult	Low	Whisper	Auto
Elderly storyteller	Male	Elderly	Low	Auto	Auto
Young Indian-accented explainer voice	Female	Young adult	Moderate	Auto	Indian
Suspense narrator	Male	Middle-aged	Very low	Whisper	American

Treat these as starting points. Change one attribute at a time and regenerate, so you can hear exactly what each control does to the voice.
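The "change one attribute at a time" approach can be planned out before you generate anything. The sketch below takes one recipe and lists every variant that differs in a single attribute, so each generation isolates one control. `vary_one` and the dictionary layout are assumptions made for illustration; the actual settings are still chosen in the Voice Creator Pro interface.

```python
# Hypothetical helper for planning test generations; not an OmniVoice API.
PITCH_OPTIONS = ["Auto", "Very low", "Low", "Moderate", "High", "Very high"]


def vary_one(base, attribute, options):
    """Return copies of `base` that differ from it in `attribute` only."""
    return [{**base, attribute: option}
            for option in options
            if option != base[attribute]]


# The "Warm British audiobook narrator" row from the recipe table.
base = {"gender": "Female", "age": "Middle-aged", "pitch": "Moderate",
        "style": "Auto", "accent": "British"}

for variant in vary_one(base, "pitch", PITCH_OPTIONS):
    # List which keys differ from the base recipe (always just the one varied).
    changed = [key for key in base if base[key] != variant[key]]
    print(variant["pitch"], "-> changed:", changed)
```

Each variant keeps the base recipe intact apart from the one attribute under test, which makes it easy to attribute any audible difference to that control.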

OmniVoice Paralinguistic Tags: The Full List

The five attributes decide who the voice is. Paralinguistic tags decide what non-verbal sounds it makes while it talks: the laugh, the sigh, the questioning "hm?" before a sentence turns. You type them inline in your text, wrapped in square brackets, and OmniVoice performs the sound at that exact point.

In Voice Creator Pro they live behind the smiley icon in the text input, so you can insert one at the cursor instead of typing it from memory. Note that expression tags are supported by OmniVoice and Chatterbox only, so they will not do anything on the other models.

OmniVoice supports thirteen tags:

Tag	What it does
`[laughter]`	A laugh. The most useful of the set, and the most natural sounding.
`[sigh]`	An audible exhale. Fatigue, relief, or resignation, depending on the line around it.
`[confirmation-en]`	A short affirmative "mm" sound. Agreement or acknowledgement mid-conversation.
`[question-en]`	A questioning "hm?" Puzzlement or a request to repeat.
`[question-ah]`	A questioning "ah?" Slightly more open and surprised than `[question-en]`.
`[question-oh]`	A questioning "oh?" Reads as dawning realization or mild doubt.
`[question-ei]`	A questioning "ei?" A sharper, more startled query.
`[question-yi]`	A questioning "yi?" Curiosity at something unexpected.
`[surprise-ah]`	A surprised "ah!" The general-purpose surprise sound.
`[surprise-oh]`	A surprised "oh!" Closer to recognition than shock.
`[surprise-wa]`	A surprised "wa!" Bigger and more delighted, good for awe or excitement.
`[surprise-yo]`	A surprised "yo!" A short, punchy exclamation.
`[dissatisfaction-hnn]`	A displeased "hnn". A grumble, for annoyance or reluctance.

Read the suffix as the sound, not as a language. The part after the hyphen (en, ah, oh, ei, yi, wa, yo, hnn) is the sound the model actually voices, which is why there are five question tags rather than one. The suffixes are not language codes, so `[question-en]` is not the English variant of anything. Pick by the sound you want to hear.
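To make the naming pattern concrete, here is a small sketch that splits each tag into the kind of reaction and the sound it voices, then groups the thirteen tags by kind. The `split_tag` helper is hypothetical and exists only to illustrate how the names are built; the tag list itself is the one from the table above.

```python
from collections import defaultdict

# The thirteen tags listed in this guide.
TAGS = [
    "[laughter]", "[sigh]", "[confirmation-en]",
    "[question-en]", "[question-ah]", "[question-oh]",
    "[question-ei]", "[question-yi]",
    "[surprise-ah]", "[surprise-oh]", "[surprise-wa]", "[surprise-yo]",
    "[dissatisfaction-hnn]",
]


def split_tag(tag):
    """Split '[question-ah]' into ('question', 'ah'); tags with no suffix get None."""
    name = tag.strip("[]")
    category, _, sound = name.partition("-")
    return category, (sound or None)


# Group the sounds under the kind of reaction they belong to.
by_category = defaultdict(list)
for tag in TAGS:
    category, sound = split_tag(tag)
    by_category[category].append(sound)

for category, sounds in by_category.items():
    print(category, sounds)
```

Grouped this way, the question and surprise families show up as sets of alternative sounds for the same kind of reaction, which is the reading the suffix is meant to support.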

Drop them straight into the line, either before the words they colour or after them:

[laughter] You really got me there.
So I was walking down the street [laughter] and you won't believe what happened next [surprise-oh]
[question-en] Are you sure that's what she said?

A few things worth knowing before you scatter them through a script:

One per sentence, at most. Tags read as punctuation for the voice. Stacked together they stop sounding spontaneous and start sounding like a soundboard.
Position matters. A tag at the start colours the delivery that follows. A tag at the end reads as a reaction to what was just said.
Generate and listen. The five question tags are close cousins, and which one fits is much easier to hear than to predict. Swap one for another and regenerate.
They are sounds, not directions. A tag makes the voice produce a specific noise. It does not instruct the model to "sound sad" across a whole paragraph. For that, see the section below.
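If you prepare scripts outside the app, a quick check can catch mistyped tags and sentences that stack more than one tag before you spend a generation on them. The following is a minimal sketch under stated assumptions: `lint_script` is a hypothetical helper, the sentence splitting is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace), and nothing here calls OmniVoice itself.

```python
import re

# The thirteen tag names from this guide, without brackets.
KNOWN_TAGS = {
    "laughter", "sigh", "confirmation-en",
    "question-en", "question-ah", "question-oh", "question-ei", "question-yi",
    "surprise-ah", "surprise-oh", "surprise-wa", "surprise-yo",
    "dissatisfaction-hnn",
}

# Anything wrapped in square brackets is treated as a tag.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def lint_script(text):
    """Return warnings for unknown tags and for sentences with more than one tag."""
    warnings = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tags = TAG_PATTERN.findall(sentence)
        for tag in tags:
            if tag not in KNOWN_TAGS:
                warnings.append(f"unknown tag [{tag}]")
        if len(tags) > 1:
            warnings.append(f"{len(tags)} tags in one sentence: {sentence!r}")
    return warnings


script = "[laughter] You really got me there. [sigh] Fine, [question-en] you win. [giggle] Okay."
for warning in lint_script(script):
    print(warning)
```

A check like this only covers spelling and density. Which tag actually fits a line is still something to judge by generating and listening.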

Where Emotion Actually Comes From in OmniVoice

This trips people up, so it is worth stating plainly: OmniVoice has no emotion control. There is no emotion menu, no emotion slider, and no emotion tag. The tags above produce non-verbal sounds, and the Style attribute offers only Auto and Whisper.

What OmniVoice does instead is carry the emotion of the reference audio you cloned from. The feeling in the source clip comes through in the output: clone a shaky, upset voice and it stays shaky; clone a bright, energetic read and the generated speech keeps that energy. So with OmniVoice you choose the emotion by choosing the reference clip, not by setting a control. For a designed voice with no reference audio, the delivery follows the attributes and the text itself.

If you want to select an emotion directly, that is a different model. Qwen3-TTS has emotion control, with thirteen emotions at five intensity levels each, and it takes free-form text descriptions of how a line should be delivered. Both models are in Voice Creator Pro, so pick per job: OmniVoice for accents, speed, and carrying a reference's feeling, Qwen3-TTS when you need to dial an emotion in explicitly. The voice prompting guide covers Qwen3-TTS emotion control and how to write those descriptions, and the guide on how to add emotion to text to speech walks through applying emotions across a longer piece.

Tips for Better Voice Design

Start with Auto, then narrow. Generate once with most attributes on Auto to hear the model's default, then lock in the attributes you want to change. This is faster than setting all five blind.

Pitch is your fine-tuning dial. Once gender, age, and accent give you the right character, use pitch to nudge it warmer (lower) or brighter (higher) without changing the rest.

Lean on accents for character variety. If you need several distinct voices for a cast, accent is the fastest way to make them feel like different people, even with similar gender and age settings.

Change one thing at a time. Voice design is iterative. Adjusting a single attribute per generation makes it clear what each control contributes, so you can dial in the voice you hear in your head.

Voice design is most stable in English and Chinese. OmniVoice's voice design was trained on English and Chinese data, so designed voices are most reliable in those two languages. You can still synthesize other languages, but expect cloning to be the steadier option outside English and Chinese.

Voice Design vs Cloning vs Prompting

These three get conflated constantly, so to be clear:

Voice design (this guide): build a new voice from attributes. No recording needed.
Voice cloning: copy a specific real voice from a short reference clip (3 to 10 seconds). Use this when you want a particular person's voice.
Voice prompting: control the delivery, emotion, and performance of a voice through text instructions (for example, our DramaBox prompting guide). This shapes how a voice acts, not who it is.

Design Your Voice in Voice Creator Pro

Voice Creator Pro includes OmniVoice voice design with every attribute covered above (gender, age, pitch, style, and all ten accents) in a point-and-click interface, so there is no setup or prompt engineering required.

Designed voices are not just for one-off clips. In Voice Creator Pro, both OmniVoice designed voices and cloned voices work inside projects, so you can use them for full audiobooks, YouTube voiceovers, and other long-form narration. Projects also support multiple speakers, which means you can build a designed cast, give each character its own attributes, and run an entire dialogue or chapter in one place.

The desktop app runs OmniVoice locally and offline, with unlimited generations and no subscription, on Windows and Mac. Or try it for free in your browser, no GPU or install required.