Tutorial · June 7, 2026 · 7 min read

DramaBox Prompting Guide: How to Write Prompts for Expressive AI Speech


DramaBox is a text-to-speech model that generates highly expressive speech. Laughs, gasps, whispers, fury, tenderness: it can do all of it, but only if you prompt it correctly. A good prompt is the difference between flat, robotic output and a performance that sounds like real voice acting. This guide covers the patterns you need to write prompts that work.

Hear It in Action

Here is a prompt that moves through boredom, sarcasm, excitement, confusion, frustration, and finally despair, all in one take:

A man speaks in a low voice, "I've been on hold for like, thirty minutes now."
He sighs and says snarkily, "It's hilarious that they keep saying 'Your call is important to us.' Sure it is."
He speaks excitedly, "Oh snap! I'm next in queue!"
He talks calmly, "Hi, yes I'm calling about my phone bill from last month –– Hello? Hello? HELLO?"
He says evenly, sounding confused, "Wait, did the call just drop?"
His voice rises, "Are you kidding me? After forty-seven minutes?!"
He sighs with despair, "There goes my Friday evening..."

Notice how each segment hands off to the next with a pronoun and a fresh verb. That is the core pattern, and the rest of this guide breaks down exactly how to build prompts like this.

The Prompt Pattern

Every DramaBox prompt follows the same basic structure:

A <speaker> <verb>, "<dialogue>" <pronoun> <verb>, "<dialogue>"

For example:

A man speaks calmly, "I told you this would happen." He sighs heavily, "But nobody ever listens to me."

There are three rules to remember:

  1. Quoted text is spoken literally. Everything inside the quotes comes out of the speaker's mouth, including sounds like "Hahaha" or "Mmmm-mmm."
  2. Unquoted text is stage direction. It shapes how the line is delivered, but the model does not read it aloud.
  3. End at the last closing quote. Any description after the final quote mark gets ignored or read aloud by mistake.
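
The three rules above can be checked mechanically before a prompt is sent to the model. The following is a minimal sketch in Python, not part of DramaBox: the names SEGMENT and split_prompt are hypothetical, and it assumes prompts use straight double quotes for dialogue, as the examples in this guide do. It separates stage direction from spoken text and reports anything left after the last closing quote.

```python
import re

# One segment: unquoted stage direction followed by quoted dialogue.
# Assumption: dialogue is wrapped in straight double quotes.
SEGMENT = re.compile(r'([^"]*)"([^"]*)"')


def split_prompt(prompt):
    """Split a prompt into (direction, dialogue) pairs.

    Returns (segments, trailing), where trailing is any text found
    after the last closing quote (rule 3 says it should be empty).
    """
    segments = []
    end = 0
    for match in SEGMENT.finditer(prompt):
        # Drop surrounding whitespace and the comma before the quote.
        direction = match.group(1).strip().rstrip(",").strip()
        segments.append((direction, match.group(2)))
        end = match.end()
    trailing = prompt[end:].strip()
    return segments, trailing


prompt = 'A woman says, "I can\'t believe it. This changes everything." She gasps in shock.'
segments, trailing = split_prompt(prompt)
for direction, dialogue in segments:
    print("direction:", direction, "| spoken:", dialogue)
if trailing:
    print("warning, text after the last quote:", trailing)
```

A non-empty trailing value is the signal to move that description in front of a new quoted line.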

Multi-Segment Prompts

Single-segment prompts work, but multi-segment prompts are worth reaching for because they mirror how people actually talk: we modulate, shift tone, and move between emotions within a single thought. Setting up a calm baseline and then shifting into the real emotion recreates that natural movement. The contrast makes the emotional delivery much more convincing.

A man speaks evenly, "I gave you one job."
His voice rises with fury, "AND YOU MESSED IT UP!"

This fires more reliably than trying to pack all the emotion into a single line. Use the speaker's gendered pronoun (He / She) when writing continuation segments.
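
If you generate prompts from code, a small builder can enforce the continuation rule. This is a sketch under stated assumptions: build_prompt and CONTINUATION_STARTS are hypothetical names, and the accepted openers include "His" (used in the example above) and "Her" (assumed by symmetry, not shown in this guide).

```python
# Words a continuation segment may start with. "He" and "She" come from
# the guide; "His" appears in its examples; "Her" is an assumption.
CONTINUATION_STARTS = {"He", "She", "His", "Her"}


def build_prompt(segments):
    """Join (direction, dialogue) pairs into a multi-segment prompt.

    The first direction introduces the speaker ("A man speaks evenly");
    every later direction must open with a pronoun form.
    """
    parts = []
    for index, (direction, dialogue) in enumerate(segments):
        if not direction.strip():
            raise ValueError("each segment needs a stage direction")
        if '"' in dialogue:
            raise ValueError("dialogue must not contain double quotes")
        first_word = direction.split()[0]
        if index > 0 and first_word not in CONTINUATION_STARTS:
            raise ValueError(
                f"segment {index + 1} should start with a pronoun, got {first_word!r}"
            )
        parts.append(f'{direction}, "{dialogue}"')
    # One segment per line, so the result always ends on a closing quote.
    return "\n".join(parts)


print(build_prompt([
    ("A man speaks evenly", "I gave you one job."),
    ("His voice rises with fury", "AND YOU MESSED IT UP!"),
]))
```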

Describing Your Speaker

The speaker should be a generic noun phrase with an optional one-word adjective. Keep it simple.

Works                              | Does Not Work
-----------------------------------|------------------------
A man                              | A radio host
A woman                            | A drill sergeant
A young woman                      | A detective
An elderly man                     | A spy
A weary man                        | A teacher
A woman with a warm, smoky voice   | A late-night radio host

Why roles don't work: A role like "detective" doesn't tell the model the actual identity of the voice you want, so it has nothing concrete to anchor on. Instead of giving you the delivery you imagined, it tends to fill in the gap and hallucinate something unexpected: the wrong age, the wrong tone, sometimes even reading the word "detective" aloud. Stick to generic nouns like man, woman, or child, and use a single adjective to convey character.
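
A speaker description can be sanity-checked before use. The sketch below is an illustration only: check_speaker, GENERIC_NOUNS, and ARTICLES are hypothetical names, the noun list is limited to the three nouns this guide mentions, and a trailing "with ..." clause is ignored because the table above lists one such form as working.

```python
GENERIC_NOUNS = {"man", "woman", "child"}  # nouns named in this guide
ARTICLES = {"a", "an"}


def check_speaker(speaker):
    """Return a list of problems; an empty list means the speaker fits."""
    problems = []
    # Ignore a trailing "with ..." clause such as "with a warm, smoky voice".
    head = speaker.split(" with ", 1)[0]
    words = head.replace(",", " ").lower().split()
    if words and words[0] in ARTICLES:
        words = words[1:]
    else:
        problems.append("speaker should start with 'A' or 'An'")
    if not words or words[-1] not in GENERIC_NOUNS:
        problems.append("speaker should end in a generic noun (man, woman, child)")
    if len(words) > 2:
        problems.append("use at most one adjective before the noun")
    return problems


for candidate in ["A weary man", "A detective", "A tired, anxious, old man"]:
    print(candidate, "->", check_speaker(candidate) or "ok")
```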

Adding Vocal Effects (Paratags)

DramaBox can produce vocal effects like laughs, sighs, and gasps. The key is that the phonetic content inside the quotes triggers the sound. Writing "laughs" in the stage direction alone is weak. You need to pair the direction with matching sounds in the dialogue.

Effect        | Pattern
--------------|--------------------------------------------------------------
Laugh         | A man bursts into uncontrollable laughter, "Hahaha! <line>"
Chuckle       | A man chuckles darkly, "<line>"
Giggle        | A woman giggles, "Hehehe, <line>" (reliable on female voices)
Sigh          | A woman sighs heavily, "<line>"
Gasp          | A man says, "<setup>" He gasps with shock, "<reaction>"
Hum           | A woman hums quietly, "Mmmm-mmm, <line>"
Throat clear  | A man clears his throat, "<line>"
Cough         | A man coughs once, "<line>"
Yawn          | A woman yawns deeply, "Ugh, <line>"
Wheeze        | A man wheezes with laughter, "<line>"
Exhale        | A woman blows out a long exhale, "<line>"
Inhale        | A man sucks in a startled inhale, "<line>"
Shaky breath  | A weary man takes a shaky breath, "<line>" (speaker adjective required)
Tsk           | A woman makes a tsk-tsk sound, "<line>" (doubled phonetic required)
Sniffle       | A man lets out a wet sniffle, "<line>" (the word "wet" is required)
Cheer         | A woman cheers loudly, "Woooo! <line>"

Notice the specific requirements on some effects. "Shaky breath" needs the speaker adjective (like "weary"). "Tsk" needs the doubled "tsk-tsk." "Sniffle" needs the word "wet." These patterns were validated by listening, so follow them exactly.
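
Because the wording of these patterns matters, it can help to store them as fixed templates instead of retyping them. A minimal sketch, assuming nothing about DramaBox beyond the table above: PARATAGS and render_effect are hypothetical names, and only a subset of rows is included; the remaining rows follow the same shape.

```python
# Templates copied from the effects table. Placeholders in braces are
# filled in by render_effect; everything else is left exactly as validated.
PARATAGS = {
    "laugh": 'A man bursts into uncontrollable laughter, "Hahaha! {line}"',
    "giggle": 'A woman giggles, "Hehehe, {line}"',
    "gasp": 'A man says, "{setup}" He gasps with shock, "{reaction}"',
    "hum": 'A woman hums quietly, "Mmmm-mmm, {line}"',
    "shaky_breath": 'A weary man takes a shaky breath, "{line}"',
    "tsk": 'A woman makes a tsk-tsk sound, "{line}"',
    "sniffle": 'A man lets out a wet sniffle, "{line}"',
    "cheer": 'A woman cheers loudly, "Woooo! {line}"',
}


def render_effect(effect, **fields):
    """Fill a paratag template, e.g. render_effect("laugh", line="...")."""
    return PARATAGS[effect].format(**fields)


print(render_effect("laugh", line="That's hilarious."))
print(render_effect("gasp", setup="I can't believe it.", reaction="This changes everything."))
```

Keeping the templates as data means the required words ("wet", "tsk-tsk", "weary") cannot be dropped by accident when a line is edited.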

Controlling Emotion and Style

Each emotion works best as a two-segment prompt: a calm or neutral setup followed by the emotional payoff. The contrast is what makes the emotion land.

Tone          | Pattern
--------------|--------------------------------------------------------------
Angry / loud  | A man speaks evenly, "<setup>" His voice rises with fury, "<LINE IN CAPS>"
Tender        | A woman speaks tenderly, "<setup>" She hums quietly, "Mmmm-mmm, <follow-up>"
Menacing      | A man speaks with cold menace, "<setup>" He chuckles darkly, "<follow-up>"
Sad           | A grieving woman weeps softly, "<setup>" She sighs with despair, "<follow-up>"
Joyful        | A man bursts into uncontrollable laughter, "Hahaha! <line>"
Fearful       | A terrified woman speaks shakily, "<setup>" She begins to cry, "<follow-up>"
Nervous       | A young man clears his throat, "<setup>" He stammers nervously, "<follow-up>"
Awe           | A man speaks with quiet awe, "<setup>" He breathes out slowly, "<follow-up>"
Smug          | A man speaks with smug pride, "<setup>" He chuckles confidently, "<follow-up>"
Flirty        | A flirtatious woman purrs flirtatiously, "<setup>" She laughs softly, "<follow-up>"

A few things to note:

  • For angry delivery, writing the emotional line in all caps helps push the intensity.
  • Speaker adjectives like "grieving," "terrified," and "flirtatious" reinforce the emotion set in the stage direction.
  • Always use the correct gendered pronoun (He / She) in continuation segments.
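
The two-segment tone patterns, including the all-caps rule for anger, can be sketched as presets. This is an illustrative helper, not a DramaBox feature: TONES and tone_prompt are hypothetical names, and only four rows of the tone table are encoded.

```python
# Each preset: (setup direction, payoff direction, phonetic prefix, use caps).
# Directions and prefixes are taken from the tone table in this guide.
TONES = {
    "angry": ("A man speaks evenly", "His voice rises with fury", "", True),
    "tender": ("A woman speaks tenderly", "She hums quietly", "Mmmm-mmm, ", False),
    "menacing": ("A man speaks with cold menace", "He chuckles darkly", "", False),
    "sad": ("A grieving woman weeps softly", "She sighs with despair", "", False),
}


def tone_prompt(tone, setup, follow_up):
    """Build a calm-setup, emotional-payoff prompt for the given tone."""
    setup_dir, payoff_dir, prefix, caps = TONES[tone]
    if caps:
        # Angry delivery: the emotional line is written in all caps.
        follow_up = follow_up.upper()
    return f'{setup_dir}, "{setup}" {payoff_dir}, "{prefix}{follow_up}"'


print(tone_prompt("angry", "I gave you one job.", "And you messed it up!"))
print(tone_prompt("tender", "Come here.", "it's all right."))
```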

Common Mistakes

What People Write | Why It Fails | Write This Instead
A tired, anxious, old man says, "Hello." | Too many adjectives; model gets confused | A weary man says, "Hello."
A man laughs, "That's hilarious." | Stage direction alone is weak for effects | A man bursts into uncontrollable laughter, "Hahaha! That's hilarious."
A woman says, "I can't believe it. This changes everything." She gasps in shock. | Trailing description after the last quote | A woman says, "I can't believe it." She gasps with shock, "This changes everything."
A man whispers, "Be quiet." (single segment) | Single segment is less expressive | A man speaks softly, "Listen to me." He whispers urgently, "Be quiet."
A detective speaks firmly, "Freeze." | The model reads "detective" aloud | A man speaks firmly, "Freeze."

The pattern is consistent: keep speakers generic, put phonetic content in quotes, end on a quote, and use multi-segment structure.

Tips for Getting the Best Results

Use multi-segment prompts by default. A calm setup followed by an emotional shift produces dramatically better results than a single expressive line. Even if you only need one emotion, a neutral lead-in gives the model room to build into it.

Match your dialogue to the emotion. An angry prompt with a polite sentence will sound off. If the delivery should be intense, write intense dialogue. If the stage direction says "fury," the quoted text should match that energy.

Put sounds inside the quotes. "Hahaha," "Mmmm-mmm," "Woooo," and "Hehehe" are not decoration. They are the primary trigger for vocal effects. The stage direction verb supports them, but the phonetic content inside quotes is what actually fires the effect.

Follow the specific requirements. Some paratags have non-obvious requirements (like "wet sniffle" or "tsk-tsk" with doubled phonetics). These are not suggestions. They were validated by testing, and skipping them produces weaker or inconsistent results.

Keep speaker descriptions minimal. One noun, one optional adjective. That's it. The more words you add to the speaker, the more likely the model is to read them instead of using them as direction.

Try DramaBox for free

Also available on Windows and macOS. One-time purchase, unlimited generations.


Frequently Asked Questions

What is a multi-segment prompt?

A multi-segment prompt has two or more speaker-verb-dialogue blocks in sequence. The first segment usually sets a calm or neutral tone, and the following segments shift to a different emotion or delivery. This contrast is what makes DramaBox produce its most expressive output.

Why don't role names like "detective" work as speakers?

DramaBox treats the speaker field as a simple identifier, not a character description. When you write "A detective," it may speak the word "detective" as part of the output. Stick to generic nouns like "man," "woman," or "child" and use a single adjective for characterization.

Can I use DramaBox in Voice Creator Pro?

Yes. Voice Creator Pro includes DramaBox as one of its available TTS models. You can write your prompts using the patterns in this guide and generate speech directly in the app.

How do I get angry or shouted delivery?

Use all caps for the dialogue in the emotional segment, and pair it with an intense verb phrase like "rises with fury" or "shouts." The two-segment structure (calm then intense) makes the contrast more pronounced than a single loud line.
