Tutorial · March 25, 2026 · 4 min read

How to Add Pauses and Control Pacing in AI-Generated Speech


One of the most common problems with AI-generated speech is that it sounds rushed. Sentences blur together, there's no breathing room, and the result feels robotic. It's usually not because the voice itself is bad, but because the pacing is wrong.

Good pacing is what separates a flat reading from a compelling one. A well-placed pause can add weight to a phrase, give the listener time to absorb an idea, or build anticipation before a key moment.

The good news is that you can control pacing directly through how you format your input text. The simplest way is to insert two em-dashes (––) exactly where you want a pause.

Note: The techniques in this guide have been tested with Qwen 3 TTS, which is one of the models that power Voice Creator Pro. Results may vary with other TTS models.

Hear the Difference

Before diving into the technique, listen to these two clips. Both use the same voice and the same text — the only difference is the formatting of the input.

Without em-dashes — the text was entered as a plain paragraph with no pacing cues:

There is a land that never forgets. A land where time does not flow, but is stratified: stone upon stone. For millennia, the river has flowed like the lifeblood of a civilization that has defied eternity. Welcome to Egypt.

With em-dashes — the same text, but with –– pauses added:

There is a land –– that never forgets. A land –– where time does not flow, but is stratified: stone –– upon stone. For millennia, the river has flowed –– like the lifeblood of a civilization that has defied eternity. Welcome –– to Egypt.

Notice how the second version feels more intentional. The pauses give the listener time to absorb each phrase and create a sense of drama. That difference comes entirely from how the input text is formatted.

The Em-Dash Technique

To make the voice pause or slow down at a specific point, insert two em-dashes (––) where you want the break. The model treats this as a signal to briefly hold before continuing.

Here's a simple example:

The door opened –– and everything changed.

Compare that to the version without the pause:

The door opened and everything changed.

The first version gives "and everything changed" more impact. The pause creates a beat of suspense — the listener waits, then receives the payoff.

Tip: The pause created by –– is subtle — roughly the length of a natural breath. It won't create a full stop or an awkward silence.

Putting It All Together

Here's a full example with multiple em-dashes throughout. This is the exact input used to generate the em-dash audio sample above:

There is a land –– that never forgets.
A land –– where time does not flow, but is stratified: stone –– upon stone.
For millennia, the river has flowed –– like the lifeblood of a civilization that has defied eternity.
Welcome –– to Egypt.

Notice the pattern:

  1. Em-dashes appear before key phrases — "that never forgets", "where time does not flow", "upon stone", "like the lifeblood", "to Egypt". This gives each phrase a moment of anticipation.
  2. The final sentence is short and punchy — after the long, flowing sentences, "Welcome –– to Egypt" lands with weight.
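If you prepare narration scripts in bulk, the pattern above can be applied mechanically. The Python sketch below uses a hypothetical helper, `add_pauses`, which is not part of Voice Creator Pro or Qwen 3 TTS and does not call any TTS API. It assumes you have already chosen the key phrases by ear and simply inserts `––` before each one. Matching is a plain substring search on the first occurrence, so pick phrases that appear only once in the text (for example, "to Egypt" would also match inside "into Egypt").

```python
# Pause marker copied from the examples in this guide (two dash characters).
PAUSE = "––"


def add_pauses(text: str, key_phrases: list[str]) -> str:
    """Insert a pause marker before the first occurrence of each key phrase.

    Hypothetical helper for illustration: it only edits the input text.
    """
    for phrase in key_phrases:
        index = text.find(phrase)
        if index == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        before = text[:index].rstrip()
        # Leave phrases alone if they are already preceded by a pause.
        if before.endswith(PAUSE):
            continue
        # Rebuild the text with exactly one space on each side of the marker.
        prefix = f"{before} " if before else ""
        text = f"{prefix}{PAUSE} {text[index:]}"
    return text


plain = (
    "There is a land that never forgets. A land where time does not flow, "
    "but is stratified: stone upon stone. For millennia, the river has flowed "
    "like the lifeblood of a civilization that has defied eternity. Welcome to Egypt."
)

# The key phrases listed in point 1 above.
phrases = [
    "that never forgets",
    "where time does not flow",
    "upon stone",
    "like the lifeblood",
    "to Egypt",
]

print(add_pauses(plain, phrases))
```

Applied to the plain paragraph from the first clip, the printed result matches the em-dash version of the same text. Running it a second time changes nothing, because phrases that already follow a pause are skipped.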

Quick Tips for Better Pacing

  1. Read your text aloud first. Wherever you naturally pause or slow down, add two em-dashes.

  2. Don't overdo it. A pause on every other word loses its effect. Use pauses where they matter — before reveals, after important statements, or at emotional turns.

  3. Short sentences after long ones create punch. A flowing, descriptive sentence followed by a short declarative one is one of the most effective patterns for narration.

  4. Match pacing to the mood. A fast-paced action scene needs fewer pauses. A contemplative monologue benefits from more of them.

  5. Experiment and iterate. Generate, listen, adjust. Move an em-dash a few words earlier or later and hear how it changes the delivery.
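Tip 5 is easier to follow if you generate the candidate placements up front and then listen to each one in turn. A minimal sketch, assuming words are separated by whitespace and you want exactly one pause per variant; `pause_variants` is a hypothetical name, and sending each variant to your TTS tool is left to you, since no specific API is assumed here.

```python
# Pause marker copied from the examples in this guide (two dash characters).
PAUSE = "––"


def pause_variants(sentence: str) -> list[str]:
    """Return one copy of the sentence per word gap, with a pause in that gap."""
    words = sentence.split()
    variants = []
    # A sentence with n words has n - 1 gaps between them.
    for gap in range(1, len(words)):
        left = " ".join(words[:gap])
        right = " ".join(words[gap:])
        variants.append(f"{left} {PAUSE} {right}")
    return variants


for variant in pause_variants("The door opened and everything changed."):
    print(variant)
```

For the door example this prints five candidates, from "The –– door opened and everything changed." to "The door opened and everything –– changed." Most placements will sound wrong, which is the point: treat the list as a menu to audition, not a set of equally good options.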

Frequently Asked Questions

Does the em-dash technique work with all voice types?

Yes. Within Qwen 3 TTS (one of the models behind Voice Creator Pro), the em-dash pacing technique works with cloned voices, designed voices, and the built-in voices, and the effect is consistent across all of them. Other TTS models may respond differently to em-dashes.

How many em-dashes should I use per sentence?

One or two per sentence is usually enough. If you find yourself adding more than three, consider whether all of those pauses are really needed. Too many can make the delivery feel stilted.
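To check a longer script against this rule of thumb, you can count the markers in each sentence. The sketch below is hypothetical and not part of any TTS tool: it assumes a sentence ends with `.`, `!`, or `?` followed by whitespace (a simplification that will mis-split abbreviations), and it uses the threshold of three from the answer above as its default. The last sentence in the sample script is a deliberately overloaded example, not text from this guide.

```python
import re

# Pause marker copied from the examples in this guide (two dash characters).
PAUSE = "––"


def pauses_per_sentence(text: str) -> list[tuple[str, int]]:
    """Split text into sentences and count pause markers in each one."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(s, s.count(PAUSE)) for s in sentences]


def too_many_pauses(text: str, max_pauses: int = 3) -> list[str]:
    """Return the sentences that use more than max_pauses markers."""
    return [s for s, n in pauses_per_sentence(text) if n > max_pauses]


script = (
    "There is a land –– that never forgets. "
    "A land –– where time does not flow, but is stratified: stone –– upon stone. "
    "Welcome –– to –– the –– land –– of Egypt."
)

for sentence, count in pauses_per_sentence(script):
    print(count, sentence)

print(too_many_pauses(script))
```

Only the overloaded third sentence, with four pauses, is flagged. A flag is a prompt to listen again, not a rule: a contemplative passage may justify more pauses than an action scene.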

Does this work in languages other than English?

Yes. The technique has been tested in several languages with Qwen 3 TTS, and because the model is multilingual, em-dashes are likely to affect pacing in other languages as well. The effect may vary for languages and accents that are naturally spoken at a faster pace.

Can I combine em-dashes with voice design?

Absolutely. You can design a voice with a "slow, deliberate pace" description *and* use em-dashes in your text for fine-grained control. The voice design sets the baseline pacing, and em-dashes let you adjust specific moments within it.

How is an em-dash different from a comma?

A comma produces a very slight, natural pause, much like punctuation in speech. An em-dash (`––`) creates a more noticeable, intentional pause that draws attention to what follows. Use commas for grammar and em-dashes for emphasis.