What is Pocket TTS?

Pocket TTS is a lightweight text-to-speech model created by Kyutai Labs. It is built on a Continuous Audio Language Model (CALM) architecture with roughly 100 million parameters, which makes it small enough to run faster than real time on a CPU. It supports both built-in preset voices and voice cloning from short audio samples.
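"Faster than real time" is usually expressed as a real-time factor (RTF): wall-clock generation time divided by the duration of the audio produced. An RTF below 1.0 means speech is ready before it could finish playing back. The sketch below is a generic calculation, not part of Pocket TTS; the function names are hypothetical and the timing figures are made-up example values.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Wall-clock generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def is_faster_than_real_time(generation_seconds: float, audio_seconds: float) -> bool:
    # RTF < 1.0: the audio is ready before it could finish playing back.
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


# Hypothetical measurement: 2.5 s of wall-clock time to synthesize a 10 s clip.
rtf = real_time_factor(2.5, 10.0)
print(f"RTF = {rtf:.2f}")  # RTF = 0.25
print(is_faster_than_real_time(2.5, 10.0))  # True
```

An RTF of 0.25 means the clip was generated four times faster than its playback length.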

Who made Pocket TTS?

Pocket TTS was developed by Kyutai Labs, a research organization focused on real-time AI communication. They released it under the MIT license, one of the most permissive open-source licenses available, so anyone can use, inspect, and build on it.

Does Pocket TTS support voice cloning?

Yes. Pocket TTS can clone a voice from a short audio clip: upload a reference audio sample, and the model generates speech in that voice. If you prefer to skip cloning, use one of the built-in preset voices to generate speech right away.
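Clean reference audio tends to give better clones. As a rough pre-flight check before uploading, the sketch below reads a WAV file with Python's standard `wave` module and reports its duration and peak level, flagging clips that reach full scale (a sign of clipping). It assumes 16-bit PCM WAV input; `describe_reference_clip` is a hypothetical helper, and neither the check nor its threshold comes from Pocket TTS itself.

```python
import array
import io
import math
import sys
import wave


def describe_reference_clip(source) -> dict:
    """Report duration and peak level of a 16-bit PCM WAV (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    duration_s = len(samples) / channels / rate
    peak = max((abs(s) for s in samples), default=0) / 32768  # 1.0 = full scale
    return {"duration_s": duration_s, "peak": peak, "clipped": peak >= 0.999}


# Usage: build a 2-second, 440 Hz test tone in memory at about half of full scale.
rate = 16000
tone = array.array(
    "h", (int(16384 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(2 * rate))
)
if sys.byteorder == "big":
    tone.byteswap()  # write little-endian samples as WAV expects
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(rate)
    out.writeframes(tone.tobytes())
buffer.seek(0)
info = describe_reference_clip(buffer)
print(info)
```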

What is the CALM architecture?

CALM stands for Continuous Audio Language Model, the architecture Pocket TTS uses to generate speech. Whereas many speech models predict discrete audio tokens, CALM works directly with continuous audio representations, which lets the model stay compact while still producing natural-sounding speech.
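To make the discrete-versus-continuous distinction concrete: a discrete-token model must first snap each audio frame to the nearest entry of a fixed codebook and then predict that entry's index, while a continuous model predicts the frame vector itself. The toy below uses random vectors in place of real audio latents and measures the error that snapping introduces. It illustrates only the representation, not CALM's actual network or training; all names and sizes are hypothetical.

```python
import math
import random


def nearest_code(frame, codebook):
    """Discrete approach: replace a frame by the index of its closest codebook vector."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


random.seed(0)
dim, codebook_size = 8, 16
codebook = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(codebook_size)]
frame = [random.gauss(0, 1) for _ in range(dim)]  # one hypothetical latent audio frame

token = nearest_code(frame, codebook)
quantization_error = math.dist(frame, codebook[token])
# Continuous approach: the frame itself is the prediction target, so there is no snapping step.
print(f"token {token}, error after snapping to the codebook: {quantization_error:.3f}")
```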

Is Pocket TTS free?

Yes. Pocket TTS is open-source software released under the MIT license. On this site, it runs directly in your browser with no account and no usage limits, and your text and audio never leave your device.

What are LSD steps?

LSD steps control the trade-off between audio quality and generation speed. More steps generally produce higher-quality output but take longer to generate; fewer steps are faster but may sacrifice some audio fidelity. A slider from 1 to 10 lets you choose the balance that suits your needs.
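Assuming LSD steps work like the integration steps of a flow- or diffusion-style sampler (each step is one more pass of a small network that refines the output), the trade-off resembles the toy below. It integrates the simple equation dx/dt = -x from t = 0 to 1 with a fixed number of Euler steps: compute grows linearly with the step count while the error shrinks. This is an analogy chosen for illustration, not Pocket TTS's sampler.

```python
import math


def euler_solve(steps: int, x0: float = 1.0, t_end: float = 1.0) -> float:
    """Integrate dx/dt = -x from t=0 to t_end using a fixed number of Euler steps."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):  # each step stands in for one more network evaluation
        x = x + dt * (-x)
    return x


exact = math.exp(-1.0)
for steps in (1, 2, 5, 10):
    approx = euler_solve(steps)
    print(f"{steps:2d} steps: {approx:.4f} (error {abs(approx - exact):.4f})")
```

With 1 step the estimate is 0.0; with 10 steps it is about 0.349, against an exact value of about 0.368, at ten times the cost.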

Do I need a powerful computer?

No. Pocket TTS was designed specifically to run faster than real time on a CPU. The INT8-quantized ONNX model is only about 148 MB, so it loads quickly and runs efficiently on most modern devices, including laptops, tablets, and phones.
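INT8 quantization stores each weight as an 8-bit integer plus a shared scale factor, about a quarter of the storage of 32-bit floats. The exact scheme used for this ONNX export isn't stated here; the sketch below shows plain symmetric per-tensor quantization, one common variant, with made-up example weights and hypothetical function names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: each w is stored as q with w ≈ scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9, -0.5]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, f"scale={scale:.4f}", f"max error={max_error:.5f}")
print(f"{len(weights) * 4} bytes as float32 vs {len(q)} bytes as int8")
```

The rounding error per weight is at most half a quantization step (scale / 2), which is why well-chosen INT8 models usually lose little quality.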

What browsers are supported?

Chrome, Edge, and other Chromium-based browsers work best. Firefox and Safari offer only limited support for some of the browser features used for acceleration, so generation there may be slower.

Try Pocket TTS Online for Free

Why Use Pocket TTS

CPU-Optimized

Voice Cloning

Built-in Voices

Lightweight

How It Works

Open the free tool

Type your text

Generate and download

Who Is Pocket TTS For?

Developers

Content Creators

Privacy-Conscious Users

Hobbyists and Makers

Tips for Best Results

Use clean reference audio for cloning

Adjust LSD steps for your needs

Built-in voices are great for quick TTS

Need more power? Try Voice Creator Pro.

GPU-Accelerated Processing

Voice Cloning in 600+ Languages

Voice Design from Text

Local REST API

Commercial License

Multiple Models in One App

Common Questions