Try Pocket TTS Online for Free
Lightweight voice cloning and text-to-speech that runs on almost any device. No GPU needed, no signup, completely private.
Try Pocket TTS Free
Why Pocket TTS
Why Use Pocket TTS
Pocket TTS is an open-source, MIT-licensed text-to-speech model by Kyutai Labs. With roughly 100 million parameters and a Continuous Audio Language Model (CALM) architecture, it was designed from the ground up to run on CPU faster than real-time.
It supports both built-in preset voices and voice cloning from short audio clips, all in a package that weighs just 148 MB as an INT8-quantized ONNX model.
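"Faster than real-time" can be made concrete with the real-time factor (RTF): the seconds spent generating divided by the seconds of audio produced. An RTF below 1.0 means speech is generated faster than it plays back. The sketch below is a minimal illustration in Python. The function names and the sample numbers are assumptions chosen for the example, not measurements of Pocket TTS.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration.

    Values below 1.0 mean the audio was produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if generation_seconds < 0:
        raise ValueError("generation_seconds must be non-negative")
    return generation_seconds / audio_seconds


def is_faster_than_real_time(generation_seconds: float, audio_seconds: float) -> bool:
    """True when generation took less time than the audio lasts."""
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


# Illustrative numbers only, not a benchmark of Pocket TTS:
# 2 seconds of compute for a 10-second clip.
example_rtf = real_time_factor(2.0, 10.0)
```

To measure your own device, time a generation run and divide by the length of the resulting clip.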
CPU-Optimized
Designed to run faster than real-time on CPU. No GPU required, making it accessible on virtually any device.
Voice Cloning
Clone a voice from a short audio clip. Upload a reference sample and generate speech in that voice in seconds.
Built-in Voices
Preset voices available for quick text-to-speech without needing to provide a reference audio sample.
Lightweight
Only 148 MB as an INT8-quantized ONNX model. Loads fast and runs efficiently in your browser.
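For a rough sense of why INT8 quantization keeps the download small, the arithmetic below estimates raw weight storage for a model of roughly 100 million parameters at different precisions. It is a back-of-the-envelope sketch: the helper name is made up for this example, and it counts weights only. The published 148 MB file is larger than the 8-bit estimate, and this sketch does not try to account for the difference.

```python
def weight_size_mb(parameter_count: int, bits_per_weight: int) -> float:
    """Estimate raw weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    if parameter_count < 0:
        raise ValueError("parameter_count must be non-negative")
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return parameter_count * bits_per_weight / 8 / 1_000_000


# "Roughly 100 million parameters", as stated on this page.
PARAMETER_COUNT = 100_000_000

int8_mb = weight_size_mb(PARAMETER_COUNT, 8)    # one byte per weight
fp32_mb = weight_size_mb(PARAMETER_COUNT, 32)   # four bytes per weight

if __name__ == "__main__":
    print(f"8-bit weights:  about {int8_mb:.0f} MB")
    print(f"32-bit weights: about {fp32_mb:.0f} MB")
```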
Get Started
How It Works
For text-to-speech, pick a built-in voice, type your text, and generate. For voice cloning, upload a short audio clip instead of picking a voice.
Open the free tool
Go to the Pocket TTS tool in your browser. Pick a built-in voice for quick TTS, or upload a short audio clip to clone a voice.
Type your text
Enter the text you want spoken. Pocket TTS supports English text input.
Generate and download
Hit Generate and your audio is ready in seconds. Adjust the LSD steps slider (1 to 10) if you want to fine-tune the tradeoff between quality and speed.
Use Cases
Who Is Pocket TTS For?
Developers
Build voice features for edge and embedded applications. Pocket TTS is small enough to deploy on resource-constrained devices and fast enough to run in real time on CPU.
Content Creators
Generate voiceovers for videos, podcasts, or social media content. Use built-in voices for quick narration or clone your own voice from a short sample.
Privacy-Conscious Users
Everything runs locally in your browser. No audio is uploaded to any server, no account is required, and no data leaves your device.
Hobbyists and Makers
Experiment with text-to-speech and voice cloning in a lightweight, MIT-licensed model. Perfect for personal projects, learning, and prototyping.
Getting the Most Out of Pocket TTS
Tips for Best Results
Use clean reference audio for cloning
Background noise, music, or room echo will affect the cloned voice quality. Use a clip with clear, isolated speech and minimal interference.
Adjust LSD steps for your needs
More LSD steps produce better-quality audio but take longer to generate. Fewer steps are faster but may reduce fidelity. The slider runs from 1 to 10; start around 5 and adjust from there.
Built-in voices are great for quick TTS
If you just need fast text-to-speech without voice cloning, the built-in preset voices deliver consistent results with no setup required.
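One way to pick an LSD step setting, following the tip above, is to time the same sentence at a few settings and then listen to the results. The sketch below shows the timing half in Python. Here `synthesize` is a hypothetical callable that takes text and a step count, and `fake_synthesize` is a stand-in that only sleeps; neither is part of Pocket TTS or any real API.

```python
import time
from typing import Callable, Dict, Iterable


def time_step_settings(
    synthesize: Callable[[str, int], object],
    text: str,
    step_values: Iterable[int],
) -> Dict[int, float]:
    """Run one synthesis call per step setting; return seconds keyed by steps."""
    timings: Dict[int, float] = {}
    for steps in step_values:
        start = time.perf_counter()
        synthesize(text, steps)
        timings[steps] = time.perf_counter() - start
    return timings


def fake_synthesize(text: str, steps: int) -> bytes:
    """Hypothetical stand-in: sleeps in proportion to steps, returns no audio."""
    time.sleep(0.002 * steps)
    return b""


if __name__ == "__main__":
    results = time_step_settings(fake_synthesize, "Hello there.", [1, 5, 10])
    for steps, seconds in results.items():
        print(f"{steps:>2} steps: {seconds:.3f} s")
```

Swap `fake_synthesize` for whatever function actually produces audio in your setup. Timings tell you the cost of each setting, but only listening tells you whether the extra steps are worth it.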
Voice Creator Pro
Need more power? Try Voice Creator Pro.
Voice Creator Pro offers GPU acceleration, voice cloning in 600+ languages, voice design from text descriptions, and a local REST API. One-time purchase of $49.99 with a full commercial license included.
GPU-Accelerated Processing
Faster generation with NVIDIA, Apple Silicon, AMD, and Intel GPU support
Voice Cloning in 600+ Languages
Combine multiple open-source models for TTS and voice cloning across 600+ languages
Voice Design from Text
Describe a voice in plain text and the AI creates it. No audio samples needed.
Local REST API
Automate voice generation in your own apps and workflows
Commercial License
Full rights to use generated audio in commercial projects. One-time $49.99 purchase.
Multiple Models in One App
Access Pocket TTS, Chatterbox, Kokoro, and more from a single desktop application
FAQ
Common Questions
Pocket TTS is a lightweight text-to-speech model created by Kyutai Labs. It uses a Continuous Audio Language Model (CALM) architecture with roughly 100 million parameters, making it small enough to run on CPU faster than real-time. It supports both built-in preset voices and voice cloning from short audio samples.
Pocket TTS was developed by Kyutai Labs, a research organization focused on real-time AI communication. They released it under the MIT license, one of the most permissive open-source licenses available, so anyone can use, inspect, and build on it.
Pocket TTS can clone a voice from a short audio clip. Upload a reference audio sample and the model generates speech in that voice. It also offers built-in preset voices if you prefer to skip cloning and generate speech right away.
CALM stands for Continuous Audio Language Model. It is the architecture used by Pocket TTS to generate speech. Unlike discrete-token approaches, CALM works with continuous audio representations, allowing the model to stay extremely compact while still producing natural-sounding speech.
Pocket TTS is free to use. It is open-source software released under the MIT license, and on this site it runs directly in your browser with no account, no signup, and no usage limits. Everything stays on your device.
LSD steps control the quality of the generated audio. A higher number of steps produces better quality output but takes longer to generate. A lower number is faster but may sacrifice some audio fidelity. You can adjust this with a slider from 1 to 10 to find the right balance for your needs.
You do not need a GPU. Pocket TTS was specifically designed to run on CPU faster than real time. The INT8-quantized ONNX model is only about 148 MB, so it loads quickly and runs efficiently on virtually any modern device, including laptops, tablets, and phones.
Chrome, Edge, and other Chromium-based browsers work best. Firefox and Safari have limited support for some of the web features used for acceleration.