
Try Supertonic TTS Online for Free

Multilingual text-to-speech across 31 languages with 10 preset voices. Runs entirely in your browser. No signup, no install, completely free.

31 Languages
WebGPU Accelerated
100% Private

Why Supertonic

Why Use Supertonic for Text-to-Speech

Supertonic 3 is an MIT-licensed multilingual text-to-speech system from Supertone. It uses a four-stage ONNX pipeline (text encoder, duration predictor, vector estimator, vocoder) and runs entirely on-device through ONNX Runtime Web.

At about 99 million parameters, it is a fraction of the size of most open TTS systems while supporting 31 languages and producing natural, expressive speech.
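One way to picture the four-stage flow described above is as a chain of functions, each feeding the next. The sketch below is illustrative only: the stage names follow the text, but every type, constant, and stub body is hypothetical and does not reflect the real inputs or outputs of Supertonic's ONNX graphs.

```typescript
// Illustrative only: stage names come from the text; all types, constants,
// and stub bodies are hypothetical stand-ins for the real ONNX sessions.
type Stage<I, O> = (input: I) => O;

interface Encoded { tokens: number[] }
interface Timed extends Encoded { durations: number[] }
interface Latent { frames: number }
interface Waveform { samples: Float32Array }

const STUB_FRAMES_PER_TOKEN = 2;  // hypothetical
const STUB_SAMPLES_PER_FRAME = 4; // hypothetical

// Stage 1: turn text into numeric tokens (here: UTF-16 code units).
const textEncoder: Stage<string, Encoded> = (text) => ({
  tokens: text.split("").map((ch) => ch.charCodeAt(0)),
});

// Stage 2: decide how long each token should last.
const durationPredictor: Stage<Encoded, Timed> = (enc) => ({
  tokens: enc.tokens,
  durations: enc.tokens.map(() => STUB_FRAMES_PER_TOKEN),
});

// Stage 3: produce a latent representation covering the total duration.
const vectorEstimator: Stage<Timed, Latent> = (timed) => ({
  frames: timed.durations.reduce((sum, d) => sum + d, 0),
});

// Stage 4: turn the latent frames into audio samples (silence in this stub).
const vocoder: Stage<Latent, Waveform> = (latent) => ({
  samples: new Float32Array(latent.frames * STUB_SAMPLES_PER_FRAME),
});

function synthesize(text: string): Waveform {
  return vocoder(vectorEstimator(durationPredictor(textEncoder(text))));
}
```

In the real system each stage would be a separate ONNX model run through ONNX Runtime Web; the stubs here only show how the outputs chain together.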

31 Languages

Covers 24 European languages plus Korean, Japanese, Arabic, Hindi, Vietnamese, Indonesian, and Turkish, all from a single model.

10 Preset Voices

Five male and five female voice styles (M1 through M5 and F1 through F5) that work across every supported language.

WebGPU + WASM

Uses WebGPU for fast on-device inference where available, with an automatic WebAssembly fallback for Firefox and Safari.

Quality Tuning

Advanced settings let you trade off speed and quality by adjusting the number of denoising steps from 1 to 32.

Get Started

How It Works

1. Open the free tool

Go to the Supertonic TTS tool in your browser. The first run downloads the model and caches it locally for next time.

2. Pick a voice and language

Choose one of the 10 preset voices and the target language. Any voice can speak any of the 31 supported languages.

3. Type your text and generate

Enter your text, tweak the speed or quality steps in advanced settings if you want, then generate. Audio is ready in seconds.

Use Cases

Who Is Supertonic For?

Multilingual Creators

Generate voiceovers in 31 languages from a single model. Useful for international podcasts, dubbed video content, and localized social media.

Language Learners

Generate pronunciation examples and listening practice across European, Asian, and Middle Eastern languages without juggling multiple TTS providers.

Developers

Prototype voice features that need to work across many languages. The MIT license means you can ship Supertonic in your own products, including commercial ones.

Accessibility

Convert articles, documentation, or learning material into speech for audiences across dozens of languages from one consistent voice catalog.

Getting the Most Out of Supertonic

Tips for Best Results

Match the language to your text

Supertonic uses the selected language to guide pronunciation. Switching to the right language gives much better results than leaving it on English for non-English text.

Tune quality with denoising steps

Lower step counts (4-6) are faster and good for previews. Higher counts (12-32) give the cleanest audio. The default of 8 is a sensible balance.
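If you drive the step setting from your own code, it helps to keep the value inside the supported range. The sketch below is a minimal example: the range (1 to 32) and the default (8) come from this page, while the function and constant names are hypothetical and not part of any Supertonic API.

```typescript
// Range and default come from the text (1 to 32, default 8);
// the function and constant names are hypothetical.
const MIN_STEPS = 1;
const MAX_STEPS = 32;
const DEFAULT_STEPS = 8;

function normalizeSteps(requested?: number): number {
  // Missing, NaN, or infinite input falls back to the default.
  if (requested === undefined || !isFinite(requested)) {
    return DEFAULT_STEPS;
  }
  // Step counts are whole numbers, so round before clamping.
  const rounded = Math.round(requested);
  return Math.min(MAX_STEPS, Math.max(MIN_STEPS, rounded));
}
```

For example, `normalizeSteps(100)` yields 32 and `normalizeSteps()` yields 8.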

Use WebGPU where available

Chrome and Edge use your GPU for inference, which is much faster than WebAssembly. If you're on Firefox or Safari and generation feels slow, try a Chromium browser.
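For developers wondering how a WebGPU-first setup with a WebAssembly fallback might be wired, here is a minimal sketch. The provider names "webgpu" and "wasm" are the ones ONNX Runtime Web uses; how this site actually performs its check is not stated, so the feature test and the `NavigatorLike` type are assumptions.

```typescript
// Sketch: choose an execution-provider order from a simple feature check.
// "webgpu" and "wasm" are ONNX Runtime Web provider names. The check itself
// is an assumption; the text does not say how the site detects WebGPU.
type Provider = "webgpu" | "wasm";

interface NavigatorLike {
  gpu?: unknown; // present in browsers that expose the WebGPU API
}

function providerOrder(nav: NavigatorLike): Provider[] {
  // Prefer WebGPU when the API is exposed; always keep WASM as a fallback.
  return nav.gpu !== undefined ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser, the resulting list could be passed as the `executionProviders` session option when creating an ONNX Runtime Web inference session. Note that an exposed `gpu` property does not guarantee a usable adapter, so a production check would likely go further than this.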

Voice Creator Pro

Need more? Go further with Voice Creator Pro.

Voice Creator Pro gives you higher quality models, voice cloning in 600+ languages, emotional speech, and a commercial use license. Try it free in your browser or download the desktop app.

Voice Cloning

Clone any voice from a short audio sample. Zero-shot cloning with no training required.

600+ Languages

Combine multiple open-source models for TTS across 600+ languages.

GPU-Accelerated Processing

Faster generation with NVIDIA, Apple Silicon, AMD, and Intel GPU support.

Voice Design from Text

Describe a voice in plain text and the AI creates it. No audio samples needed.

Local REST API

Automate voice generation in your own apps and workflows.

Commercial License

Full rights to use generated audio in commercial projects.

FAQ

Common Questions

Supertonic is an on-device multilingual text-to-speech system from Supertone. Supertonic 3 has about 99 million parameters and supports 31 languages with 10 preset voices. Although it is a fraction of the size of larger open TTS systems, it produces natural, high-quality speech and runs entirely on-device with no cloud dependency.

Supertonic is developed by Supertone, a voice AI company. The open-weight ONNX checkpoint is released on Hugging Face under the MIT license, so you can use it freely in personal or commercial projects.

Supertonic 3 supports 31 languages: English, Korean, Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, plus a language-agnostic mode.

Supertonic ships with 10 preset voice styles: five male (M1 through M5) and five female (F1 through F5). Each voice can speak any of the supported languages.
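Because the voice names follow a regular pattern, the catalog is easy to enumerate in code. The snippet below is a small sketch: the IDs (M1 through M5, F1 through F5) and the language count come from this page, while the helper and constant names are hypothetical.

```typescript
// Voice IDs M1..M5 and F1..F5 come from the text; the helper is hypothetical.
function presetVoiceIds(perGender: number = 5): string[] {
  const ids: string[] = [];
  for (const prefix of ["M", "F"]) {
    for (let i = 1; i <= perGender; i++) {
      ids.push(`${prefix}${i}`);
    }
  }
  return ids;
}

const VOICE_IDS = presetVoiceIds();
const LANGUAGE_COUNT = 31; // from the text

// Every voice can speak every language, so the combinations multiply.
const VOICE_LANGUAGE_PAIRS = VOICE_IDS.length * LANGUAGE_COUNT;
```

With 10 voices and 31 languages, that works out to 310 voice and language combinations.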

Yes, Supertonic is free to use. Supertonic 3 is released under the MIT license, one of the most permissive open-source licenses. On this site it runs directly in your browser with no account, no signup, and no usage caps.

At about 99 million parameters, Supertonic 3 is a fraction of the size of open TTS systems with 0.7B to 2B parameters, while staying competitive on quality benchmarks. The smaller model means faster cold starts, smaller downloads, and lower memory usage, which is what makes browser inference practical. For voice cloning, voice design, and 600+ languages, Voice Creator Pro is the desktop counterpart.

The Supertonic 3 model is about 400 MB. It is downloaded on first use and cached locally afterwards. It runs best in browsers with WebGPU support (Chrome, Edge), where it can use your GPU for inference; elsewhere, WebAssembly is used automatically as a fallback. Any recent laptop or desktop can run it, but the first download takes longer than it would for a smaller model.
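As a rough sanity check, the 400 MB figure lines up with the parameter count if the weights are stored as 32-bit floats. The arithmetic below is a back-of-the-envelope sketch: the parameter count and download size come from this page, while the 4 bytes per parameter and the 50 Mbit/s connection speed are assumptions chosen for illustration.

```typescript
// Back-of-the-envelope numbers. The parameter count (about 99 million) and
// the ~400 MB download come from the text. The 4 bytes per parameter
// (float32) and the 50 Mbit/s link speed are assumptions.
const PARAM_COUNT = 99e6;
const BYTES_PER_PARAM = 4; // assumption: float32 weights
const DOWNLOAD_MB = 400;   // from the text, 1 MB taken as 10^6 bytes

// Approximate size of the weights alone, in megabytes.
const weightsMB = (PARAM_COUNT * BYTES_PER_PARAM) / 1e6;

// Transfer time in seconds: megabytes -> megabits, divided by link speed.
function downloadSeconds(sizeMB: number, linkMbps: number): number {
  return (sizeMB * 8) / linkMbps;
}

const secondsAt50Mbps = downloadSeconds(DOWNLOAD_MB, 50);
```

Under these assumptions the weights come to 396 MB, and the first download takes roughly 64 seconds on a 50 Mbit/s connection. Actual times depend on your network and on how the files are served.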

Chrome and Edge work best because they support WebGPU acceleration. Firefox and Safari work too but fall back to WebAssembly, which is slower. The first run downloads the model and caches it; subsequent runs are much faster.