Pricing
Flexible pricing
Windows
- One-time payment, lifetime access
- Unlimited generations
- Runs fully offline
- Voice cloning
- 600+ languages
- Built-in voice library with thousands of community voices
- Voice design
- Speech to Text
- Long-form audio generation
- Basic and Advanced TTS Models
macOS
- One-time payment, lifetime access
- Unlimited generations
- Runs fully offline
- Voice cloning
- 600+ languages
- Built-in voice library with thousands of community voices
- Voice design
- Speech to Text
- Long-form audio generation
- Basic and Advanced TTS Models
Pair it with Song Creator Pro (Windows only) for AI music generation and save.
Buy both for $72.50 on itch.io
Save $27.50 compared to buying separately
Already own Song Creator Pro? Get 50% off Voice Creator Pro
Available on both itch.io and the Microsoft Store · Discount is applied automatically at checkout
FAQ
Common questions
No. The desktop app runs entirely offline after installation. Your voice data never leaves your device.
Windows 10+ with a GPU (8 GB VRAM recommended) or macOS with Apple Silicon (M1+). 8 GB RAM minimum, 12 GB+ recommended.
Your license is tied to the store account you purchased with (Microsoft, Apple, or itch.io). You can install the desktop app on any device signed into that account.
No. The desktop app is a one-time purchase starting at $54.99 that gives you lifetime access. There are no monthly fees or usage limits.
Refunds for the desktop app are handled through the store where you purchased it. For the Microsoft Store, request a refund through your purchase history. For the Mac App Store, use Apple's Report a Problem page. For itch.io, contact us through the itch.io page.
You do. Any audio you generate with the desktop app or Cloud is entirely yours. You retain full ownership and rights to all generated content, with no royalties or attribution required.
Yes. Both the desktop app and Cloud include full commercial rights. You can use generated voices in YouTube videos, podcasts, audiobooks, games, apps, and any other commercial projects.
Both the desktop app and Cloud use zero-shot voice cloning, meaning they can replicate a voice from a single short sample. The sweet spot is 3 to 10 seconds of clean audio. Longer samples don't improve quality.