A step-by-step tutorial for transcribing audio to text in Voice Creator Pro using uploads, YouTube URLs, and live recording.

Transcribe Audio to Text

In this tutorial you will use the STT (Speech to Text) tab to transcribe audio into text and export it as subtitles or timestamped JSON.

The whole process takes under a minute per audio source.

Prerequisites

Voice Creator Pro, running either as the desktop app on your machine or in your browser with VCP Cloud
An audio file, a YouTube URL, or a microphone (depending on which workflow you follow)

Workflow A: Transcribe an Uploaded File

Step 1: Open the STT Tab

Click Lab in the left sidebar, then select the STT tab at the top. You will see two main areas: the Audio Source panel on the left and the Results panel on the right.

Step 2: Upload Your Audio

Click Upload in the Audio Source panel and select an audio file from your computer (WAV, MP3, or another common format).

Step 3: Set the Language

Choose the correct language from the Language dropdown. If you are not sure or the audio contains multiple languages, leave it on Auto-detect.

Step 4: Pick an ASR Model

Click the model badge in the Results panel to switch between available ASR model families. The default works well for most cases, but try a different model if accuracy is not satisfactory.

Step 5: Transcribe

Click Transcribe. The full transcription text appears in the Results panel once processing finishes.

Workflow B: Transcribe from a YouTube Video

Step 1: Open the STT Tab

Click Lab in the left sidebar, then select the STT tab at the top.

Step 2: Paste the YouTube URL

Click YouTube in the Audio Source panel, paste the video URL, and let Voice Creator Pro extract the audio.

Step 3: Set Language and Model

Choose a language (or leave on Auto-detect) and select an ASR model by clicking the model badge.

Step 4: Transcribe

Click Transcribe. The result appears in the Results panel. This is a quick way to pull text from interviews, podcasts, or any public video.

Workflow C: Record and Transcribe Live

Step 1: Open the STT Tab

Click Lab in the left sidebar, then select the STT tab at the top.

Step 2: Record from Your Microphone

Click Record in the Audio Source panel and speak into your microphone. Click Stop when you are done.

Step 3: Set Language and Model

Choose a language and an ASR model as described in Workflow A.

Step 4: Transcribe

Click Transcribe to see the text of what you just recorded.

Workflow D: Transcribe Audio Generated in Voice Creator Pro

Step 1: Open the STT Tab

Click Lab in the left sidebar, then select the STT tab at the top.

Step 2: Load from History

Scroll down to the History section at the bottom of the page. This shows all audio you have previously generated in the Clone, Design, or TTS tabs. Find the generation you want to transcribe and click Use to load it as the audio source.

Step 3: Transcribe

Set your language and ASR model, then click Transcribe. This is useful for verifying what was generated, creating subtitles for generated voiceovers, or repurposing generated audio into written content.
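
To verify a generation, you can compare the transcription against the script you originally typed. The sketch below is one possible approach using Python's standard difflib module; it is not a feature of Voice Creator Pro, and the function names (normalize, word_similarity, word_differences) are made up for illustration. It assumes you have copied both texts into plain strings.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_similarity(script: str, transcript: str) -> float:
    """Return a 0..1 similarity ratio between the two word sequences."""
    matcher = difflib.SequenceMatcher(None, normalize(script), normalize(transcript))
    return matcher.ratio()


def word_differences(script: str, transcript: str) -> list[tuple[str, list[str], list[str]]]:
    """List (operation, script words, transcript words) for every span that differs."""
    a, b = normalize(script), normalize(transcript)
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Example texts: the script you typed and the text copied from the Results panel.
script = "Hello, world! This is a test."
transcript = "hello world this is the test"
print(round(word_similarity(script, transcript), 2))
print(word_differences(script, transcript))
```

A low ratio or a long list of differences suggests either a generation problem or a transcription problem, so it is worth listening to the audio before drawing conclusions.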

Exporting Results

Once you have a transcription, you can export it in two formats:

SRT - A standard subtitle file. Drop it into any video editor (Premiere Pro, DaVinci Resolve, CapCut, etc.) to add captions instantly.
JSON - Includes word-level timestamps. Use this when you need precise alignment for programmatic workflows, custom subtitle styling, or audio editing tools.

Click the corresponding export button in the Results panel to download the file.
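
To see how the two export formats relate, the sketch below turns word-level timestamps into SRT cues. It assumes a hypothetical JSON layout: a top-level words list whose items have word, start, and end fields, with times in seconds. The actual export may use different field names, so inspect your file first. The helpers srt_timestamp and words_to_srt are illustrative names, not part of Voice Creator Pro.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues of at most max_words each."""
    cues = []
    for index, start in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[start:start + max_words]
        text = " ".join(w["word"] for w in chunk)
        span = f"{srt_timestamp(chunk[0]['start'])} --> {srt_timestamp(chunk[-1]['end'])}"
        cues.append(f"{index}\n{span}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


# Hypothetical export content; the real file's field names may differ.
sample = json.loads(
    '{"words": ['
    '{"word": "Hello", "start": 0.0, "end": 0.42},'
    '{"word": "world", "start": 0.5, "end": 0.9},'
    '{"word": "again", "start": 1.25, "end": 1.8}]}'
)
srt_text = words_to_srt(sample["words"], max_words=2)
print(srt_text)
```

Grouping by a fixed word count keeps the example short; for real captions you may prefer to break cues at punctuation or at long pauses.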

Tips

Use History for previous generations. If you already created audio in Clone, Design, or TTS, open the History source in STT and click Use to load it directly. No need to export and re-upload.
Auto-detect is good, but explicit is better. If you know the language, select it manually. This gives the model a head start and can improve accuracy on short clips.
Try a different ASR model if results are rough. Click the model badge and switch families. Different models handle accents, background noise, and speaking speeds differently.
SRT for video, JSON for code. Pick SRT when you just need subtitles. Pick JSON when you plan to process timestamps programmatically.
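
As an example of processing timestamps programmatically, the sketch below finds long pauses and estimates speaking rate from word-level timings. It assumes each word is a dictionary with word, start, and end fields (times in seconds); that layout is a guess for illustration, so adapt the field names to what your exported JSON actually contains. The function names find_pauses and speaking_rate are hypothetical.

```python
def find_pauses(words: list[dict], min_gap: float = 0.5) -> list[tuple[float, float]]:
    """Return (gap_start, gap_end) pairs where the silence between two
    consecutive words lasts at least min_gap seconds."""
    pauses = []
    for previous, current in zip(words, words[1:]):
        gap = current["start"] - previous["end"]
        if gap >= min_gap:
            pauses.append((previous["end"], current["start"]))
    return pauses


def speaking_rate(words: list[dict]) -> float:
    """Return words per minute, measured from the first word's start
    to the last word's end. Returns 0.0 for empty or zero-length input."""
    if not words:
        return 0.0
    duration = words[-1]["end"] - words[0]["start"]
    return len(words) / duration * 60 if duration > 0 else 0.0


# Hypothetical word list standing in for a loaded JSON export.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.75, "end": 1.0},
    {"word": "again", "start": 2.0, "end": 3.0},
]
print(find_pauses(words))
print(speaking_rate(words))
```

Pause positions like these can serve as cut points in an audio editor or as natural places to split subtitle cues.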

Next Steps

Speech to Text reference - Full details on every setting, audio source, and export format
Clone a Voice - End-to-end tutorial for voice cloning
Voice Cloning reference - Deep dive on cloning settings and models
