openai/whisper-large-v3

Convert speech in audio to text

Automatic Speech Recognition

Whisper is a general-purpose speech recognition model. Trained on a large dataset of diverse audio, it is a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

This version runs only the most recent Whisper model, large-v3. It’s optimized for high performance and simplicity.

Model Versions

Model Size	Version
large-v3	link
large-v2	link
all others	link

Although this implementation runs only the large-v3 model, the links above to previous versions are kept for reference.

Whisper uses a Transformer sequence-to-sequence model trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. All of these tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing for a single model to replace many different stages of a traditional speech processing pipeline.
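To make the "sequence of tokens" idea concrete, here is a minimal sketch of how a task could be encoded as a decoder prompt. The special-token strings follow the format used in the open-source Whisper release; the helper name `build_decoder_prompt` and its parameters are hypothetical and are not part of this model's API. It only builds the prompt as a list of strings and does not run the model.

```python
from typing import List, Optional


def build_decoder_prompt(
    language: Optional[str],
    task: str = "transcribe",
    timestamps: bool = False,
) -> List[str]:
    """Build an illustrative Whisper-style decoder prompt (hypothetical helper).

    language: a language code such as "en" or "fr", or None to leave the
              language for the model to predict (language identification).
    task:     "transcribe" (same-language text) or "translate" (to English).
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")

    tokens = ["<|startoftranscript|>"]

    # With no language given, the prompt stops here: the next token the
    # decoder predicts is the language tag, which is how language
    # identification fits into the same token stream.
    if language is None:
        return tokens

    tokens.append(f"<|{language}|>")
    tokens.append(f"<|{task}|>")

    # Timestamp prediction is switched off with a dedicated token.
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    print(build_decoder_prompt("en"))
    print(build_decoder_prompt("fr", task="translate", timestamps=True))
    print(build_decoder_prompt(None))
```

Because the task is selected by prompt tokens rather than by separate models, switching from transcription to translation in this sketch only changes one token in the prompt.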
