Breaking New Ground in Automatic Transcription
ElevenLabs has introduced Scribe, its first speech-to-text model for automatic speech recognition (ASR), which the company positions as a new standard for transcription accuracy. Designed to handle the complexities of real-world audio, Scribe supports 99 languages and pairs its accuracy with word-level timestamps, speaker diarization, and audio-event tagging, all of which make transcripts easier to integrate into downstream applications.
Outperforming the Competition
Scribe has been rigorously tested against top industry models, including Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3, consistently achieving superior results in FLEURS and Common Voice benchmark evaluations. The model excels across a diverse set of languages, posting the highest accuracy in Italian (98.7%), English (96.7%), and 97 other languages.
Unlike many competing ASR solutions, which struggle with high error rates in underserved languages, Scribe significantly reduces transcription errors in Serbian, Cantonese, and Malayalam, where alternatives often exceed 40% word error rates.
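Word error rate (WER), the metric behind these comparisons, is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. Assuming accuracy is reported as 1 minus WER, an accuracy of 98.7% corresponds to a WER of about 1.3%. The following is a minimal sketch of that calculation; the function name and the lowercase, whitespace-based tokenization are illustrative assumptions, not the normalization used in the FLEURS or Common Voice evaluations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are compared case-insensitively after splitting on
    whitespace. Published benchmarks usually apply stricter normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row at a time.
    # prev[j] is the distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy: {1 - wer:.3f}")
```

Because insertions count as errors, WER can exceed 100% when a model produces far more words than the reference contains.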
A Seamless Solution for Developers and Businesses
Developers can integrate Scribe via ElevenLabs’ Speech-to-Text API, receiving structured JSON transcripts enriched with speaker tags, word-level timestamps, and non-speech event markers such as laughter. Additionally, a low-latency version for real-time applications is in development.
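To illustrate how such a transcript might be consumed, the sketch below groups word-level entries into speaker turns and pulls out the non-speech events. The JSON layout and the field names (`words`, `text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for this example and should be checked against the ElevenLabs API reference; the sample payload is hand-written, not real API output.

```python
import json
from dataclasses import dataclass

# Hand-written sample; the field names are assumed, not taken from the article.
SAMPLE_RESPONSE = """
{
  "language_code": "en",
  "text": "Hello there. (laughter) Hi!",
  "words": [
    {"text": "Hello", "start": 0.00, "end": 0.42, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.42, "end": 0.50, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there.", "start": 0.50, "end": 0.91, "type": "word", "speaker_id": "speaker_0"},
    {"text": "(laughter)", "start": 1.10, "end": 1.80, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hi!", "start": 2.00, "end": 2.30, "type": "word", "speaker_id": "speaker_1"}
  ]
}
"""


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def speaker_turns(transcript: dict) -> list[Turn]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: list[Turn] = []
    for item in transcript["words"]:
        if item["type"] != "word":
            continue  # skip spacing and audio-event entries
        if turns and turns[-1].speaker == item["speaker_id"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].end = item["end"]
            turns[-1].text += " " + item["text"]
        else:
            turns.append(
                Turn(item["speaker_id"], item["start"], item["end"], item["text"])
            )
    return turns


def audio_events(transcript: dict) -> list[tuple[str, float]]:
    """Return (label, start time) for each non-speech event marker."""
    return [
        (item["text"], item["start"])
        for item in transcript["words"]
        if item["type"] == "audio_event"
    ]


if __name__ == "__main__":
    transcript = json.loads(SAMPLE_RESPONSE)
    for turn in speaker_turns(transcript):
        print(f"{turn.speaker} [{turn.start:.2f}-{turn.end:.2f}]: {turn.text}")
    print(audio_events(transcript))
```

Keeping the word-level timestamps on each turn makes it straightforward to derive subtitles or jump-to-time links from the same data.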
For content creators and businesses, Scribe is accessible directly through the ElevenLabs dashboard, allowing users to upload audio or video files and generate formatted, high-accuracy transcripts in seconds.
As ElevenLabs continues to push the boundaries of AI-driven speech technology, Scribe stands as a major advancement in making automatic speech recognition more accurate, accessible, and inclusive across languages and industries.
For more details, visit the official announcement.
