Whisper
AI Speech Recognition

Introduction
Whisper is a general-purpose speech recognition model developed by OpenAI. It is trained on a large dataset of diverse audio and is a multitask model: a single Transformer sequence-to-sequence network handles multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing one model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
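To make the idea of task-specifier tokens concrete, here is a minimal sketch of how a decoder prompt could be assembled from them. The token strings follow the format Whisper's tokenizer uses (`<|startoftranscript|>`, a language token, a task token, and optionally `<|notimestamps|>`), but the helper `build_decoder_prompt` is hypothetical and only illustrates the layout; it is not part of the Whisper package.

```python
def build_decoder_prompt(language="en", task="transcribe", timestamps=True):
    """Return the special-token prefix that tells the decoder what to do.

    Hypothetical helper for illustration only: it works on token strings,
    whereas the real model works on integer token ids.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    # Start marker, then the spoken language, then the task to perform.
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]

    # Without timestamps, one extra token tells the decoder to emit text only.
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


# Example: translate Japanese speech into English text, no timestamps.
print(build_decoder_prompt("ja", "translate", timestamps=False))
```

Switching the task is then a matter of changing one token in the prefix rather than swapping in a different model, which is what lets a single network cover several pipeline stages.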
How To Use
Whisper can be used from the command line or from Python. On the command line, you transcribe speech by passing the audio file and, optionally, a model size. In Python, you load a model and call its transcribe() method on an audio file.
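Both routes can be sketched in a few lines of Python. This assumes the `openai-whisper` package and `ffmpeg` are installed; the file name `audio.mp3` and the helper names `build_whisper_command` and `transcribe_file` are placeholders chosen for illustration. The `whisper` import is deferred into the function so that the command-building part runs even where the package is absent.

```python
import shlex


def build_whisper_command(audio_path, model="base", language=None, task=None):
    """Assemble the argument list for the `whisper` command-line tool."""
    cmd = ["whisper", audio_path, "--model", model]
    if language is not None:
        cmd += ["--language", language]  # e.g. "Japanese"
    if task is not None:
        cmd += ["--task", task]          # "transcribe" or "translate"
    return cmd


def transcribe_file(audio_path, model_name="base"):
    """Transcribe one audio file from Python and return the recognized text."""
    import whisper  # deferred: requires the openai-whisper package

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)
    return result["text"]


# Show the shell command that the first helper would produce.
print(shlex.join(build_whisper_command("audio.mp3", model="base")))
```

Calling `transcribe_file("audio.mp3")` returns the transcript as a string. Larger model sizes are generally more accurate but slower and need more memory, so starting with a small model is a reasonable default.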
Pricing
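Whisper itself is open source: OpenAI released the code and model weights under the MIT license, so running it locally costs nothing. The tiers listed below refer to repositories and appear to describe GitHub hosting plans, not charges for Whisper.

Packages | Pricing | Features |
---|---|---|
Free Edition | Free | Unlimited public repositories, limited private repositories |
Team Edition | $4/user/month | Unlimited private repositories, basic features |
Enterprise Edition | $21/user/month | Advanced security and auditing features |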