Whisper

AI Speech Recognition

Tags

Open Source, Free, AI Speech Recognition, AI Speech-to-Text, AI Transcriber, AI Translate, Open Source AI Models

Introduction

Whisper is a general-purpose speech recognition model developed by OpenAI. It is trained on a large dataset of diverse audio and is a multitask model: a single Transformer sequence-to-sequence network handles multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens predicted by the decoder, which lets one model replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
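To make the special-token idea concrete, here is a minimal sketch of how a task could be spelled out as a decoder prompt. The token spellings follow the published Whisper tokenizer, but the decoder_prompt helper is hypothetical and only illustrates the format; the real model works with integer token IDs, not strings.

```python
def decoder_prompt(language="en", task="transcribe", timestamps=False):
    """Return the special tokens that would precede the text tokens.

    Hypothetical helper for illustration only; it is not part of the
    whisper package.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    tokens = [
        "<|startoftranscript|>",  # marks the start of a prediction
        f"<|{language}|>",        # language tag, also the language-ID target
        f"<|{task}|>",            # task specifier
    ]
    if not timestamps:
        # Ask the decoder for plain text without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return tokens
```

For example, decoder_prompt("ja", "translate") describes the task of translating Japanese speech into English text, while the same audio with the default task would be transcribed in the source language.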

How To Use

Whisper can be used from the command line or from Python. On the command line, pass the audio file to the whisper command and select a model size with the --model option. In Python, load a model with whisper.load_model() and call its transcribe() method on an audio file.
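Both routes can be sketched in a few lines. This assumes the openai-whisper package and ffmpeg are installed; the file name audio.mp3, the "base" model size, and the two helper functions are illustrative assumptions, not part of Whisper itself.

```python
def build_cli_command(audio_path, model="base", language=None, task="transcribe"):
    """Assemble a `whisper` command line as an argument list.

    Hypothetical helper; the list can be passed to subprocess.run().
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    cmd = ["whisper", audio_path, "--model", model, "--task", task]
    if language:
        # Optional: skip automatic language detection.
        cmd += ["--language", language]
    return cmd


def transcribe_file(audio_path, model="base"):
    """Transcribe one audio file from Python and return the text."""
    # Imported lazily so build_cli_command() works without the package.
    import whisper

    loaded = whisper.load_model(model)      # downloads weights on first use
    result = loaded.transcribe(audio_path)  # dict with the text and segments
    return result["text"]
```

Calling build_cli_command("audio.mp3", model="medium") yields the equivalent of running whisper audio.mp3 --model medium --task transcribe in a shell. Larger model sizes are generally more accurate but slower and need more memory.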

Pricing

Packages | Pricing | Features
Free Edition | Free | Unlimited public repositories, limited private repositories
Team Edition | $4/user/month | Unlimited private repositories, basic features
Enterprise Edition | $21/user/month | Advanced security and auditing features