Monday, September 26, 2022

OpenAI Open-Sources ‘Whisper’ — a Multilingual Speech Recognition System


Speech recognition remains a challenge in AI. However, OpenAI has moved one step closer to solving it. In a blog post last week, OpenAI introduced Whisper, a multilingual automatic speech recognition system that is trained and open-sourced to approach human-level robustness and accuracy on English speech recognition.

Numerous organisations such as Google, Meta and Amazon have developed highly capable speech recognition systems. But OpenAI claims that Whisper stands out. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the web. OpenAI claims that this large and diverse dataset improves the model's robustness to background noise, distinctive accents, and technical jargon.


The company's open-sourced models and inference code serve as a foundation for building useful applications and for further research on robust speech processing.

Source: Introducing Whisper, OpenAI

An excerpt from the blog reads, “The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.”
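The first step in that excerpt, cutting audio into 30-second chunks, can be illustrated with a minimal sketch. This is not OpenAI's code: the function name `split_into_chunks`, the 16 kHz sample rate, and the choice to zero-pad the last chunk are assumptions made for illustration, and the log-Mel spectrogram and Transformer stages are left out.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumed sampling rate in Hz (not stated in the article)
CHUNK_SECONDS = 30     # chunk length given in the quoted excerpt


def split_into_chunks(samples: List[float],
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> List[List[float]]:
    """Split mono audio samples into fixed-length chunks.

    Hypothetical helper: the final chunk is zero-padded (silence) so that
    every chunk has exactly sample_rate * chunk_seconds samples.
    """
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        # Pad with silence so the last chunk matches the others in length.
        chunk = chunk + [0.0] * (chunk_len - len(chunk))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 70 seconds of placeholder audio: two full chunks plus one padded chunk.
    audio = [0.1] * (70 * SAMPLE_RATE)
    chunks = split_into_chunks(audio)
    print(len(chunks), "chunks of", len(chunks[0]) // SAMPLE_RATE, "seconds each")
```

In a full pipeline, each fixed-length chunk would then be converted to a log-Mel spectrogram before reaching the encoder, as the excerpt describes.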

The company says that other existing approaches frequently use smaller, more closely paired audio-text training datasets or broad but unsupervised audio pretraining. Because Whisper was trained on a large, diverse dataset (about a third of which is non-English audio) without being fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance.

However, when measured across many diverse datasets, Whisper's zero-shot performance proves much more robust, with 50% fewer errors than those models. OpenAI hopes that the model's ease of use and high accuracy will allow developers to add voice interfaces to a wider set of applications.
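The article does not say how errors are counted. A minimal sketch, assuming the usual metric of word error rate (WER, the word-level edit distance divided by the reference length), shows what "50% fewer errors" would mean as a relative reduction. The function names and the example WER values are illustrative, not figures from OpenAI.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Illustrative values only: halving WER from 20% to 10% is a 50% reduction.
    print(relative_error_reduction(0.20, 0.10))
```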

To learn more, see the paper, model card, and further details on Whisper in OpenAI's announcement.
