Wav2Vec2-BERT+LM: Transcribing Speech and Evaluating Models Using Hugging Face Transformers

What is Wav2Vec2-BERT? Wav2Vec2-BERT is a successor to the popular Wav2Vec2 model, a pre-trained model for Automatic Speech Recognition (ASR). Several models with pre-trained checkpoints (XLSR, XLS-R, and MMS) had already been released that follow the basic Wav2Vec2 architecture with more pre-training data and slightly different training objectives. Wav2Vec2-BERT itself is a 580M-parameter audio model pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. The pre-trained model was introduced by Meta in the SeamlessM4T paper in August 2023.