Indian Languages and Text Normalization: Part 1

This is a two-part article. The first part covers how the text normalization routine in Whisper, a popular ASR engine, removes essential characters such as vowel signs in Indian languages when evaluating performance. The second part (yet to be written) will cover existing libraries and the approaches needed to normalize text in Indian languages properly.

Text Normalization

Text normalization in natural language processing (NLP) refers to the conversion of different written forms of text into one standardised form.
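To make the problem concrete, here is a minimal Python sketch of what happens when a normalizer treats every Unicode combining mark as a removable diacritic. It is a simplification for illustration only, not Whisper's actual code; the helper name `strip_marks` is hypothetical, and the rule it applies (drop any character whose general category starts with "M") is an assumption about how such a routine might behave.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop every character whose Unicode general category starts with 'M'.

    Hypothetical helper: a simplified stand-in for a diacritic-stripping
    normalizer, not the routine of any specific library.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


word = "हिंदी"  # "Hindi" written in Devanagari

# Show the category of each code point: consonants are letters (Lo),
# while vowel signs and the anusvara are marks (Mc or Mn).
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Only the bare consonants survive once the marks are stripped.
print(strip_marks(word))
```

In Devanagari, dependent vowel signs and the anusvara are encoded as combining marks rather than letters, so a rule written with Latin accents in mind leaves only the consonants of the word, which is no longer the same word.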