In the previous post, I shared a work-in-progress version of a finite state transducer based Malayalam phonetic analyser. A phonetic analyser analyses the written form of a text and gives the phonetic characteristics of its grapheme sequence.
Understanding the phonetic characteristics of a word is helpful in many computational linguistics problems. For instance, a text to speech (TTS) system has to translate each word into its phonetic representation before synthesis. The phonetic representation also helps to transliterate the word into a different script. It is useful, too, if the phonetic representation can be converted back to the grapheme sequence.
The first version of the mlphon project is now released. It is packaged as a Python library on PyPI. You can install it with
pip install mlphon
It has built-in methods for bidirectional grapheme to phoneme conversion, IPA mapping and a syllablizer. These three functions have command line tools as well. Try them out for yourself.
For the input
the output would be
<BoS>സ<EoS><BoS>ഫ<EoS><BoS>ല<EoS><BoS>മീ<EoS><BoS>യാ<EoS><BoS>ത്ര<EoS>
['സ', 'ഫ', 'ല', 'മീ', 'യാ', 'ത്ര']
<BoS> indicates the beginning of a syllable and <EoS> the end of a syllable.
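If you only have the tagged string and want the plain list of syllables, the two tags are easy to strip yourself. The snippet below is a minimal sketch that uses only Python's standard library, not mlphon. The helper name syllables_from_tags is hypothetical and is not part of the mlphon API.

```python
import re

# Tagged syllablizer output, copied from the example above.
tagged = "<BoS>സ<EoS><BoS>ഫ<EoS><BoS>ല<EoS><BoS>മീ<EoS><BoS>യാ<EoS><BoS>ത്ര<EoS>"


def syllables_from_tags(text):
    """Return the text found between each <BoS> ... <EoS> pair.

    Hypothetical helper for illustration; it is not an mlphon function.
    The non-greedy (.*?) stops at the first <EoS> after each <BoS>.
    """
    return re.findall(r"<BoS>(.*?)<EoS>", text)


syllables = syllables_from_tags(tagged)
word = "".join(syllables)  # joining the syllables restores the original word

print(syllables)
print(word)
```

Joining the syllables back together gives the untagged word, so nothing is lost by working with the tagged form.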
G2P analysis and synthesis
$ mlg2p -a
Give the input
It will give you the result of the G2P analysis as:
The details of each phoneme are given in angle brackets. The operation is bidirectional. You can retrieve the graphemes from the analysis string.
IPA analysis and synthesis
If the phonetic detailing is not relevant to you, a minimal mapping of the graphemes to IPA can be obtained by
$ mlipa -a
For the input
The output would be
Certain tags like <chil>, <anuswara>, <visaraga> are retained so that bidirectional analysis and generation remain unambiguous.
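To see why such tags are needed, consider a toy case where two different graphemes end up with the same IPA symbol. The sketch below is only an illustration: the two-entry tables are hand-written and simplified, and the placement of the tag after the symbol is an assumption, not mlphon's actual mapping or output format.

```python
# Toy, hand-written tables for illustration only.
# They are NOT mlphon's real mappings or its real output format.
PLAIN = {
    "മ്": "m",  # ma + virama
    "ം": "m",   # anuswara, simplified here to the same sound
}
TAGGED = {
    "മ്": "m",
    "ം": "m<anuswara>",  # assumed tag placement, for illustration
}


def invert(mapping):
    """Invert a grapheme -> IPA table, keeping every grapheme per IPA string."""
    inverse = {}
    for grapheme, ipa in mapping.items():
        inverse.setdefault(ipa, []).append(grapheme)
    return inverse


plain_inverse = invert(PLAIN)
tagged_inverse = invert(TAGGED)

# IPA strings that map back to more than one grapheme are ambiguous.
ambiguous_plain = {k: v for k, v in plain_inverse.items() if len(v) > 1}
ambiguous_tagged = {k: v for k, v in tagged_inverse.items() if len(v) > 1}

print(len(ambiguous_plain), len(ambiguous_tagged))
```

Without the tag, the IPA symbol alone cannot tell you which grapheme to generate. With the tag kept, every IPA string in this toy table maps back to exactly one grapheme.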
I will post progress updates here. For a quick web demo of what mlphon does, check out this link: https://phon.smc.org.in/
Thanks for reading 😀.