Modelling South Kamrupi Dialect of Assamese Language using HTK


Ranjan Das, Uzzal Sharma



Dialect Modelling, Automatic Speech Recognition, Corpora Building, Feature Extraction, HTK


This paper addresses the fundamental issues in developing a speaker-independent dialect modelling system for recognizing the widely spoken, colloquial South Kamrupi dialect of the Assamese language. The proposed dialect model is based on the Hidden Markov Model (HMM). The Hidden Markov Model Toolkit (HTK) serves as the building block for feature extraction, training, recognition and verification in the model building process. A primary corpus was built as a prerequisite for the empirical study, with 16 volunteers (9 male, 7 female) taking part in the recordings. The corpus comprises one training set and two testing sets of recorded speech files, amounting to around 2.5 hours of recordings in total. The proposed dialect model is trained on the South Kamrupi training set. A comparative recognition test was designed and carried out, which exhibits a recognition correctness of 87.13% for the South Kamrupi dialect and 68.52% for the Central Kamrupi dialect. These findings indicate that a dialect model, when properly trained on a given dialect, recognizes that dialect more accurately than a neighbouring one.
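
As a reading aid for the correctness figures above, the following is a minimal sketch of how HTK-style scoring is usually defined, assuming the reported percentages follow the conventional HTK definitions (percent correct ignores insertion errors, while accuracy penalizes them). The function names and the counts in the usage example are hypothetical and are not taken from this paper.

```python
def percent_correct(n_labels: int, deletions: int, substitutions: int) -> float:
    """Percent correct: 100 * (N - D - S) / N.

    N is the number of labels in the reference transcriptions,
    D the number of deletion errors, S the number of substitution errors.
    Insertion errors are ignored by this measure.
    """
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    hits = n_labels - deletions - substitutions
    return 100.0 * hits / n_labels


def percent_accuracy(n_labels: int, deletions: int,
                     substitutions: int, insertions: int) -> float:
    """Accuracy: 100 * (N - D - S - I) / N, which also penalizes insertions."""
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    hits = n_labels - deletions - substitutions
    return 100.0 * (hits - insertions) / n_labels


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(percent_correct(200, 10, 20))       # 170 of 200 labels recognized
    print(percent_accuracy(200, 10, 20, 6))   # insertions lower the score
```

Under these definitions, a gap such as the one between the two dialects reflects more deleted or substituted labels in the Central Kamrupi test set when it is decoded with the South Kamrupi model.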


