Modelling South Kamrupi Dialect of Assamese Language using HTK


Ranjan Das, Uzzal Sharma



Dialect Modelling, Automatic Speech Recognition, Corpora Building, Feature Extraction, HTK


This paper addresses the fundamental issues in developing a speaker-independent dialect modelling system for recognizing the widely spoken, colloquial South Kamrupi dialect of the Assamese language. The proposed dialect model is based on the Hidden Markov Model (HMM). The Hidden Markov Model Toolkit (HTK) serves as the building block for feature extraction, training, recognition and verification throughout the model building process. A primary corpus was built as a prerequisite for the empirical study, with 16 volunteers (9 male, 7 female) contributing recordings. The corpus comprises one training set and two test sets of recorded speech files, amounting to around 2.5 hours of recordings in total. The proposed dialect model is trained on the South Kamrupi training set. A comparative recognition test was designed and carried out, yielding a recognition correctness of 87.13% for the South Kamrupi dialect and 68.52% for the Central Kamrupi dialect. These findings indicate that a dialect model, given proper training, recognizes its target dialect with better precision.
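The correctness figures above can be read against the way HTK's HResults tool scores a recognition run. The sketch below assumes the reported correctness corresponds to HResults' %Corr measure, which the abstract does not state explicitly. The function name `htk_scores` and all counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def htk_scores(n_labels, deletions, substitutions, insertions):
    """Return (%Corr, %Acc) as defined for HTK's HResults tool.

    n_labels      -- total number of labels in the reference transcriptions (N)
    deletions     -- reference labels missing from the recognizer output (D)
    substitutions -- reference labels recognized as a different label (S)
    insertions    -- extra labels in the recognizer output (I)
    """
    if n_labels <= 0:
        raise ValueError("n_labels must be positive")
    hits = n_labels - deletions - substitutions          # H = N - D - S
    percent_correct = 100.0 * hits / n_labels             # %Corr ignores insertions
    percent_accuracy = 100.0 * (hits - insertions) / n_labels  # %Acc penalizes them
    return percent_correct, percent_accuracy


# Hypothetical counts for illustration only (not the paper's data).
corr, acc = htk_scores(n_labels=1000, deletions=20, substitutions=80, insertions=30)
print(f"%Corr = {corr:.2f}, %Acc = {acc:.2f}")
```

Because %Corr does not penalize insertions, it is never lower than %Acc for the same run, which is worth keeping in mind when comparing correctness figures across dialects.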


