Authors:
Sowmya Gali, Y. Madhusudhan Reddy, Nagella Jyothsna, Ernest Ravindran R. S., Pasuluri Binduswetha, Charan Sai Raja Vennakandla

DOI NO:
https://doi.org/10.26782/jmcms.2026.03.00006

Keywords:
Emotion Recognition, Whispered Speech, Wavelet Features, Prosodic Features, Spectral Features, MFCCs, LDA, Ensemble Learning

Abstract:
This paper proposes a new technique for recognizing emotion from whispered speech, integrating advanced methods of feature extraction, feature selection, and classification to enhance accuracy and robustness. The approach begins by extracting three types of features: wavelet features for multi-resolution analysis, prosodic features for pitch and intensity, and spectral features such as formants, Mel-Frequency Cepstral Coefficients (MFCCs), and the Long-Term Average Spectrum (LTAS), which together capture comprehensive emotional information. A two-step feature selection process, combining partial correlation analysis and Linear Discriminant Analysis (LDA), retains the most informative features while reducing dimensionality. Classification is performed with an ensemble learning strategy that combines Support Vector Machine (SVM) and Decision Tree classifiers: the SVM distinguishes neutral from emotional states, and the Decision Tree further categorizes the emotions. Simulation results on the GeWEC dataset show that the proposed approach is effective, achieving significant improvements in Unweighted Average Recall (UAR) across various configurations. This underscores the method's ability to accurately identify emotional states from whispered speech, offering valuable insights for real-world emotion recognition systems.

References:
I. AlDahoul, N., Alsharhan, S., Al-Nuaimi, N. and Hassan, M. (2023) “An annotated Arabic speech emotion corpus for affective computing applications”, Speech Communication, Vol. 150, pp. 34–47.
II. Alhammadi, A., AlZahrani, A. and Ghoneim, A. (2023), “Emotion Recognition in Arabic Speech Using Deep Learning Techniques”, IEEE Access, Vol. 11, pp. 29345–29362.
III. Al-Nafjan, A., Hosny, M., Al-Wabil, A. and Al-Ohali, Y. (2023) “Wavelet-based feature extraction and machine learning for EEG emotion recognition”, Neural Computing and Applications, Vol. 35, No. 18, pp. 13245–13260.
IV. Bahmanbiglu, S.A., Mojiri, F., Abnavi, F., 2017. “The Impact of Language on Voice: an LTAS Study”. J. Voice 31 (249).
V. Benesty, J., Sondhi, M.M. and Huang, Y. (2023) “Speech and Audio Signal Processing: Theory and Practice (2nd Edition)”, Springer Nature, 2023.
VI. Buayai, P., Uthansakul, M., & Uthansakul, P. (2022). Whispered Speech Detection Using Glottal Flow-Based Features. Symmetry, 14(4), 777
VII. D. Połap, “Model of identity verification support system based on voice and image samples”, J. Univers. Comput. Sci., vol. 24, pp. 460–474, Jan. 2018.
VIII. George, S. M. and Ilyas, P. M. (2024), “A review on speech emotion recognition: Recent advances, challenges, and the influence of noise”, Neurocomputing.
IX. Haridas, A.V., Marimuthu, R., Sivakumar, V.G., 2018. “A critical review and analysis on techniques of speech recognition: the road ahead”. Int. J. Knowledge-Based Intell. Eng. Syst. 22, 39–57.
X. J. Deng, S. Frühholz, Z. Zhang and B. Schuller, "Recognizing Emotions From Whispered Speech Based on Acoustic Feature Transfer Learning," in IEEE Access, vol. 5, pp. 5235-5246, 2017.
XI. Khalid, S., Usman, M., Mehmood, R. and Al-Bashir, A. (2023), “Emotion recognition using heart rate variability and machine learning techniques”, IEEE Transactions on Affective Computing, Vol. 14 No. 3, pp. 1896–1908.
XII. Khalil, A., Al-Khatib, W., El-Alfy, E.S., Cheded, L., 2018. “Anger detection in Arabic speech dialogs”. In: Proceedings of the International Conference on Computing Sciences and Engineering, ICCSE 2018 – Proceedings. IEEE, pp. 1–6.
XIII. Koolagudi, S.G., Murthy, Y.V.S., Bhaskar, S.P., 2018. “Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition”. Int. J. Speech Technol. 21, 167–183.
XIV. Ko, S.-C., Kim, K.-Y. and Lee, J.-H. (2023) “Emotion recognition from whispered speech using phase-based and spectral features”, IEEE Access, Vol. 11, pp. 118245–118258.
XV. Liao, Y., Gao, Y., Wang, F., Zhang, L., Xu, Z. & Wu, Y. (2025), “Emotion Recognition with Multiple Physiological Parameters Based on Ensemble Learning”, Scientific Reports, 15, 19869.
XVI. Li, C., Zhang, Y. and Wang, S. (2023), “Entropy-guided wavelet packet decomposition for optimal feature selection in non-stationary signal analysis”, Signal Processing, Vol. 205, Article 108857.
XVII. Marković, B., Mijić, M., & Galić, J. (2018). Application of Teager Energy Operator on Linear and Mel Scales for Whispered Speech Recognition. Archives of Acoustics, 43(1), 3–9.
XVIII. Mehta, D., Zañartu, M. and Hillman, R. (2023) “Robust fundamental frequency estimation for pathological voice analysis using signal processing and machine learning”, IEEE Access, 2023.
XIX. Qureshi, M. A., Anwar, S., and Lee, J. (2024), “Improved Speech Emotion Recognition Using Enhanced MFCC and Deep Learning Features”, IEEE Transactions on Affective Computing, Vol. 15, pp. 410–423.
XX. Roy, A., Keshava, A., & Das, A. (2022). Group Delay based Methods for Detection and Recognition of Whispered Speech. 2022 26th International Conference on Pattern Recognition (ICPR), 3512-3518.
XXI. R. Wang and A. Hamdulla, "Fusion of MFCC and IMFCC for Whispered Speech Recognition," 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML), Chengdu, China, 2022, pp. 285-289
XXII. Scherer, K.R. and Bänziger, T. (2023) “Vocal expression of emotion: A review of acoustic patterns and affective communication”, IEEE Transactions on Affective Computing, Vol. 14, No. 4, pp. 2561–2575.
XXIII. Schuller, B., Batliner, A., Burkhardt, F., Steidl, S. and Devillers, L. (2023) “Paralinguistics in speech and language – State of the art and future directions”, IEEE Transactions on Affective Computing, Vol. 14, No. 1, pp. 1–18.
XXIV. Sivan, D., & Gopakumar, C. (2017). Emotion recognition and spoof detection from whispered speech. 2017 International Conference on Computing Methodologies and Communication (ICCMC).
XXV. Sharma, S., Kaur, P. & Singh, G. (2023), “Speech emotion recognition using ensemble classifiers and optimized feature sets”, IEEE Transactions on Affective Computing, Vol. 14, No. 5, pp. 2031–2043.
XXVI. Sharma, V., Rahman, S., & Fujii, Y. (2023). End-to-end whispered speech recognition with frequency-weighted approaches and layer-wise transfer learning. Acoustics, 15(2), 68.
XXVII. Shuai, L., Huang, Z., & Liu, J. (2020). End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning. arXiv preprint arXiv:2005.01972
XXVIII. Sung-Chul Ko, Young Sik, & Kyu-Young Kim (2016). Exploitation of phase-based features for whispered speech emotion recognition. IEEE Access, 4, 6074–6082.
XXIX. Tirumala, S.S., Shahamiri, S.R., Garhwal, A.S., Wang, R., 2017. “Speaker identification features extraction methods: a systematic review”. Expert Syst. Appl. doi: 10.1016/j.eswa.2017.08.015.
XXX. Thagard, P., 2019. Mind–Society: From Brains to Social Sciences and Professions. Oxford University Press (March 1, 2019).
XXXI. Wang, J., Li, Y., Zhang, Z. and Hamdulla, A. (2024), “Emotion recognition from whispered speech in tonal languages using acoustic feature fusion”, Speech Communication, Vol. 156, pp. 1–13.
XXXII. Y. Bhavani, S. B. Swathi, R. R. Aileni, and M. R. Gaddam, "A Survey on Various Speech Emotion Recognition Techniques," 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), 2022, pp. 01-06.
XXXIII. Yüksel, M., Gündüz, B., 2018. “Long term average speech spectra of Turkish”. Logop. Phoniatr. Vocology 43, 101–105.
XXXIV. Z. Cheng and X. Li, "Whispered Speech Emotion Recognition Based on Improved Shuffled Frog Leaping Algorithm Neural Network," Journal of Convergence Information Technology, vol. 7, no. 19, pp. 114-124, 2012.
XXXV. Zhang, H., Liu, Y. and Wang, X. (2023), “Discriminative feature selection using Fisher criterion and linear discriminant analysis for pattern recognition”, IEEE Access, Vol. 11, pp. 98734–98747.
XXXVI. Zhang, Li, and Ying Zhao. "Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering." IEEE Transactions on Audio, Speech, and Language Processing, vol. 31, no. 7, 2023, pp. 1234-1245.
XXXVII. Zhaofeng Lin, Tanvina Patel, Odette Scharenborg, “Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation”, 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – Taipei, Taiwan.
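As an illustrative aside, the classification cascade summarized in the abstract — LDA for dimensionality reduction, an SVM separating neutral from emotional speech, and a Decision Tree assigning the specific emotion — can be sketched as follows. This is a minimal sketch only, not the authors' implementation: scikit-learn classifiers and synthetic Gaussian vectors stand in for the actual wavelet/prosodic/spectral features and the GeWEC recordings, and the class names are placeholders.

```python
# Minimal sketch of a two-stage ensemble for whispered-speech emotion
# recognition: LDA reduces dimensionality, an SVM makes the binary
# neutral-vs-emotional decision, and a Decision Tree labels the emotion.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
emotions = ["neutral", "anger", "sadness", "happiness"]

# Synthetic 24-dim feature vectors, one Gaussian cluster per class,
# standing in for wavelet/prosodic/spectral feature vectors.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(50, 24))
               for i in range(len(emotions))])
y = np.repeat(emotions, 50)

# Step 1: LDA keeps at most (n_classes - 1) discriminative components.
lda = LinearDiscriminantAnalysis(n_components=3)
Z = lda.fit_transform(X, y)

# Step 2: SVM trained on the binary neutral-vs-emotional task.
svm = SVC(kernel="rbf").fit(Z, y == "neutral")

# Step 3: Decision Tree trained only on the emotional samples.
emo_mask = y != "neutral"
tree = DecisionTreeClassifier(random_state=0).fit(Z[emo_mask], y[emo_mask])

def predict(z):
    """Cascade: 'neutral' if the SVM says so, else the tree's label."""
    if svm.predict(z.reshape(1, -1))[0]:
        return "neutral"
    return tree.predict(z.reshape(1, -1))[0]

preds = np.array([predict(z) for z in Z])
print("training accuracy:", (preds == y).mean())
```

The cascade mirrors the abstract's design choice: the easier neutral-vs-emotional split is handled first, so the second-stage classifier only has to discriminate among emotional classes.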

