Authors:
Tanvi Desai, Divyakant Meva
DOI NO:
https://doi.org/10.26782/jmcms.2025.01.00006
Keywords:
NLP, Twitter tweets, HEXACO, Optimization, Hybrid Kepler Inspired Secretary Bird, weighted ensemble voting
Abstract:
Natural Language Processing (NLP) plays a crucial role in analyzing Twitter data and makes automated HEXACO personality modeling possible. Analyzing personality traits from social media data, particularly on platforms like Twitter, presents unique challenges due to the brevity, informal language, and rapid evolution of linguistic expressions. To address these challenges, this research presents a methodological framework for detecting HEXACO personality traits from tweets. The HEXACO model encompasses Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience, offering a comprehensive basis for personality analysis. Our approach integrates advanced NLP techniques across four key phases: preprocessing, feature extraction, feature selection, and final detection. Preprocessing involves tokenization, stop word removal, and stemming to standardize the data. Feature extraction leverages contextual Term Frequency-Inverse Document Frequency (TF-IDF) and Global Vectors for Word Representation (GloVe) embedding models to capture semantic and contextual information from tweets. Feature selection employs the Hybrid Kepler Inspired Secretary Bird (HKISP) algorithm, a combination of the Kepler Optimization Algorithm (KOA) and Secretary Bird Optimization (SBO). The final detection phase utilizes a weighted ensemble voting model comprising Artificial Neural Network (ANN), Random Forest (RF), and k-Nearest Neighbours (k-NN) classifiers to enhance predictive accuracy and model robustness. The proposed technique achieved a classification accuracy of 98.067% and a Hamming loss of 1.933%, outperforming the existing models in the reported experiments.
References:
XXVI. Data Set Link: https://github.com/Tanvidesai-twitter/Twitter-Dataset.git
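
Illustrative sketch: the abstract names a weighted ensemble voting model over ANN, RF, and k-NN classifiers and reports a Hamming loss, but does not give the voting rule or the weights. The following is a minimal sketch of one common variant (weighted soft voting), not the authors' implementation. The function names, the three-class setup, the probability vectors, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> int:
    """Return the class index with the highest weighted mean probability.

    probas[i] is the class-probability vector produced by classifier i;
    weights[i] is the (assumed) trust placed in classifier i.
    """
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per classifier")
    total = sum(weights)
    n_classes = len(probas[0])
    scores = [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of label positions that are predicted incorrectly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


if __name__ == "__main__":
    # Hypothetical per-classifier probabilities for one tweet, three classes.
    ann = [0.6, 0.3, 0.1]
    rf = [0.2, 0.5, 0.3]
    knn = [0.3, 0.4, 0.3]
    # Hypothetical weights; RF and k-NN both prefer class 1,
    # but the more heavily weighted ANN tips the vote to class 0.
    print(weighted_soft_vote([ann, rf, knn], [0.5, 0.3, 0.2]))  # prints 0
    print(hamming_loss([0, 1, 2, 1], [0, 1, 2, 0]))  # prints 0.25
```

When each tweet receives exactly one label, Hamming loss equals 1 minus accuracy. The reported figures (98.067% and 1.933%) sum to 100%, which is consistent with that single-label reading, although the abstract does not state the evaluation setup explicitly.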