PREDICTING TREATMENT UNFAVOURABLE IN PULMONARY TUBERCULOSIS PATIENTS USING STACKING ENSEMBLE MACHINE LEARNING APPROACH
Authors:
Fayaz Ahamed Shaik, Lakshmanan Babu, Palaniyandi Paramasivam, Selvam Nagarajan, Sundarakumar Karuppasamy, Ponnuraja Chinnaiyan
DOI NO:
https://doi.org/10.26782/jmcms.2025.05.00002
Abstract:
Tuberculosis (TB) is the leading cause of death from infectious disease. India has one of the highest TB rates in the world, which makes the disease a serious public health problem there. People with active lung TB can spread the illness by spitting, coughing, or sneezing. In healthcare, machine learning (ML) is increasingly used to support diagnosis. In this study, we propose a stacked ensemble model that combines three base ML classifiers to predict unfavorable treatment outcomes in Pulmonary TB (PTB) patients. Cases with an unfavorable treatment outcome are treated as the event of interest. Secondary data on 1236 PTB patients treated in randomized controlled clinical research were obtained retrospectively and split into training and testing sets in a 70:30 ratio. The ML models differed in how well they predicted unfavorable treatment outcomes in PTB patients. The Support Vector Machines model had low sensitivity (0.246) but high specificity (0.981). Likewise, the Logistic Regression model showed poor sensitivity (0.339) but strong specificity (0.959). The Decision Tree model, by contrast, achieved both high sensitivity (0.755) and high specificity (0.956). The stacked Ensemble Random Forest model performed best overall, with an accuracy of 0.929, sensitivity of 0.774, specificity of 0.956, and F1-score of 0.759. This illustrates the potential of ensemble learning in healthcare, where it is essential to identify adverse outcomes early and accurately. To improve prediction accuracy and generalizability, future research should validate these results and explore additional clinical characteristics.
Keywords:
Clinical Trial, Cross-Validation, Ensemble, Machine Learning, Pulmonary Tuberculosis
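The stacking setup summarized in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the trial records, and picks hyperparameters, feature scaling, a stratified split, and 5-fold internal cross-validation purely for illustration. The three base learners (Support Vector Machine, Logistic Regression, Decision Tree) and the Random Forest meta-learner are inferred from the models named in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (assumption): 1236 rows, with
# class 1 (unfavorable outcome, the event of interest) as the minority.
X, y = make_classification(n_samples=1236, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)

# 70:30 split; stratifying keeps the event rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Base learners; all hyperparameters here are illustrative assumptions.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)

# Confusion-matrix counts on the held-out 30%, with class 1 as "positive".
tn, fp, fn, tp = confusion_matrix(
    y_test, stack.predict(X_test), labels=[0, 1]).ravel()

metrics = {
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "sensitivity": tp / (tp + fn),   # share of unfavorable cases detected
    "specificity": tn / (tn + fp),   # share of favorable cases cleared
    "f1": 2 * tp / (2 * tp + fp + fn),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The printed values come from synthetic data and are not comparable to the figures reported in the abstract. The sketch only shows how the four reported metrics relate to the confusion matrix when the unfavorable outcome is the positive class.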