A COMPARATIVE APPROACH OF TEXT MINING: CLASSIFICATION, CLUSTERING AND EXTRACTION TECHNIQUES

Authors:

Surya Bhupal Rao, S. Rahamat Basha, G. Ravi Kumar

DOI:

https://doi.org/10.26782/jmcms.spl.5/2020.01.00010

Keywords:

classification, clustering, text mining, information retrieval, information extraction

Abstract

The amount of text generated each day is increasing dramatically. Computers cannot easily process and interpret this enormous volume of mostly unstructured text. Efficient and effective techniques and algorithms are therefore required to discover useful patterns in it. Text mining, the process of extracting meaningful information from text, has received considerable attention in recent years. In this paper, we discuss several of the most basic tasks and techniques of text mining, including pre-processing, classification, and clustering. We also briefly explain text mining in the fields of biomedicine and health care.

