ANALYTICAL ASSESSMENT OF NOUN VERB TERM EXTRACTION FOR DOCUMENT CLASSIFICATION USING T-TEST
Authors:
Omaia Mohammad Al-Omari, Nazlia Omar
DOI NO:
https://doi.org/10.26782/jmcms.2020.03.00021
Abstract:
The volume of digital documents has grown significantly, and classifying them automatically has become an important yet difficult task for modern applications. Classification relies on the terms extracted from the documents. Nouns and verbs are of particular interest because they broadly signify topics and events. Noun-verb (NV) extraction is a common and powerful technique for selecting the words to be classified, and the performance of document classification therefore depends on it. The main aim of this study is to enhance the capability of the NV extraction methodology for document classification. Three classifiers, namely K-Nearest Neighbor (KNN), Naive Bayes (NB), and Support Vector Machine (SVM), are used to compare the results. Accuracy is evaluated on benchmark data sets taken from Reuters 8 and WebKb. Other extraction methods, e.g., Nouns, Bag of Words (BOW), and Verbs, were also enhanced and incorporated with the NV extraction method. The results are studied and conclusions are drawn from them.
Keywords:
BOW extraction, Document classification, NV extraction, KNN classifier, NB classifier, SVM classifier
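The two ideas named in the title and abstract, keeping only noun and verb terms and comparing extraction methods with a t-test, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Penn Treebank-style part-of-speech tags (NN* for nouns, VB* for verbs) and a paired t-test over matched accuracy scores; the abstract does not say which tag set or t-test variant was used. The function names, the hand-tagged sentence, and the accuracy values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Assumption: Penn Treebank-style tags, where NN* marks nouns and VB* marks verbs.
NOUN_VERB_PREFIXES = ("NN", "VB")


def extract_nv_terms(tagged_tokens):
    """Keep only the tokens whose POS tag marks a noun or a verb."""
    return [word.lower() for word, tag in tagged_tokens
            if tag.startswith(NOUN_VERB_PREFIXES)]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two equal-length lists of matched scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Hand-tagged toy sentence (hypothetical; a real pipeline would run a POS tagger).
tagged = [("The", "DT"), ("bank", "NN"), ("raised", "VBD"),
          ("interest", "NN"), ("rates", "NNS"), ("sharply", "RB")]
nv_terms = extract_nv_terms(tagged)

# Made-up per-fold accuracies, for illustration only (not results from the paper).
nv_acc = [0.90, 0.92, 0.88, 0.91, 0.89]
bow_acc = [0.85, 0.88, 0.86, 0.87, 0.84]
t_value = paired_t_statistic(nv_acc, bow_acc)

print(nv_terms)
print(round(t_value, 3))
```

The resulting t statistic would be compared against the critical value of the t distribution with n - 1 degrees of freedom (here 4) to judge whether the difference between two extraction methods is significant.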