Similarity Based Feature Weighting for Inter Domain Classification of Text

Authors:

Brindha G. R., Santhi B.

DOI NO:

https://doi.org/10.26782/jmcms.2018.10.00014

Keywords:

Text processing, Feature weighting, Transductive Support Vector Machine, Cross domain classification

Abstract

Intra-domain supervised classification of online reviews has been studied extensively. Performance declines, however, when a classifier is trained on reviews from one domain and tested on reviews from a different domain. The main causes of this decline are the difference in the domain distributions and the difference in the feature vectors. In addition, the semantics of a word in a corpus vary with its usage across domains. The objective of this study is to propose a new similarity-based feature weighting technique for text reviews that enhances the accuracy of inter-domain classification. Features from different training and testing domains are weighted by the proposed probability-based statistical techniques and then classified by a Support Vector Machine (SVM) and a Transductive Support Vector Machine (TSVM). TSVM performs much better on this cross-domain classification, owing to its transductive learning, which is effective even with a small training set. The correlation between the source and target domains and its influence on classification accuracy are analysed in detail using the outcomes of existing feature weighting techniques and the proposed ones.
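The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of weighting a term by how similarly it is distributed in the source and target domains. The ratio-based weight, the function names, and the toy reviews are all illustrative assumptions and not the authors' method. In a full pipeline the weighted vectors would then be passed to a classifier such as an SVM or TSVM.

```python
from collections import Counter


def doc_freq_prob(docs):
    """Fraction of documents in which each term appears (whitespace tokens)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def similarity_weights(source_docs, target_docs):
    """Assumed weight: min/max ratio of a term's document probability
    in the two domains. 1.0 means equally common in both domains,
    0.0 means the term occurs in only one of them."""
    p_src = doc_freq_prob(source_docs)
    p_tgt = doc_freq_prob(target_docs)
    weights = {}
    for term in set(p_src) | set(p_tgt):
        a, b = p_src.get(term, 0.0), p_tgt.get(term, 0.0)
        weights[term] = min(a, b) / max(a, b)  # max > 0: term is in at least one domain
    return weights


def weighted_vector(doc, weights):
    """Term frequency scaled by the cross-domain similarity weight."""
    tf = Counter(doc.lower().split())
    return {term: count * weights.get(term, 0.0) for term, count in tf.items()}


# Hypothetical toy data: movie reviews as source, electronics reviews as target.
source = ["great plot and great acting", "boring plot"]
target = ["great battery life", "boring design and poor battery"]

w = similarity_weights(source, target)
vec = weighted_vector("great battery", w)
print(vec)
```

Under this assumed scheme, terms shared evenly by both domains (such as "great") keep their full weight, while domain-specific terms (such as "plot" or "battery") are suppressed.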
