Improved and Effective Artificial Bee Colony Clustering Algorithm for Social Media Data (I-ABC)

Authors:

Akash Shrivastava, Dr. M. L. Garg

DOI NO:

https://doi.org/10.26782/jmcms.2019.02.00020

Keywords:

Big data, Twitter, Clustering, Big data analysis, Artificial Bee Colony (ABC), Data classification

Abstract

Social media data has turned the real world into a web of data that is highly categorical in nature. Data with categorical attributes is ubiquitous in the real world. Clustering is an effective way to deal with categorical data; however, partitional clustering algorithms are prone to falling into local optima on such data. An ABC K-modes approach has been proposed to address this issue, but the acceleration of that algorithm remained a challenge. In this paper, we address this challenge of reducing the acceleration factor of the algorithm and propose a modified ABC K-modes approach, which we refer to as the N-ABC K-modes approach. Unlike the existing ABC K-modes, our approach introduces a different attribute matrix for each data set. In the next step, we apply an XOR operation to combine the matrices of similar attributes. In the last phase, dissimilar data form a cluster, and we apply clustering followed by searching on this cluster. The performance of the new ABC K-modes is evaluated through a series of tests and experiments on real-time streaming social media data, such as Twitter and Facebook, in comparison with other popular algorithms for categorical data.
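The abstract gives no implementation details, so the following is only a minimal sketch of the standard K-modes building blocks that the approach extends (simple matching dissimilarity, mode update, assignment), not the authors' N-ABC K-modes method. The XOR helper reflects an assumption about how XOR over binary attribute matrices could relate to attribute mismatches: on one-hot encodings, each differing attribute sets exactly two bits. All function names, the attribute domains, and the toy records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching distance: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))

def one_hot(record, domains):
    """Encode a categorical record as a flat 0/1 list, one slot per category."""
    return [int(value == category)
            for value, categories in zip(record, domains)
            for category in categories]

def xor_mismatches(a_bits, b_bits):
    """XOR two one-hot encodings; every differing attribute sets exactly two bits."""
    return sum(x ^ y for x, y in zip(a_bits, b_bits)) // 2

def mode_of(records):
    """Cluster mode: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def assign(records, modes):
    """Assign each record to the index of its nearest mode."""
    return [min(range(len(modes)),
                key=lambda k: matching_dissimilarity(record, modes[k]))
            for record in records]

# Hypothetical attribute domains and toy records (source, media type, language).
domains = [("twitter", "facebook"), ("text", "image", "video"), ("en", "hi")]
records = [
    ("twitter", "text", "en"),
    ("twitter", "text", "hi"),
    ("facebook", "video", "en"),
    ("facebook", "image", "en"),
]

labels = assign(records, modes=[records[0], records[2]])
```

In a bee-colony variant, candidate sets of modes would play the role of food sources and the total within-cluster dissimilarity would serve as the fitness to minimize; that search loop is omitted here because the abstract does not specify it.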

References:

I.Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In N. Bansal, K. Pruhs, & C. Stein (Eds.), Proc. of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA (pp. 1027–1035).

II.Han J, Kamber M, Pei J. Data mining concepts and techniques. 3rd ed. Waltham: Morgan Kaufmann; 2012.

III.Handl, J., Knowles, J., & Dorigo, M. (2006). Ant-based clustering and topographic mapping. Artificial Life, 12(1), 35–62.

IV.Hruschka, E., Campello, R., & de Castro, L. (2006). Evolving clusters in gene-expression data. Information Sciences, 176(13), 1898–1927.

V.Huang Z. Clustering large data sets with mixed numeric and categorical values. In the first Pacific-Asia Conference on Knowledge Discovery and Data Mining. 1997; pp. 21–34.

VI.Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery. 1998; 2: 283–304.

VII.Ikura Y, Gimple M. Efficient scheduling algorithms for a single batch processing machine. Operations Research Letters. 1986; 5: 61–65.

VIII.Ji J, Pang W, Zheng Y, Wang Z, Ma Z (2015) A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data. PLoS ONE 10(5): e0127125. doi:10.1371/journal.pone.0127125.

IX.Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2004). A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2–3), 89–112.

X.Kao Y-T, Zahara E, Kao IW. A hybridized approach to data clustering. Expert Systems with Applications. 2008; 34: 1754–1762.

XI.Karaboga D, Basturk B. On the performance of artificial bee colony (ABC) algorithm. Applied Soft Computing. 2008; 8: 687–697.

XII.Karaboga D, Ozturk C. A novel clustering approach: artificial bee colony (ABC) algorithm. Applied Soft Computing. 2011; 11: 652–657.

XIII.Li, L., Yang, Y., Peng, H., & Wang, X. (2006). An optimization method inspired by chaotic ant behavior. International Journal of Bifurcation and Chaos, 16, 2351–2364.

XIV.Luo C, Pang W, Wang Z (2014) Semi-Supervised clustering on heterogeneous information networks. In: Proceedings of 18th Pacific Asia Conference of Knowledge Discovery and Data Mining (PAKDD’14). Taiwan, pp 548-559.

XV.MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley symposium on mathematical statistics and probability (pp. 281–297).

XVI.Rand WM. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association. 1971; 66: 846–850.

XVII.Shamanth Kumar, Fred Morstatter, Huan Liu, Twitter Data Analytics, Springer, Aug 19, 2013.

XVIII.Shelokar PS, Jayaraman VK, Kulkarni BD. An ant colony approach for clustering. Analytica Chimica Acta. 2004; 509: 187–195.

XIX.Teodorović D. Bee Colony Optimization (BCO). In: Lim C, Jain L, Dehuri S, editors. Innovations in Swarm Intelligence. Berlin: Springer-Verlag; 2009. pp. 39–60.

XX.Van der Merwe, D. W., & Engelbrecht, A. P. (2003). Data clustering using particle swarm optimization. In Proceedings of IEEE congress on evolutionary computation (pp. 215–220).

XXI.Wan M, Li L, Xiao J, Wang C, Yang Y. Data clustering using bacterial foraging optimization. Journal of Intelligent Information Systems. 2012; 38: 321–341.

XXII.Wan, M., Li, L., Xiao, J., Yang, Y., Wang, C., & Guo, X. (2010). CAS based clustering algorithm for web users. Nonlinear Dynamics, 61(3), 347–361.

XXIII.Yang Y. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval. 1999; 1: 67–88.

XXIV.Zhang C, Ouyang D, Ning J. An artificial bee colony approach for clustering. Expert Systems with Applications. 2010; 37: 4761–4767.
