International Journal of Advanced Technology and Engineering Exploration (IJATEE) ISSN (P): 2394-5443 ISSN (O): 2394-7454 Vol - 5, Issue - 45, August 2018
K-means and associated cuckoo based hierarchy optimization for document categorization

Chandni Sikarwar, Kailash Patidar and Rishi Kushwah

Abstract

This paper proposes a hybrid algorithm for text data categorization. Four filtering classifiers are applied first. The numeric classifier (NC) identifies and removes numeric values. The separator classifier (SC) identifies and removes delimiters. The all classifier (AC) identifies and removes grammar elements. The manual data classifier (MDC) identifies and removes manually specified data values. Associated cuckoo search optimization (CSO) based hierarchy optimization is then applied to the filtered data. The overall accuracy obtained by this approach is approximately 95%.
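The four filtering stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular expressions, and the two word lists (`GRAMMAR_WORDS`, `MANUAL_TERMS`) are assumptions made for illustration, and the AC stage is read here as stop-word style removal, which the abstract does not confirm.

```python
import re

# Hypothetical word lists; the abstract does not specify their contents.
GRAMMAR_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}
MANUAL_TERMS = {"etc", "eg"}


def numeric_filter(text):
    """NC stand-in: replace integer and decimal numbers with a space."""
    return re.sub(r"\d+(?:\.\d+)?", " ", text)


def separator_filter(text):
    """SC stand-in: replace punctuation and other delimiters with a space."""
    return re.sub(r"[^\w\s]|_", " ", text)


def grammar_filter(tokens):
    """AC stand-in (assumed): drop common grammatical words."""
    return [t for t in tokens if t not in GRAMMAR_WORDS]


def manual_filter(tokens, manual_terms=MANUAL_TERMS):
    """MDC stand-in: drop terms from a manually maintained list."""
    return [t for t in tokens if t not in manual_terms]


def preprocess(text):
    """Apply the four filters in the order given in the abstract."""
    text = numeric_filter(text.lower())
    text = separator_filter(text)
    tokens = text.split()
    tokens = grammar_filter(tokens)
    return manual_filter(tokens)


if __name__ == "__main__":
    print(preprocess("The 3 cats, and 2.5 dogs; etc."))
```

The token lists produced this way would then be vectorized and passed to the clustering and optimization stages, which this sketch does not cover.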

Keywords

K-means, Associated cuckoo based hierarchy optimization, NC and SC, MDC.
