ACCENTS Journals


Paper Title	:	Enhancing data analysis through k-means with foggy centroid selection
Author Name	:	Arun Sharma, Surendra Vishwakarma and Animesh Kumar Dubey
Abstract	:	An innovative approach, k-means with foggy centroid selection (KFCS), is proposed for enhancing data clustering performance. This study applies the method to the Pima Indians diabetes database, which serves as a comprehensive evaluation ground. The process begins with preprocessing and data arrangement, involving scaling and normalization to ensure accurate computation. KFCS combines k-means clustering with foggy centroid selection, using both random initialization and iterative centroid calculation. The approach relies on four distance measures (Euclidean, Pearson coefficient, Chebyshev, and Canberra) to gauge similarity. A detailed exploration of distance estimation enhances dataset understanding. In the evaluation, KFCS demonstrates superiority in terms of computation time and error analysis, with the Canberra measure emerging as the standout performer. This work contributes a comprehensive methodology for improved data clustering and analysis.
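The four measures named in the abstract, together with the scaling step, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how foggy centroid selection works, so the sketch does not attempt it. The function names, the min-max form of the scaling, and the use of 1 minus the Pearson correlation as a dissimilarity are all assumptions made for this example.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column to [0, 1]; constant columns map to 0 (assumed scaling)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return float(np.max(np.abs(a - b)))


def canberra(a, b):
    """Sum of |a_i - b_i| / (|a_i| + |b_i|); 0/0 terms count as 0."""
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    mask = den > 0
    return float(np.sum(num[mask] / den[mask]))


def pearson_distance(a, b):
    """1 - Pearson correlation (an assumed way to turn the coefficient
    into a dissimilarity); returns 1.0 if either vector is constant."""
    ac, bc = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(ac ** 2) * np.sum(bc ** 2))
    if denom == 0:
        return 1.0
    return float(1.0 - np.sum(ac * bc) / denom)


def assign(X, centroids, dist):
    """Label each row of X with the index of its nearest centroid under dist."""
    return np.array(
        [int(np.argmin([dist(x, c) for c in centroids])) for x in X]
    )
```

Passing a different `dist` function to `assign` changes only the assignment step of a k-means iteration, which is one plausible way to compare the four measures on the same scaled data.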
Keywords	:	K-means, Euclidean, Pearson coefficient, Chebyshev, Canberra.
Cite this article	:	Sharma A, Vishwakarma S, Dubey AK. Enhancing data analysis through k-means with foggy centroid selection. International Journal of Advanced Computer Research. 2023;13(64):55-61. DOI:10.19101/IJACR.2023.1362018