Enhancing data analysis through k-means with foggy centroid selection
Arun Sharma, Surendra Vishwakarma and Animesh Kumar Dubey
Abstract
This study proposes k-means with foggy centroid selection (KFCS), an approach for improving data clustering performance. The method is evaluated on the Pima Indians diabetes database. The process begins with preprocessing and data arrangement, in which the data are scaled and normalized to ensure accurate computation. KFCS combines k-means clustering with foggy centroid selection, using both random initialization and iterative centroid calculation. Similarity is gauged with four distance measures: Euclidean, Pearson coefficient, Chebyshev, and Canberra. A detailed examination of distance estimation improves understanding of the dataset. In the evaluation, KFCS shows improved computation time and error results, with the Canberra distance emerging as the standout performer. This work contributes a comprehensive methodology for improved data clustering and analysis.
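The four distance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the Pearson measure is assumed to be used as 1 minus the correlation coefficient, and the foggy centroid selection step itself is not shown because the abstract does not define it.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Largest absolute difference over all coordinates."""
    return max(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def pearson_distance(a, b):
    """1 - Pearson correlation (assumed form; the abstract gives no formula)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined for a constant vector; treating it as
        # uncorrelated is an assumption made for this sketch.
        return 1.0
    return 1.0 - cov / math.sqrt(var_a * var_b)


def nearest_centroid(point, centroids, distance):
    """Index of the centroid closest to `point` under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))
```

In a k-means loop, `nearest_centroid` would be called for every normalized record in the assignment step, with any of the four functions passed as `distance`.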
Keywords
K-means, Euclidean, Pearson coefficient, Chebyshev, Canberra.
Cite this article
Sharma A, Vishwakarma S, Dubey AK. Enhancing data analysis through k-means with foggy centroid selection. International Journal of Advanced Computer Research. 2023;13(64):55-61. DOI:10.19101/IJACR.2023.1362018