ACCENTS Transactions on Information Security (TIS) ISSN (P): 12222 ISSN (O): 2455-7196 Vol - 8, Issue - 32, October 2023
Advancements in big data clustering: methods, applications, and insights

Chandan Kumar Soni and Mohan Kumar Patel

Abstract

The digital age has given rise to an unprecedented influx of data, marking the era of big data. In this landscape, clustering has emerged as a critical element of data analysis, enabling the discovery of latent patterns in vast datasets. This review paper explores the state of the art in big data clustering, covering influential research, methodologies, advantages, and limitations. It highlights the advantages that different clustering algorithms bring across domains ranging from smart grids and education to e-commerce and different operations. It also acknowledges limitations such as scalability issues and generalization challenges, underlining the need for future research to address these constraints.

Keywords

Big data clustering, Data mining, Unsupervised learning, Clustering algorithms.

Cite this article

Soni CK, Patel MK
