Subspace clustering for high dimensional datasets
G.N.V.G. Sirisha and M. Shashi
Abstract
Clustering high-dimensional data is challenging because such data contain many irrelevant and redundant attributes. Conventional clustering algorithms identify a global set of relevant attributes before clustering, using attribute selection and feature extraction techniques, and then use all of these globally relevant attributes in the similarity calculation. As a result, they fail to identify true clusters that are present only in a subset of the attributes. Subspace clustering, which detects clusters that exist in subsets of dimensions, has therefore become an active area of research in recent years. Several types of subspace clustering algorithms have been proposed in the literature. This paper discusses these types, with the main emphasis on 2D subspace clustering. The availability of new and very large datasets, such as spatiotemporal, temporal, spatial and genomic data, has necessitated the development of 3D subspace clustering. This paper presents an overview of subspace clustering for researchers interested in the topic.
Keywords
Subspace clustering, Curse of dimensionality, Density divergence, 3D subspace clustering.
Cite this article
Sirisha GNVG, Shashi M. Subspace clustering for high dimensional datasets. International Journal of Advanced Computer Research. 2016; 6(26):177-184. DOI:10.19101/IJACR.2016.625012