International Journal of Advanced Computer Research (IJACR) ISSN (P): 2249-7277 ISSN (O): 2277-7970 Vol - 7, Issue - 32, September 2017
Performance evaluation of top-k sequential mining methods on synthetic and real datasets

Asima Jamil, Abdus Salam and Farhat Amin

Abstract

Discovering sequential patterns in a large sequence database is an important problem in sequential pattern mining, a well-known data mining technique. Several articles have surveyed the field of sequential pattern mining over the past few years. Their main focus was on improving the efficiency of algorithms through different techniques, while much less attention was paid to the characteristics of the underlying data that the algorithms process. Yet the properties of the data strongly affect the performance of data mining algorithms. This study complements the top-k sequential pattern mining literature with an in-depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) was compared with that of top-k closed sequential pattern mining (TSP), the state-of-the-art algorithm for top-k sequential pattern mining, on both synthetic and real datasets with varied characteristics. The impact of different parameters on the running time and memory usage of each algorithm was investigated. Extensive experiments show that TKS and TSP each have advantages and disadvantages on different types of data. Furthermore, because large amounts of data are continuously added to databases, sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed for mining sequential patterns in data. These algorithms have proved effective on smaller databases, but their performance may decline as the size of the database increases. Hence these methods must be adapted so that the mining process can be performed more efficiently.
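To make the task concrete, the following is a minimal Python sketch of what a top-k sequential pattern miner computes. It is not TKS or TSP: it assumes every sequence element is a single item (TKS and TSP handle itemsets), it does not restrict output to closed patterns as TSP does, and the function name `top_k_sequential_patterns` is hypothetical. It uses PrefixSpan-style prefix projection together with the general top-k strategy of starting from a minimum support of 1 and raising it to the support of the k-th best pattern once k patterns have been found.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
# (sequence id, position where the remaining suffix starts)
Projection = List[Tuple[int, int]]


def top_k_sequential_patterns(db: List[List[str]], k: int) -> List[Tuple[int, Pattern]]:
    """Return up to k (support, pattern) pairs with the highest support.

    Simplifying assumption: every sequence element is a single item,
    not an itemset. Support = number of sequences that contain the
    pattern as a (not necessarily contiguous) subsequence.
    """
    if k <= 0:
        return []
    heap: List[Tuple[int, Pattern]] = []  # min-heap holding the current top-k
    minsup = 1  # internal threshold, raised once k patterns are held

    def grow(prefix: Pattern, projected: Projection) -> None:
        nonlocal minsup
        # For each item, record its first occurrence after the current
        # position in every sequence of the projected database.
        occurrences: Dict[str, Projection] = defaultdict(list)
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                item = db[sid][pos]
                if item not in seen:
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        # Try the most frequent extensions first so minsup tends to rise early.
        for item, occ in sorted(occurrences.items(), key=lambda kv: -len(kv[1])):
            support = len(occ)
            if support < minsup:
                continue  # support is anti-monotone: neither this pattern nor its extensions can qualify
            pattern = prefix + (item,)
            heapq.heappush(heap, (support, pattern))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the weakest pattern
            if len(heap) == k:
                minsup = max(minsup, heap[0][0])
            grow(pattern, occ)

    grow((), [(sid, 0) for sid in range(len(db))])
    return sorted(heap, key=lambda p: (-p[0], p[1]))


if __name__ == "__main__":
    toy_db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
    print(top_k_sequential_patterns(toy_db, 3))
    # [(3, ('a',)), (3, ('b',)), (3, ('c',))]
```

The sketch keeps exactly k patterns, so patterns tied with the k-th support value may be cut. Because the threshold depends on which patterns are found first, the search order and the shape of the data (number of sequences, sequence length, number of distinct items) influence how quickly pruning starts, which is the kind of data-dependent behaviour the study examines.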

Keywords

Pattern discovery, Top-k, Data mining, Sequential pattern mining, Association rule mining.

