ACCENTS Journals

Paper Title	:	Performance evaluation of top-k sequential mining methods on synthetic and real datasets
Author Name	:	Asima Jamil, Abdus Salam and Farhat Amin
Abstract	:	Discovering sequential patterns in a large sequence database is an important problem in sequential pattern mining, a well-known data mining technique. Several articles have surveyed the field over the past few years, focusing mainly on improving the efficiency of algorithms through different techniques. However, researchers have paid little attention to the characteristics of the underlying data that an algorithm processes, even though the properties of the data strongly affect the execution of data mining algorithms. This study complements the top-k sequential pattern mining field by providing an in-depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) and top-k closed sequential pattern mining (TSP), the state-of-the-art algorithms for top-k sequential pattern mining, was evaluated on real and synthetic datasets with varied characteristics. The impact of different parameters on the running time and memory usage of each algorithm was investigated. Extensive experiments show that TKS and TSP each have advantages and disadvantages on different types of data. Furthermore, due to the continuous addition of large amounts of data to databases, sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed for mining sequential patterns in data. These algorithms have proved to be more effective for smaller databases, but their performance may decline as the size of the database increases. Hence these methods have to be amended to perform the mining process more efficiently.
Keywords	:	Pattern discovery, Top-k, Data mining, Sequential pattern mining, Association rule mining.
Cite this article	:	Asima Jamil, Abdus Salam and Farhat Amin. Performance evaluation of top-k sequential mining methods on synthetic and real datasets. International Journal of Advanced Computer Research. 2017;7(32):176-184. DOI:10.19101/IJACR.2017.732004