International Journal of Advanced Technology and Engineering Exploration ISSN (Print): 2394-5443    ISSN (Online): 2394-7454 Volume-12 Issue-123 February-2025
Efficient sequential rule mining in uncertain sequence databases

Imane Seddiki1, Farid Nouioua2 and Abdelbasset Barkat3

Department of Computer Science, Mohamed El Bachir El Ibrahimi University, Bordj Bou Arreridj 34000, Algeria1
LIS UMR-CNRS 7020, Aix-Marseille University, Marseille, France2
Laboratory of Informatics and its Applications, Faculty of Mathematics and Computer Science, University of M’sila, M’sila 28000, Algeria3
Corresponding Author: Imane Seddiki

Received: 18-Jun-2024; Revised: 18-Feb-2025; Accepted: 20-Feb-2025

Abstract

As data becomes a crucial resource for real-world applications, data mining faces numerous challenges, particularly in storage and real-time processing. Association rule mining is a key technique for uncovering relationships and patterns in large datasets. However, the inherent uncertainty and incompleteness of data pose significant difficulties for traditional mining algorithms. To address these challenges, a novel method is proposed for mining sequential rules from uncertain sequence databases (SDs). The method has two main steps: first, a set of probabilistic rules is extracted; second, these rules are filtered using the sequential information in the data. This approach handles data uncertainty and incompleteness, enabling the extraction of meaningful sequential rules that are difficult to identify with conventional methods, and it offers a robust solution to real-time processing and storage issues in various applications. Experimental results demonstrate the algorithm's efficiency and scalability on both synthetic and real-world datasets: the proposed method achieved superior runtime and memory efficiency as dataset sizes increased.
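The two-step idea described in the abstract (extract probabilistic rules, then filter them by sequential information) can be illustrated with a minimal sketch. This is not the authors' algorithm: the data model (ordered lists of `(item, probability)` events), the independence assumption, the use of expected support, the restriction to single-item rules, and all names (`DB`, `expected_support`, `expected_seq_support`, `mine_rules`) are assumptions made here for illustration only.

```python
from itertools import permutations
from math import prod

# Hypothetical toy uncertain sequence database.
# Assumptions: each sequence is an ordered list of (item, existence probability),
# items are independent, and an item occurs at most once per sequence.
DB = [
    [("a", 0.9), ("b", 0.8), ("c", 0.5)],
    [("a", 0.6), ("c", 0.7)],
    [("b", 0.9), ("a", 0.4), ("c", 1.0)],
]

def expected_support(db, items):
    """Sum, over sequences, of the probability that all items are present (order ignored)."""
    total = 0.0
    for seq in db:
        probs = dict(seq)
        if all(i in probs for i in items):
            total += prod(probs[i] for i in items)
    return total

def expected_seq_support(db, antecedent, consequent):
    """Same as expected_support, but a sequence counts only if every
    antecedent item appears before every consequent item."""
    total = 0.0
    for seq in db:
        probs = dict(seq)
        pos = {item: k for k, (item, _) in enumerate(seq)}
        items = antecedent | consequent
        if not all(i in pos for i in items):
            continue
        if max(pos[i] for i in antecedent) < min(pos[i] for i in consequent):
            total += prod(probs[i] for i in items)
    return total

def mine_rules(db, min_esup, min_conf):
    """Return (x, y, sequential support, sequential confidence) for rules x -> y."""
    items = sorted({i for seq in db for i, _ in seq})
    rules = []
    for x, y in permutations(items, 2):
        esup_x = expected_support(db, {x})
        # Step 1: probabilistic (order-free) rule check.
        esup_xy = expected_support(db, {x, y})
        if esup_xy < min_esup or esup_xy / esup_x < min_conf:
            continue
        # Step 2: keep the rule only if it also holds sequentially.
        seq_sup = expected_seq_support(db, {x}, {y})
        if seq_sup >= min_esup and seq_sup / esup_x >= min_conf:
            rules.append((x, y, seq_sup, seq_sup / esup_x))
    return rules
```

Calling `mine_rules(DB, min_esup=1.0, min_conf=0.5)` on the toy data shows the effect of the second step: rules that pass the order-free check are discarded when the antecedent does not precede the consequent often enough. The thresholds are arbitrary values chosen for the example.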

Keywords

Association rule, Probabilistic database, Sequences database, Sequential rule, Uncertain data.
