ACCENTS Journals

Paper Title	:	Evaluating the efficacy of decision tree-based machine learning in classifying intrusive behaviour of network users
Author Name	:	Ashalata Panigrahi and Manas Ranjan Patra
Abstract	:	Building network intrusion detection models that can detect the intrusive behaviour of malicious users has been a major challenge in protecting network resources. In this study, decision tree (DT) based machine learning (ML) classification techniques, namely best first tree (BFT), functional tree (FT), J48, naïve Bayes tree (NBT), random forest (RF), random tree (RT), reduced error pruning tree (REPT), and simple classification and regression tree (Simple CART), have been employed to build an anomaly-based network intrusion detection model. Further, to remove irrelevant features from the intrusion data, three categories of feature selection techniques have been applied: (i) entropy based (gain ratio (GR), information gain (IG), and symmetrical uncertainty (SU)), (ii) statistical based (chi-squared, One-R, and Relief-F), and (iii) search based (exploratory data analysis (EDA), feature subset harmony search (FSHS), linear forward search (LFS), and feature vote harmony search (FVHS)). The proposed method was evaluated on the widely recognized NSL-KDD dataset. All eighty combinations of the eight classifiers and ten feature selection methods were analysed using seventeen evaluation metrics, including sensitivity, false positive rate (FPR), Matthews correlation coefficient (MCC), Kappa coefficient (KC), geometric mean (GM), and discriminant power (DP). Experimental results showed that the LFS+RF model achieved the highest accuracy (0.9989), with a sensitivity of 0.9982, F-value of 0.9988, specificity of 0.9994, false negative rate (FNR) of 0.0018, MCC of 0.9977, GM of 0.9988, and DP of 7.6156 on the NSL-KDD dataset. The proposed model outperformed other existing models such as support vector machine (SVM), JRip, bagging, deep learning, and neural network (NN).
Keywords	:	Machine learning, Cross-validation, Discriminant power, Geometric mean, Random forest, Naïve Bayes tree.
Cite this article	:	Panigrahi A, Patra MR. Evaluating the efficacy of decision tree-based machine learning in classifying intrusive behaviour of network users. International Journal of Advanced Technology and Engineering Exploration. 2024;11(114):736-758. DOI: 10.19101/IJATEE.2023.10101904
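Most of the evaluation metrics listed in the abstract (sensitivity, specificity, FPR, FNR, F-value, MCC, Kappa, GM, DP) are derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions and assumes a two-class setting in which "attack" is the positive class; the abstract states neither the class encoding nor the exact formulas used. The function name `binary_metrics` and the example counts are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumption (not stated in the abstract): "attack" is the positive class,
    "normal" the negative class.
    """
    if min(tp + fn, tn + fp) == 0:
        raise ValueError("need at least one positive and one negative sample")
    n = tp + fp + tn + fn

    sensitivity = tp / (tp + fn)            # recall, detection rate, TPR
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / n
    fpr = fp / (tn + fp)                    # equals 1 - specificity
    fnr = fn / (tp + fn)                    # equals 1 - sensitivity
    f_value = (2 * precision * sensitivity / (precision + sensitivity)
               if precision + sensitivity else 0.0)
    gm = math.sqrt(sensitivity * specificity)  # geometric mean of TPR and TNR

    # Matthews correlation coefficient (0 by convention if a margin is empty)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Discriminant power: (sqrt(3)/pi) * (ln X + ln Y), with
    # X = sens / (1 - sens) and Y = spec / (1 - spec).
    # Undefined (NaN here) when either rate is exactly 0 or 1.
    if 0 < sensitivity < 1 and 0 < specificity < 1:
        dp = (math.sqrt(3) / math.pi) * (
            math.log(sensitivity / fnr) + math.log(specificity / fpr))
    else:
        dp = math.nan

    return {
        "accuracy": accuracy, "sensitivity": sensitivity,
        "specificity": specificity, "precision": precision,
        "f_value": f_value, "fpr": fpr, "fnr": fnr, "gm": gm,
        "mcc": mcc, "kappa": kappa, "dp": dp,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only
    for name, value in binary_metrics(tp=990, fp=5, tn=995, fn=10).items():
        print(f"{name:12s} {value:.4f}")
```

For these made-up counts, sensitivity is 0.99 and Kappa is 0.985. The same definitions also let a reader cross-check the abstract's figures: FNR = 1 − 0.9982 = 0.0018 and GM = √(0.9982 × 0.9994) ≈ 0.9988, both of which match the reported values.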
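The three entropy-based filters named in the abstract (IG, GR, SU) each score a feature by how much knowing its value reduces uncertainty about the class label. The sketch below implements the textbook definitions for discrete features: IG = H(C) − H(C | A), GR = IG / H(A), and SU = 2·IG / (H(A) + H(C)). It assumes that numeric NSL-KDD attributes have already been discretized. The abstract does not say how discretization was done, which toolkit was used, or how many top-ranked features were kept. The function names and the toy columns `protocol` and `noise` are hypothetical.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    values = list(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(labels, feature) -> float:
    """H(labels | feature) for two aligned sequences of discrete values."""
    labels, feature = list(labels), list(feature)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy within each feature value
    return sum(len(ys) / len(labels) * entropy(ys) for ys in groups.values())


def entropy_scores(feature, labels) -> dict:
    """Information gain, gain ratio and symmetrical uncertainty of one feature."""
    h_c = entropy(labels)
    h_a = entropy(feature)
    ig = h_c - conditional_entropy(labels, feature)
    gr = ig / h_a if h_a > 0 else 0.0
    su = 2 * ig / (h_a + h_c) if h_a + h_c > 0 else 0.0
    return {"info_gain": ig, "gain_ratio": gr, "sym_uncertainty": su}


def rank_features(columns: dict, labels, score: str = "info_gain") -> list:
    """Filter-style ranking: feature names sorted by the chosen score, best first."""
    return sorted(columns,
                  key=lambda name: entropy_scores(columns[name], labels)[score],
                  reverse=True)


if __name__ == "__main__":
    labels = ["attack", "attack", "normal", "normal"]
    columns = {
        "noise": ["a", "b", "a", "b"],               # carries no class information
        "protocol": ["tcp", "tcp", "udp", "udp"],    # perfectly separates the classes
    }
    print(rank_features(columns, labels))            # ['protocol', 'noise']
```

A filter method then keeps only the top-ranked features before training a classifier. Gain ratio and symmetrical uncertainty normalize information gain so that features with many distinct values are not automatically favoured.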