Evaluating the efficacy of decision tree-based machine learning in classifying intrusive behaviour of network users
Ashalata Panigrahi and Manas Ranjan Patra
Abstract
Building network intrusion detection models that detect the intrusive behaviour of malicious users is a major challenge in protecting network resources. In this study, decision tree (DT) based machine learning (ML) classification techniques, namely, best first tree (BFT), functional tree (FT), J48, naïve Bayes tree (NBT), random forest (RF), random tree (RT), reduced error pruning tree (REPT), and simple classification and regression tree (Simple CART), have been employed to build an anomaly-based network intrusion detection model. Further, in order to remove irrelevant features from the intrusion data, three different categories of feature selection techniques have been applied, namely, (i) entropy based (gain ratio (GR), information gain (IG), and symmetrical uncertainty (SU)), (ii) statistical based (chi-squared, one-r, and relief-f), and (iii) search based (exploratory data analysis (EDA), feature subset harmony search (FSHS), linear forward search (LFS), and feature vote harmony search (FVHS)). The proposed method was evaluated using the widely recognized NSL-KDD dataset. The efficacy of the combinations of eight classifiers and ten feature selection methods (eighty models) was analysed using seventeen evaluation metrics, including sensitivity, false positive rate (FPR), Matthews correlation coefficient (MCC), Kappa coefficient (KC), geometric mean (GM), and discriminant power (DP). Experimental results showed that the LFS+RF model achieved the highest accuracy of 0.9989, sensitivity 0.9982, F-value 0.9988, specificity 0.9994, false negative rate (FNR) 0.0018, MCC 0.9977, GM 0.9988, and DP 7.6156 on the NSL-KDD dataset. The proposed model outperformed existing models such as support vector machine (SVM), JRip, bagging, deep learning, and neural network (NN).
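To make several of the evaluation metrics named above concrete, the following minimal Python sketch derives them from the four cells of a binary confusion matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `binary_metrics`, the choice of the attack class as the positive class, and the example counts are all hypothetical, and DP is computed with the commonly used definition (sqrt(3)/pi) * (ln(sens/(1-sens)) + ln(spec/(1-spec))), which may differ in detail from the formula used in the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumption: the positive class is "attack" and the negative class is "normal".
    The function raises ZeroDivisionError or ValueError when sensitivity or
    specificity is exactly 0 or 1, because DP is undefined in those cases.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    fpr = fp / (fp + tn)                  # false positive rate
    fnr = fn / (fn + tp)                  # false negative rate
    accuracy = (tp + tn) / n

    # Matthews correlation coefficient (MCC)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den

    # Cohen's Kappa: observed agreement corrected for chance agreement
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    # Geometric mean of sensitivity and specificity
    gm = math.sqrt(sensitivity * specificity)

    # Discriminant power (assumed definition, see the note above)
    dp = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fpr,
        "fnr": fnr,
        "mcc": mcc,
        "kappa": kappa,
        "gm": gm,
        "dp": dp,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration, not taken from the paper.
    for name, value in binary_metrics(tp=90, fn=10, fp=5, tn=95).items():
        print(f"{name}: {value:.4f}")
```

As a consistency check on the reported figures, the geometric mean of the stated sensitivity (0.9982) and specificity (0.9994) is approximately 0.9988, which agrees with the GM value reported for the LFS+RF model.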
Keywords
Machine learning, Cross-validation, Discriminant power, Geometric mean, Random forest, Naïve Bayes tree.
Cite this article
Panigrahi A, Patra MR. Evaluating the efficacy of decision tree-based machine learning in classifying intrusive behaviour of network users. International Journal of Advanced Technology and Engineering Exploration. 2024; 11(114):736-58. DOI:10.19101/IJATEE.2023.10101904