Optimizing software fault prediction using decision tree regression and soft computing techniques
Gurmeet Kaur, Jyoti Pruthi and Parul Gandhi
Abstract
This research aims to develop a framework for software fault prediction (SFP) using machine learning techniques. A software fault can cause software to malfunction, and even a minor fault may lead to failure. Efficient SFP improves the overall quality and performance of software products while streamlining the development process. The framework aims to reduce the cost and time of software development while improving the reliability of the software. It enables quick and efficient testing by identifying, at the early stages of the project, the modules that are likely to fail. Soft computing techniques provide a simple and effective solution to prediction problems. This study emphasizes the significance of soft computing approaches in SFP and highlights their role in improving computational efficiency, reducing development costs, and enhancing the reliability of software applications. A soft computing-based technique is proposed to address these prediction challenges. A metric suite is suggested, comprising a requirement-based metric and an adoption metric, designed by integrating process metrics from the phases of software development. The study also designs a decision tree regression (DTR)-based SFP model that takes these metrics as input and produces predicted faults as output. The literature review reveals that only a few existing frameworks implement SFP models using a broad range of soft computing approaches on the same dataset. The suggested metric suite is validated by computing performance measures such as the area under the curve (AUC), F-measure, precision, recall, and accuracy. The high values obtained for these measures demonstrate the effective fault prediction capability of the suggested metric suite.
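As a rough illustration of how a DTR-based SFP model maps metric values to predicted faults, and how those predictions can then be scored with the classification measures named above, the following pure-Python sketch grows a small CART-style regression tree (squared-error splits, leaves that predict the mean fault count). It is not the authors' implementation. The feature values and fault counts are hypothetical toy data, `build_tree`, `predict`, and `classification_scores` are illustrative names, and the rule that a module counts as fault-prone when its (predicted) fault count exceeds 0.5 is an assumption. In practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would typically be used instead.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean (squared-error impurity used by CART regression)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Greedily grow a regression tree; each leaf predicts the mean fault count of its modules."""
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return {"leaf": mean(y)}
    best = None
    for j in range(len(X[0])):                    # try every metric (feature)
        for t in sorted({row[j] for row in X}):   # every observed value is a candidate threshold
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:                              # no threshold separates these modules
        return {"leaf": mean(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's mean fault count."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def classification_scores(actual_faults, predicted_faults, threshold=0.5):
    """Precision, recall, F-measure, accuracy and AUC, treating 'faults > threshold' as fault-prone."""
    y_true = [a > threshold for a in actual_faults]
    y_pred = [p > threshold for p in predicted_faults]
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    # AUC: probability that a faulty module gets a higher score than a fault-free one (ties count 0.5).
    pos = [s for s, t in zip(predicted_faults, y_true) if t]
    neg = [s for s, t in zip(predicted_faults, y_true) if not t]
    pairs = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
    auc = sum(pairs) / len(pairs) if pairs else float("nan")
    return {"precision": precision, "recall": recall, "f_measure": f_measure,
            "accuracy": accuracy, "auc": auc}


# Hypothetical toy data: each row is [metric_a, metric_b] for one module
# (illustrative stand-ins, NOT the paper's requirement-based/adoption metrics or dataset).
X = [[1, 0.2], [2, 0.1], [3, 0.4], [6, 0.5], [7, 0.9], [8, 0.8], [9, 0.7], [4, 0.3]]
y = [0, 0, 1, 3, 5, 6, 6, 1]  # actual fault counts per module

tree = build_tree(X, y, max_depth=3)
predicted = [predict(tree, row) for row in X]
print(predicted)
print(classification_scores(y, predicted))
print(predict(tree, [5, 0.45]))  # predicted faults for an unseen module
```

Because the tree is scored here on the same modules it was trained on, the resulting measures are optimistic and say nothing about generalization; a real evaluation would use held-out data or cross-validation. The `max_depth` and `min_samples` limits act as simple pre-pruning, trading goodness of fit against overfitting.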
The study also compares the performance of the suggested model with adaptive neuro-fuzzy inference system (ANFIS), fuzzy inference system, and Bayesian network-based SFP models, using root mean square error (RMSE), normalized root mean square error (NRMSE), mean magnitude of relative error (MMRE), balanced mean magnitude of relative error (BMMRE), and R-squared. The suggested model outperforms the others, achieving RMSE, MMRE, and R-squared values of 3.54, 2.04e-05, and 99.78, respectively. This study presents a highly efficient DTR-based SFP model with higher fault prediction accuracy than existing SFP models. Implementing this model is expected to significantly reduce costs as well as the time and effort of software development, making it a valuable tool for software engineers.
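The error measures used in this comparison can be computed directly from paired actual and predicted fault counts. The sketch below uses common textbook definitions, which are assumptions because the abstract does not define them: NRMSE divides RMSE by the range of the actual values (other conventions divide by the mean or standard deviation), BMMRE uses min(actual, predicted) as the denominator, and modules whose relative error is undefined because its denominator is zero are skipped. `regression_errors` is a hypothetical helper name. Lower RMSE, NRMSE, MMRE, and BMMRE, and higher R-squared, indicate a better fit.

```python
import math


def regression_errors(actual, predicted):
    """Return RMSE, NRMSE, MMRE, BMMRE and R-squared for paired fault counts."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Assumption: NRMSE normalizes RMSE by the range of the actual values.
    value_range = max(actual) - min(actual)
    nrmse = rmse / value_range if value_range else float("nan")

    # Magnitude of relative error |a - p| / a; undefined (skipped) when a == 0.
    mre = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    mmre = sum(mre) / len(mre) if mre else float("nan")

    # Balanced relative error |a - p| / min(a, p); skipped when the minimum is not positive.
    bre = [abs(a - p) / min(a, p) for a, p in zip(actual, predicted) if min(a, p) > 0]
    bmmre = sum(bre) / len(bre) if bre else float("nan")

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"rmse": rmse, "nrmse": nrmse, "mmre": mmre, "bmmre": bmmre, "r_squared": r_squared}


# Hypothetical actual vs. predicted fault counts for four modules.
actual = [2, 4, 6, 8]
predicted = [3, 4, 5, 8]
for name, value in regression_errors(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

Fault datasets usually contain many fault-free modules, so how zero denominators are handled in MMRE and BMMRE can noticeably change the reported values; any comparison across models should apply the same convention to all of them.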
Keywords
Software fault prediction, Predicted fault, Process metrics, Soft computing, Decision tree regression, Machine learning.
Cite this article
Kaur G, Pruthi J, Gandhi P. Optimizing software fault prediction using decision tree regression and soft computing techniques. International Journal of Advanced Technology and Engineering Exploration. 2024;11(113):604-623. DOI:10.19101/IJATEE.2023.10101890