Performance evaluation of classifiers for the COVID-19 symptom-based dataset using different feature selection methods
Fauzan Iliya Khalid, Mokhairi Makhtar, Rosaida Rosly and Aceng Sambas
Abstract
Classification algorithms are commonly employed in healthcare systems to support decisions on diagnosis, treatment regimens, and illness prediction. The recent emergence of dominant variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19), has emphasized the significance of early detection for ensuring appropriate treatment and protecting unaffected populations. This study assesses the performance of various classification models on a symptom-based COVID-19 dataset using two feature selection methods: the wrapper method (WrapperSubsetEval) and correlation-based feature subset evaluation (CfsSubsetEval). The effectiveness of these methods is evaluated on the number of features retained in the reduced subset, execution time, and classifier accuracy. The experiments are conducted in WEKA, and five classifiers are compared: J48 decision tree (DT), support vector machine (SVM), naïve Bayes (NB), sequential minimal optimization (SMO), and k-nearest neighbor (KNN). Each model's accuracy is measured using 10-fold cross-validation. Comparing results before and after feature selection shows that KNN combined with WrapperSubsetEval (WrapperSubsetEval+KNN) outperforms the other algorithms, achieving the highest accuracy of 98.81%. In summary, feature selection can be considered an effective approach for COVID-19 prediction.
Keywords
Classification, Machine learning, Feature selection, COVID-19.
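The wrapper idea summarized in the abstract, in which candidate feature subsets are scored by the cross-validated accuracy of the classifier itself, can be illustrated with a minimal sketch. This is not the WEKA WrapperSubsetEval implementation and does not use the study's dataset: the greedy forward search, the choice of k = 3, the interleaved fold assignment, the function names, and the synthetic data (whose label copies feature 0 by construction) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k training rows closest to x on the chosen features."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, features, folds=10, k=3):
    """Fraction of rows classified correctly when each row is predicted
    by a KNN model built from the other folds (interleaved split)."""
    n = len(X)
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(n) if i % folds != fold]
        test_idx = [i for i in range(n) if i % folds == fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(
            knn_predict(train_X, train_y, X[i], features, k) == y[i]
            for i in test_idx
        )
    return correct / n


def forward_wrapper_selection(X, y, folds=10, k=3):
    """Greedy forward search: repeatedly add the feature that most improves
    cross-validated accuracy, and stop when no candidate improves it."""
    remaining = list(range(len(X[0])))
    selected, best_acc = [], 0.0
    while remaining:
        scored = [(cv_accuracy(X, y, selected + [f], folds, k), f) for f in remaining]
        # Ties on accuracy are broken in favour of the lowest feature index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Synthetic toy data (an assumption, not the study's dataset): every
# combination of four binary features, repeated ten times; the label
# simply copies feature 0, so the other three features are irrelevant.
X = [list(bits) for _ in range(10) for bits in product([0, 1], repeat=4)]
y = [row[0] for row in X]

selected, best_acc = forward_wrapper_selection(X, y)
print("selected features:", selected)
print("cross-validated accuracy:", best_acc)
```

A filter method such as CfsSubsetEval differs in that it scores subsets from feature and class correlations without training the classifier, which is typically cheaper than the repeated cross-validation performed inside a wrapper search like the one sketched here.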
Cite this article
Khalid FI, Makhtar M, Rosly R, Sambas A. Performance evaluation of classifiers for the COVID-19 symptom-based dataset using different feature selection methods. International Journal of Advanced Technology and Engineering Exploration. 2023; 10(103):741-761. DOI:10.19101/IJATEE.2023.10101228