A novel phishing detection model using PolyScore feature evaluator and enhanced soft voting ensemble
Swetha C.V., Sibi Shaji and B. Meenakshi Sundaram
Abstract
Phishing remains a significant cybersecurity threat, particularly on widely accessible social media platforms. As social media usage and phishing incidents both continue to grow, there is an urgent need to improve detection accuracy. This research addresses the challenge by combining feature selection and classification techniques in a novel phishing detection model. The proposed model integrates a PolyScore feature evaluator with an enhanced soft voting ensemble that uses geometric mean weighting. The methodology comprises three phases: (1) data preprocessing, which cleans the data and prepares it for classification; (2) feature selection using the PolyScore evaluator, which employs six filter techniques with normalized rank percentiles to identify relevant features; and (3) classification, which combines the predictions of seven classifiers using a geometric mean weighting-based soft voting ensemble. Ensemble pruning is applied during testing to remove classifiers with lower weights before the final prediction is made. Experiments were conducted on three publicly available datasets. The proposed model achieved accuracy rates of 97.36% on dataset 1, 87.18% on dataset 2, and 98.27% on dataset 3. Compared with fifteen advanced classifiers, the model outperformed all others on dataset 1 and ranked second-best on dataset 3. The contribution of this research is an approach to phishing detection that combines weighted soft voting with ensemble pruning to improve accuracy and robustness. The results indicate that the model is effective against the growing problem of phishing on online social networks and offers a more reliable solution than the existing methods it was compared with.
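The two core ideas in the abstract, rank-percentile aggregation across filters and geometric-mean-weighted soft voting with pruning, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state how the six filters' percentiles are combined, which quantities enter the geometric mean, or how many classifiers survive pruning. The sketch therefore assumes a plain average of percentiles, a geometric mean over hypothetical per-classifier validation metrics, and a caller-chosen `keep` count. All function names are illustrative.

```python
import numpy as np

def rank_percentiles(scores):
    """Map one filter's raw feature scores to rank percentiles in (0, 1].
    A higher score gets a higher percentile; ties are broken arbitrarily."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()  # 0 = lowest score
    return (ranks + 1) / len(scores)

def aggregate_feature_scores(score_table):
    """Average the rank percentiles over all filters (assumed combination rule).
    score_table has shape (n_filters, n_features)."""
    return np.mean([rank_percentiles(row) for row in score_table], axis=0)

def geometric_mean_weights(metric_table):
    """Turn per-classifier metrics in (0, 1] into normalized weights.
    metric_table has shape (n_classifiers, n_metrics); the choice of
    metrics is an assumption, not taken from the paper."""
    metric_table = np.asarray(metric_table, dtype=float)
    gm = np.exp(np.log(metric_table).mean(axis=1))
    return gm / gm.sum()

def pruned_soft_vote(probas, weights, keep):
    """Keep the `keep` highest-weighted classifiers, renormalize their
    weights, and blend their class probabilities.
    probas has shape (n_classifiers, n_samples, n_classes)."""
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    top = np.argsort(weights)[::-1][:keep]
    w = weights[top] / weights[top].sum()
    blended = np.tensordot(w, probas[top], axes=1)  # (n_samples, n_classes)
    return blended.argmax(axis=1), blended

# Toy example: 2 filters x 3 features, 3 classifiers x 2 metrics.
feature_scores = aggregate_feature_scores([[0.9, 0.1, 0.5],
                                           [0.8, 0.2, 0.3]])
weights = geometric_mean_weights([[0.9, 0.9], [0.5, 0.5], [0.8, 0.2]])
probas = [[[0.2, 0.8], [0.6, 0.4]],
          [[0.4, 0.6], [0.7, 0.3]],
          [[0.9, 0.1], [0.9, 0.1]]]
labels, blended = pruned_soft_vote(probas, weights, keep=2)
```

In the toy data the third classifier has one strong and one weak metric, so its geometric mean is the lowest and it is the one pruned. An arithmetic mean would have ranked it equal to the second classifier, which shows how geometric mean weighting penalizes unbalanced performance.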
Keywords
Phishing detection, Social media security, Feature selection, Ensemble learning, Soft voting, Cybersecurity.
Cite this article
C.V. S, Shaji S, Sundaram BM. A novel phishing detection model using PolyScore feature evaluator and enhanced soft voting ensemble. International Journal of Advanced Technology and Engineering Exploration. 2024;11(121):1641-1663. DOI:10.19101/IJATEE.2024.111100562