ACCENTS Journals

Paper Title	:	A novel phishing detection model using PolyScore feature evaluator and enhanced soft voting ensemble
Author Name	:	Swetha C.V., Sibi Shaji and B. Meenakshi Sundaram
Abstract	:	Phishing remains a significant cybersecurity threat, particularly on widely accessible social media platforms. As social media usage and phishing incidents continue to grow, there is an urgent need to improve detection accuracy. This research addresses that challenge by leveraging advanced feature selection and classification techniques in a novel phishing detection model that integrates a PolyScore feature evaluator with an enhanced soft voting ensemble using geometric mean weighting. The methodology comprises three phases: (1) data preprocessing, which cleans the data and prepares it for classification; (2) feature selection using the PolyScore evaluator, which combines six filter techniques through normalized rank percentiles to identify relevant features; and (3) classification, which combines the predictions of seven classifiers in a geometric mean weighting-based soft voting ensemble. During testing, ensemble pruning removes classifiers with lower weights to optimize the final predictions. Extensive experiments were conducted on three publicly available datasets, on which the proposed model achieved accuracies of 97.36% (dataset 1), 87.18% (dataset 2), and 98.27% (dataset 3). Compared with fifteen advanced classifiers, the model outperformed all others on dataset 1 and ranked second-best on dataset 3. This research introduces an innovative approach to phishing detection that combines weighted soft voting and ensemble pruning to improve accuracy and robustness. The results highlight the model’s effectiveness in addressing the growing challenge of phishing on online social networks, offering a more reliable solution than existing methods.
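The abstract does not state the exact formulas, so the following is only a minimal Python sketch of the two ideas it names, under explicit assumptions. (a) PolyScore-style aggregation: each filter's per-feature scores are converted to rank percentiles in [0, 1] and averaged, and features at or above a cut-off are kept. (b) Weighted soft voting with pruning: each classifier's weight is assumed to be its G-mean (the geometric mean of sensitivity and specificity) on a validation split, classifiers weighted below the mean weight are dropped, and the remaining predicted phishing probabilities are averaged by weight. The function names, the 0.5 selection and decision thresholds, the choice of G-mean as the weight, and the below-mean pruning rule are all hypothetical and are not the authors' implementation; the actual six filters and seven classifiers would supply the input scores and probabilities.

```python
# Hypothetical sketch: function names, thresholds, G-mean weighting and
# below-mean pruning are illustrative assumptions, not the paper's method.
import math
from statistics import mean


def rank_percentiles(scores):
    """Map raw filter scores (higher = more relevant) to rank percentiles in [0, 1].

    Tied scores share the average of their 0-based ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2
        i = j + 1
    return [r / (n - 1) if n > 1 else 1.0 for r in ranks]


def polyscore_select(filter_scores, threshold=0.5):
    """Average each feature's percentile across filters; keep those >= threshold.

    filter_scores maps a filter name to one score per feature.
    Returns (indices of kept features, combined score per feature).
    """
    per_filter = [rank_percentiles(s) for s in filter_scores.values()]
    n_features = len(per_filter[0])
    combined = [mean(p[f] for p in per_filter) for f in range(n_features)]
    kept = [f for f in range(n_features) if combined[f] >= threshold]
    return kept, combined


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def weighted_soft_vote(probas, weights, prune_below_mean=True):
    """Weighted average of P(phishing) per test sample, with optional pruning.

    probas maps a classifier name to its P(phishing) for each test sample;
    weights maps the same names to a non-negative weight (e.g. validation G-mean).
    """
    names = list(probas)
    if prune_below_mean:
        # Drop classifiers whose weight is below the mean weight.
        cutoff = mean(weights[n] for n in names)
        names = [n for n in names if weights[n] >= cutoff]
    total = sum(weights[n] for n in names)
    if total == 0:
        # All remaining weights are zero: fall back to an unweighted average.
        w = {n: 1.0 for n in names}
        total = float(len(names))
    else:
        w = {n: weights[n] for n in names}
    n_samples = len(probas[names[0]])
    return [sum(w[n] * probas[n][i] for n in names) / total for i in range(n_samples)]


def predict_labels(scores, threshold=0.5):
    """Turn ensemble probabilities into labels (1 = phishing)."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    kept, combined = polyscore_select(
        {"filter_a": [0.9, 0.1, 0.5], "filter_b": [3.0, 1.0, 2.0]}
    )
    print(kept, combined)  # [0, 2] [1.0, 0.0, 0.5]

    y_val = [1, 1, 0, 0]
    val_preds = {"clf_a": [1, 1, 0, 0], "clf_b": [1, 0, 0, 0], "clf_c": [0, 0, 1, 1]}
    weights = {name: g_mean(y_val, preds) for name, preds in val_preds.items()}
    test_probas = {"clf_a": [0.9, 0.2], "clf_b": [0.7, 0.4], "clf_c": [0.1, 0.9]}
    print(predict_labels(weighted_soft_vote(test_probas, weights)))  # [1, 0]
```

Averaging rank percentiles rather than raw scores puts filters whose scores live on very different scales onto a common 0 to 1 footing, which is one plausible reason for the normalization the abstract mentions. In the usage example, `clf_c` has a validation G-mean of 0 and is pruned before voting.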
Keywords	:	Phishing detection, Social media security, Feature selection, Ensemble learning, Soft voting, Cybersecurity.
Cite this article	:	C.V. S, Shaji S, Sundaram BM. A novel phishing detection model using PolyScore feature evaluator and enhanced soft voting ensemble. International Journal of Advanced Technology and Engineering Exploration. 2024;11(121):1641-1663. DOI:10.19101/IJATEE.2024.111100562