Anomaly detection in smart contracts based on optimal relevance hybrid features analysis in the Ethereum blockchain employing ensemble learning
Sabri Hisham, Mokhairi Makhtar and Azwa Abdul Aziz
Abstract
Blockchain 2.0 has revolutionized the domain by introducing blockchain as a decentralized application (DApp) development platform, previously recognized mainly in the cryptocurrency sphere. Consequently, the rise of DApp development has inadvertently camouflaged fraudulent activities within smart contracts, leading to substantial losses for investors. Implementing machine learning (ML) approaches can significantly enhance the efficacy of anomaly detection. However, many studies still grapple with selecting the most pertinent features to optimize anomaly detection levels. This challenge intensifies when managing the high-dimensional raw data extracted directly from the Ethereum blockchain network, which falls under the category of big data. Smart contracts, the core of blockchain that governs DApp logic, have increasingly become a haven for fraud. This study focuses on analyzing three primary characteristic components based on contract source code (operation code (opcode), application binary interface (ABI) code, and contract transaction) to develop anomaly detection models in smart contracts using an ensemble hybrid feature strategy. The approach involves two key stages: firstly, reducing the initial feature size through constant, quasi-constant, and variant validation; and secondly, identifying the most relevant feature set using the searching for uncorrelated list of variables (SULOV) method, grounded in the minimum redundancy maximum relevance (MRMR) principle. The anomaly detection model employs a voting ensemble technique, harnessing a dataset of the most pertinent features. The model's effectiveness is gauged by comparing its performance with individual models, including random forest (RF), k-nearest neighbor (KNN), decision tree (DT), linear discriminant analysis (LDA), and stochastic gradient descent (SGD). 
The findings indicate that the proposed model, trained on the 44 most relevant features, achieves a determination value measurement rate of 92.99%, outperforming the individual classifiers while minimizing classification time. The model's efficiency is further corroborated through comparative analysis with previous studies and alternative methodologies using the same contract dataset. Overall, the proposed ensemble-based model significantly improves anomaly detection in contract source code analysis while relying on a minimal, relevant feature set refined through the SULOV method.
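The two-stage selection and voting ensemble described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the variance threshold (0.01), correlation limit (0.7), and hard voting are assumed values, and `prune_correlated` is a hypothetical, simplified stand-in for SULOV that drops the lower mutual-information member of each highly correlated feature pair.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def drop_low_variance(X, threshold=0.01):
    """Stage 1: indices of columns that are neither constant nor quasi-constant."""
    selector = VarianceThreshold(threshold=threshold).fit(X)
    return selector.get_support(indices=True)


def prune_correlated(X, y, corr_limit=0.7, seed=0):
    """Stage 2 (simplified SULOV stand-in): for each pair of features whose
    absolute Pearson correlation exceeds corr_limit, drop the one with the
    lower mutual information with the label."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    mi = mutual_info_classif(X, y, random_state=seed)
    dropped = set()
    n = X.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            if i in dropped or j in dropped:
                continue
            if corr[i, j] > corr_limit:
                dropped.add(i if mi[i] < mi[j] else j)
    return np.array([k for k in range(n) if k not in dropped])


def build_voting_ensemble(seed=0):
    """Hard-voting ensemble over the five base learners named in the abstract.
    Hard voting is an assumption; scaling is added for KNN and SGD."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
            ("dt", DecisionTreeClassifier(random_state=seed)),
            ("lda", LinearDiscriminantAnalysis()),
            ("sgd", make_pipeline(StandardScaler(), SGDClassifier(random_state=seed))),
        ],
        voting="hard",
    )


def run_demo(seed=0):
    # Synthetic stand-in for the opcode / ABI / transaction feature matrix.
    X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                               n_redundant=6, random_state=seed)
    quasi = np.zeros(len(y))
    quasi[:2] = 1.0  # nearly constant column (variance below 0.01)
    # Columns 20 and 21 are constant, column 22 is quasi-constant.
    X = np.column_stack([X, np.zeros(len(y)), np.ones(len(y)), quasi])

    kept = drop_low_variance(X)                      # stage 1
    selected = prune_correlated(X[:, kept], y, seed=seed)  # stage 2
    final_idx = kept[selected]

    X_train, X_test, y_train, y_test = train_test_split(
        X[:, final_idx], y, test_size=0.25, random_state=seed, stratify=y)
    model = build_voting_ensemble(seed).fit(X_train, y_train)
    return final_idx, model.score(X_test, y_test)


final_idx, acc = run_demo()
print(f"{len(final_idx)} features kept, test accuracy = {acc:.3f}")
```

Hard voting is used here because an `SGDClassifier` with its default hinge loss does not expose class probabilities; a soft-voting variant would need a probabilistic loss or a calibrated wrapper.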
Keywords
Ethereum, Blockchain, Smart contract, Feature selection, Relevance features, Ensemble method, Anomaly detection.
Cite this article
Hisham S, Makhtar M, Aziz AA. Anomaly detection in smart contracts based on optimal relevance hybrid features analysis in the Ethereum blockchain employing ensemble learning. International Journal of Advanced Technology and Engineering Exploration. 2023;10(109):1552-1579. DOI:10.19101/IJATEE.2023.10102216