Efficient implementation of artificial neural networks on FPGAs using high-level synthesis and parallelism
Mini K. Namboothiripad and Gayathri Vadhyan
Abstract
Artificial neural networks (ANNs) have gained significant attention for their ability to solve complex problems in various domains. However, the efficient implementation of ANN models on hardware remains challenging, particularly for systems requiring low power and high performance. Field programmable gate arrays (FPGAs) offer a promising solution due to their reconfigurability and parallel processing capabilities. This study explores the implementation of an ANN on an FPGA using high-level synthesis (HLS), focusing on optimizing performance by leveraging weight-level and node-level parallelism. Two methodologies were proposed for efficiently implementing ANN computations on an FPGA. The focus was on partitioning the computations of the ANN's first layer to the programmable logic (PL) of a system-on-chip (SoC) FPGA, while offloading the processing of subsequent layers to a 666 MHz advanced RISC machine (ARM) processor. Six designs with varying levels of weight-level and node-level parallelism were implemented on a Python productivity for Zynq (PYNQ) board. Multiple processing elements (PEs) and sub-PEs were instantiated in the PL to extract parallelism from the ANN computations. Single-precision floating-point accuracy was used throughout the implementations. The custom digital design, operating at 150 MHz, achieved a significant speedup, computing the entire ANN 2.5 times faster than the 666 MHz ARM processor even with the limited resources available on the PYNQ board. Scaling up with multiple FPGAs could yield performance comparable to that of general-purpose processors. The integration of HLS and the control-block redesign capabilities of the ARM processor made the system adaptable to various applications without requiring extensive knowledge of hardware description languages (HDLs).
This research shows that FPGA-based implementations of ANNs, especially those using HLS, offer a viable and efficient alternative to graphics processing unit (GPU) or processor-based designs for ANN applications. The demonstrated speedup achieved through parallelism and the use of PL indicates the potential of FPGAs in creating dedicated application-specific integrated circuits (ASICs) for ANN applications, offering a competitive option compared to traditional GPU- or processor-based solutions.
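The weight-level and node-level parallelism described in the abstract can be sketched in HLS-style C++. This is only a minimal illustration, not the paper's actual design: the layer sizes, sigmoid activation, and unroll factor are assumptions. Unrolling the outer loop gives each node its own processing element (PE), while partially unrolling the inner loop creates multiply-accumulate sub-PEs within each PE. The `#pragma HLS` lines are directives for an HLS tool and are ignored by a standard C++ compiler.

```cpp
#include <cmath>

// Hypothetical layer sizes, chosen for illustration only.
#define N_IN  8   // inputs to the layer
#define N_OUT 4   // neurons (nodes) in the layer

// One fully connected layer written in HLS-style C++.
// Node-level parallelism: unrolling the NODES loop instantiates
// one PE per neuron, so all nodes compute concurrently.
// Weight-level parallelism: partially unrolling the WEIGHTS loop
// creates several multiply-accumulate sub-PEs inside each PE.
void dense_layer(const float in[N_IN],
                 const float w[N_OUT][N_IN],
                 const float bias[N_OUT],
                 float out[N_OUT]) {
NODES:
    for (int n = 0; n < N_OUT; ++n) {
#pragma HLS UNROLL            // one PE per node
        float acc = bias[n];
WEIGHTS:
        for (int i = 0; i < N_IN; ++i) {
#pragma HLS UNROLL factor=4   // sub-PEs within each PE
            acc += w[n][i] * in[i];
        }
        out[n] = 1.0f / (1.0f + expf(-acc));  // sigmoid activation
    }
}
```

In an actual HLS flow, the weight arrays would also need `ARRAY_PARTITION` directives so the unrolled multiply-accumulates can read their operands in the same cycle; without partitioning, memory port limits would serialize the accesses and negate the parallelism.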
Keywords
Artificial neural network, Field programmable gate arrays, High-level synthesis, Node- and weight-level parallelism, Programmable logic.
Cite this article
Namboothiripad MK, Vadhyan G. Efficient implementation of artificial neural networks on FPGAs using high-level synthesis and parallelism. International Journal of Advanced Technology and Engineering Exploration. 2024;11(119):1497-1511. DOI:10.19101/IJATEE.2023.10102538