Efficient implementation of artificial neural networks on FPGAs using high-level synthesis and parallelism
Mini K. Namboothiripad and Gayathri Vadhyan
Abstract
Artificial neural networks (ANNs) have gained significant attention for their ability to solve complex problems in various domains. However, the efficient implementation of ANN models on hardware remains challenging, particularly for systems requiring low power and high performance. Field programmable gate arrays (FPGAs) offer a promising solution due to their reconfigurability and parallel processing capabilities. This study explores the implementation of an ANN on an FPGA using high-level synthesis (HLS), focusing on optimizing performance by leveraging weight-level and node-level parallelism. Two methodologies were proposed for efficiently implementing ANN computations on an FPGA. The focus was on partitioning the computations of the ANN's first layer to the programmable logic (PL) of a system-on-chip (SoC) FPGA, while offloading the processing of subsequent layers to a 666 MHz advanced RISC machine (ARM) processor. Six designs with varying levels of weight-level and node-level parallelism were implemented on a Python productivity for Zynq (PYNQ) board. Multiple processing elements (PEs) and sub-PEs were instantiated in the PL to extract parallelism from the ANN computations. Single-precision floating-point accuracy was used throughout the implementations. The custom digital design, operating at 150 MHz, achieved a significant speedup, computing the entire ANN 2.5 times faster than the 666 MHz ARM processor even with the limited resources available on the PYNQ board. Scaling up with multiple FPGAs could yield performance comparable to that of general-purpose processors. The integration of HLS and the control-block redesign capabilities of the ARM processor made the system adaptable to various applications without requiring extensive knowledge of hardware description languages (HDLs).
This research shows that FPGA-based implementations of ANNs, especially those using HLS, offer a viable and efficient alternative to graphics processing unit (GPU) or processor-based designs for ANN applications. The demonstrated speedup achieved through parallelism and the use of PL indicates the potential of FPGAs in creating dedicated application-specific integrated circuits (ASICs) for ANN applications, offering a competitive option compared to traditional GPU- or processor-based solutions.
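The weight-level and node-level parallelism described in the abstract can be sketched in HLS-style C++. This is only a minimal illustration, not the paper's actual design: the layer sizes, sigmoid activation, and unroll factor are assumptions. Unrolling the outer loop gives each node its own processing element (PE), while partially unrolling the inner loop creates multiply-accumulate sub-PEs within each PE. The `#pragma HLS` lines are directives for an HLS tool and are ignored by a standard C++ compiler.

```cpp
#include <cmath>

// Hypothetical layer sizes, chosen for illustration only.
#define N_IN  8   // inputs to the layer
#define N_OUT 4   // neurons (nodes) in the layer

// One fully connected layer written in HLS-style C++.
// Node-level parallelism: unrolling the NODES loop instantiates
// one PE per neuron, so all nodes compute concurrently.
// Weight-level parallelism: partially unrolling the WEIGHTS loop
// creates several multiply-accumulate sub-PEs inside each PE.
void dense_layer(const float in[N_IN],
                 const float w[N_OUT][N_IN],
                 const float bias[N_OUT],
                 float out[N_OUT]) {
NODES:
    for (int n = 0; n < N_OUT; ++n) {
#pragma HLS UNROLL            // one PE per node
        float acc = bias[n];
WEIGHTS:
        for (int i = 0; i < N_IN; ++i) {
#pragma HLS UNROLL factor=4   // sub-PEs within each PE
            acc += w[n][i] * in[i];
        }
        out[n] = 1.0f / (1.0f + expf(-acc));  // sigmoid activation
    }
}
```

In an actual HLS flow, the weight arrays would also need `ARRAY_PARTITION` directives so the unrolled multiply-accumulates can read their operands in the same cycle; without partitioning, memory port limits would serialize the accesses and negate the parallelism.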
Keywords
Artificial neural network, Field programmable gate arrays, High-level synthesis, Node- and weight-level parallelism, Programmable logic.
Cite this article
Namboothiripad MK, Vadhyan G. Efficient implementation of artificial neural networks on FPGAs using high-level synthesis and parallelism. International Journal of Advanced Technology and Engineering Exploration. 2024;11(119):1497-1511. DOI:10.19101/IJATEE.2023.10102538