Convergence of batch gradient training-based smoothing L1 regularization via adaptive momentum for feedforward neural networks
Khidir Shaib Mohamed and Raed Muhammad Albadrani
Abstract
Momentum is a widely used method for accelerating convergence in practical training and has been studied extensively in combination with regularization. Unfortunately, no effective acceleration method for L1 regularization has yet been presented. To train and prune feedforward neural networks, a batch gradient training algorithm with smoothing L1 regularization and adaptive momentum (BGSL1AM) is developed and its convergence is analyzed. The usual L1 regularizer is nonsmooth at the origin; it generates oscillations in the computation and makes convergence analysis difficult. To overcome this issue, a smoothing function that approximates the L1 regularizer at the origin is proposed. The smoothed penalty progressively drives redundant weights toward zero during training and allows them to be removed afterwards. Conditions guaranteeing convergence are given, together with weak and strong convergence analyses and a discussion of the significance of the proposed approach. Numerical simulation results on a range of function approximation and pattern classification problems demonstrate the effectiveness of the BGSL1AM algorithm: the proposed learning method exhibits good convergence properties and accuracy.
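As an illustration only, the NumPy sketch below shows the two ingredients the abstract describes: a smooth approximation of |w| near the origin and a batch weight update whose momentum coefficient adapts to the current gradient. The quartic smoothing polynomial, the interval width a, and the alignment-based momentum rule are assumptions made for illustration; the paper defines its own smoothing function and adaptive momentum coefficient.

import numpy as np

def smooth_abs(w, a=0.1):
    # Piecewise approximation of |w| that is smooth at the origin: equal to |w|
    # for |w| >= a, and a quartic polynomial on (-a, a) whose value and first
    # derivative match |w| at w = +/-a. (Illustrative choice, not the paper's exact form.)
    poly = -w**4 / (8 * a**3) + 3 * w**2 / (4 * a) + 3 * a / 8
    return np.where(np.abs(w) < a, poly, np.abs(w))

def smooth_abs_grad(w, a=0.1):
    # Derivative of smooth_abs; replaces the nondifferentiable sign(w) at the origin.
    poly_grad = -w**3 / (2 * a**3) + 3 * w / (2 * a)
    return np.where(np.abs(w) < a, poly_grad, np.sign(w))

def batch_step(w, grad_E, velocity, eta=0.05, lam=1e-3, mu_max=0.9):
    # One batch update: gradient of the training error plus the smoothed L1 penalty,
    # with an adaptive momentum coefficient that shrinks when the previous momentum
    # direction disagrees with the current descent direction (heuristic rule only).
    g = grad_E + lam * smooth_abs_grad(w)                      # penalized batch gradient
    denom = np.linalg.norm(g) * np.linalg.norm(velocity) + 1e-12
    align = max(0.0, -np.dot(g, velocity) / denom)             # alignment in [0, 1]
    velocity = mu_max * align * velocity - eta * g
    return w + velocity, velocity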
Keywords
Convergence, Smoothing L1 regularization, Adaptive momentum, Feedforward neural network, Batch gradient training.
Cite this article
Mohamed KS, Albadrani RM. Convergence of batch gradient training-based smoothing L1 regularization via adaptive momentum for feedforward neural networks. International Journal of Advanced Technology and Engineering Exploration. 2024; 11(116):1005-1019. DOI: 10.19101/IJATEE.2024.111100125.