ACCENTS Journals

Paper Title	:	Quantum prioritized experience replay with MaDi-based priority and quantum circuit mechanisms for optimizing reinforcement learning
Author Name	:	R. Palanivel and P. Muthulakshmi
Abstract	:	Reinforcement learning (RL) faces significant challenges in scalability and computational efficiency, particularly in complex decision-making environments. Traditional RL algorithms often struggle with large-scale tasks, which calls for new approaches that improve adaptability and performance. This study introduces the quantum circuit-based priority replay (QCPR) algorithm, which uses quantum-inspired techniques from quantum computing (QC) to improve learning efficiency and decision-making quality. QCPR prioritizes experiences through two key mechanisms: the magnitude and direction (MaDi) based priority, which assesses the significance and directionality of experiences, and QCPR-specific methods that ensure efficient experience replay. Both mechanisms are integrated into the Q-learning (QL) framework to optimize the RL process. QCPR was evaluated against four well-established RL algorithms, namely QL, deep Q-networks (DQN), prioritized experience replay (PER), and dueling DQN (DDQN), using standard decision-making benchmarks. The algorithm was further tested in various simulation and real-world environments, including Atari games, Qiskit integration, and Raspberry Pi hardware and software setups. These evaluations demonstrated QCPR’s adaptability and robustness, indicating its suitability for dynamic, large-scale applications. The results showed that QCPR significantly outperforms its counterparts across key performance metrics, including accuracy, precision, recall, F1-score, and cumulative rewards. Specifically, QCPR improved accuracy by 15.29% over QL, 9.98% over DQN, 6.33% over PER, and 8.23% over DDQN. This study highlights the potential of quantum-inspired approaches to advance RL, offering scalable and efficient solutions for complex decision-making tasks.
Keywords	:	Quantum computing, MaDi priority, Quantum circuit-based priority replay, Quantum reinforcement learning, Quantum priority experience replay.
Cite this article	:	Palanivel R, Muthulakshmi P. Quantum prioritized experience replay with MaDi-based priority and quantum circuit mechanisms for optimizing reinforcement learning. International Journal of Advanced Technology and Engineering Exploration. 2024;11(121):1664-1680. DOI:10.19101/IJATEE.2024.111100094
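The abstract describes experience prioritization layered on Q-learning but does not give formulas, so the following is only a rough classical sketch of the general idea, not the QCPR algorithm itself. It combines tabular Q-learning with proportional prioritized sampling in the style of standard PER, where P(i) = p_i^alpha / sum_k p_k^alpha. The function `madi_priority`, its weights `up_weight` and `down_weight`, and the toy transitions are all hypothetical illustrations of a priority that depends on both the magnitude and the sign of the TD error; the paper's actual MaDi definition and its quantum circuit mechanisms are not reproduced here.

```python
import random


def madi_priority(td_error, up_weight=1.0, down_weight=0.5, eps=1e-3):
    """Hypothetical magnitude-and-direction priority (an assumption, not the paper's formula).

    Magnitude is |td_error|; direction is its sign. Non-negative errors are
    scaled by `up_weight`, negative errors by `down_weight`. `eps` keeps every
    priority strictly positive so no transition becomes unsampleable.
    """
    weight = up_weight if td_error >= 0 else down_weight
    return weight * abs(td_error) + eps


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def q_update(q, transition, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    state, action, reward, next_state, done = transition
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += lr * td_error
    return td_error


def replay_step(q, buffer, priorities, rng, batch_size=2):
    """Sample transitions by priority, update Q, then refresh their priorities."""
    probs = sampling_probabilities(priorities)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    for i in indices:
        td_error = q_update(q, buffer[i])
        priorities[i] = madi_priority(td_error)
    return indices


# Toy usage: 3 states, 2 actions, and a hand-written buffer of
# (state, action, reward, next_state, done) transitions.
q = [[0.0, 0.0] for _ in range(3)]
buffer = [(0, 1, 0.0, 1, False), (1, 0, 1.0, 2, True), (0, 0, -1.0, 0, False)]
priorities = [madi_priority(0.0)] * len(buffer)
rng = random.Random(0)
for _ in range(50):
    replay_step(q, buffer, priorities, rng)
```

Weighting positive and negative TD errors differently is just one way a "direction" term could enter a priority; a faithful implementation would need the definitions given in the full paper, and standard PER additionally applies importance-sampling weights that this sketch omits.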