Affiliation(s):
College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China;
5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, China
Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300394
@article{zhang_marlbc,
  title="Multi-agent reinforcement learning behavioral control for nonlinear second-order systems",
  author="Zhenyi ZHANG and Jie HUANG and Congjie PAN",
  journal="Frontiers of Information Technology \& Electronic Engineering",
  year="in press",
  publisher="Zhejiang University Press \& Springer",
  doi="10.1631/FITEE.2300394"
}
%0 Journal Article
%T Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
%A Zhenyi ZHANG
%A Jie HUANG
%A Congjie PAN
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300394
TY  - JOUR
T1  - Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
A1  - Zhenyi ZHANG
A1  - Jie HUANG
A1  - Congjie PAN
JO  - Frontiers of Information Technology & Electronic Engineering
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2300394
ER  -
Abstract: Reinforcement learning behavioral control (RLBC) is limited to an individual agent and cannot handle swarm missions, because it models behavior priority learning as a single-agent Markov decision process. In this study, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers (SORLCs) is designed to learn the optimal control policies for tracking position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC method achieves a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
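To illustrate the decision-layer idea described in the abstract, the sketch below implements cooperative Q-learning over a joint behavior-priority assignment, treating priority switching as a cooperative Markov game with a single shared team reward. This is a minimal, hedged illustration only: the agent counts, the discretized mission states, and the `team_reward` function are invented for the example and are not the paper's actual formulation or implementation.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper)
NUM_AGENTS = 3     # agents in the swarm
NUM_BEHAVIORS = 2  # candidate behavior priorities per agent
NUM_STATES = 4     # discretized mission states

rng = np.random.default_rng(0)

# One shared Q-table over the JOINT priority assignment (cooperative game):
# Q[state, a1, a2, a3], all agents optimizing one team reward.
Q = np.zeros((NUM_STATES,) + (NUM_BEHAVIORS,) * NUM_AGENTS)

def team_reward(state, joint_action):
    """Toy team reward (assumption): penalize agents whose chosen
    behavior index does not match the mission state's parity."""
    return -sum(abs(a - state % NUM_BEHAVIORS) for a in joint_action)

alpha, gamma, eps = 0.1, 0.9, 0.2  # learning rate, discount, exploration
state = 0
for step in range(5000):
    if rng.random() < eps:
        # Explore: random joint priority assignment
        joint = tuple(int(a) for a in rng.integers(NUM_BEHAVIORS, size=NUM_AGENTS))
    else:
        # Exploit: greedy joint assignment from the shared Q-table
        joint = np.unravel_index(int(np.argmax(Q[state])), Q[state].shape)
    r = team_reward(state, joint)
    next_state = int(rng.integers(NUM_STATES))  # toy random mission dynamics
    td_target = r + gamma * Q[next_state].max()
    idx = (state,) + tuple(joint)
    Q[idx] += alpha * (td_target - Q[idx])      # standard Q-learning update
    state = next_state

# Learned greedy joint priorities per mission state
for s in range(NUM_STATES):
    print(s, np.unravel_index(int(np.argmax(Q[s])), Q[s].shape))
```

The design choice to learn one table over the joint action (rather than independent per-agent tables) is what makes the game cooperative: every agent's priority is evaluated against the same team reward, mirroring the joint behavior-priority learning the abstract attributes to the MARLMS.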
Open peer comments: Debate/Discuss/Question/Opinion