On-line Access: 2024-01-18
Received: 2023-06-01
Revision Accepted: 2023-11-16
Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering, online first, 2024. https://doi.org/10.1631/FITEE.2300394
@article{zhang2024marlbc,
title="Multi-agent reinforcement learning behavioral control for nonlinear second-order systems",
author="Zhenyi ZHANG and Jie HUANG and Congjie PAN",
journal="Frontiers of Information Technology & Electronic Engineering",
year="2024",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300394"
}
%0 Journal Article
%T Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
%A Zhenyi ZHANG
%A Jie HUANG
%A Congjie PAN
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2300394
TY  - JOUR
T1  - Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
A1  - Zhenyi ZHANG
A1  - Jie HUANG
A1  - Congjie PAN
JO  - Frontiers of Information Technology & Electronic Engineering
SN  - 2095-9184
Y1  - 2024
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2300394
ER  - 
Abstract: Reinforcement learning behavioral control (RLBC) is limited to an individual agent and cannot address swarm missions, because it models behavior priority learning as a single-agent Markov decision process. In this study, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing the dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers (SORLCs) is designed to learn the optimal control policies for tracking position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC method achieves a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
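The abstract's core idea of learning a joint behavior priority in a cooperative Markov game can be sketched with tabular Q-learning over a shared joint-action value. Everything below (states, rewards, sizes) is an illustrative assumption, not the paper's actual formulation:

```python
import random

random.seed(0)

n_agents = 2       # group size (assumed)
n_states = 4       # discrete mission states (assumed)
n_priorities = 3   # candidate behavior priorities per agent (assumed)
n_joint = n_priorities ** n_agents  # joint behavior-priority actions

# One shared Q-table over the JOINT action space: in a cooperative game
# all agents optimize a common team reward, so a single value suffices.
Q = [[0.0] * n_joint for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.2

def team_reward(s, joint_a):
    # Placeholder team reward penalizing a mismatch cost (assumed).
    return -abs(joint_a - s)

s = 0
for _ in range(5000):
    # epsilon-greedy selection of the joint behavior priority
    if random.random() < eps:
        a = random.randrange(n_joint)
    else:
        a = max(range(n_joint), key=lambda j: Q[s][j])
    r = team_reward(s, a)
    s_next = random.randrange(n_states)  # toy mission-state transition
    # Standard Q-learning update on the shared joint-action value
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    s = s_next

print(len(Q), len(Q[0]))  # → 4 9
```

Because the joint action space grows exponentially with the number of agents, a tabular sketch like this only scales to small groups; the paper's mission supervisor targets the same cooperative objective without that enumeration.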