CLC number: TP273
Crosschecked: 2022-04-21
Yu SHI, Yongzhao HUA, Jianglong YU, Xiwang DONG, Zhang REN. Multi-agent differential game based cooperative synchronization control using a data-driven method[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(7): 1043-1056.
@article{Shi2022multiagent,
  title="Multi-agent differential game based cooperative synchronization control using a data-driven method",
  author="Yu SHI and Yongzhao HUA and Jianglong YU and Xiwang DONG and Zhang REN",
  journal="Frontiers of Information Technology \& Electronic Engineering",
  volume="23",
  number="7",
  pages="1043-1056",
  year="2022",
  publisher="Zhejiang University Press \& Springer",
  doi="10.1631/FITEE.2200001"
}
Abstract: This paper studies the multi-agent differential game problem and its application to cooperative synchronization control. A systematized formulation and analysis method for multi-agent differential games is proposed, together with a data-driven methodology based on the reinforcement learning (RL) technique. First, it is shown that, because of the coupling of networked interactions, typical distributed controllers need not lead to a global Nash equilibrium of the differential game in general. Second, to address this, the problem is decomposed into local differential games, and an alternative local Nash solution is derived by defining a best-response concept. An off-policy RL algorithm using neighboring interaction data is constructed to update the controller without requiring a system model, and its stability and robustness properties are proved. Third, to further resolve the dilemma, another differential game configuration is investigated based on modified coupling index functions. In contrast to the previous case, its distributed solution achieves a global Nash equilibrium while guaranteeing stability. An equivalent parallel RL method is constructed corresponding to this Nash solution. Finally, simulation results illustrate the effectiveness of the learning process and the stability of the synchronization control.
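The model-free idea in the abstract can be sketched concretely. The following is a rough, non-authoritative illustration, not the paper's continuous-time multi-agent algorithm: a standard discrete-time Q-learning policy iteration that learns an LQR-type feedback for a single agent's local tracking-error dynamics purely from data. The matrices A and B, the cost weights Qc and Rc, the exploration noise level, and the horizon are all assumptions chosen for this example; A and B enter only the data-generation step, never the learning update.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) error dynamics and local cost weights; A and B are
# used only to simulate data, not inside the learning update itself.
A = np.array([[0.95, 0.10],
              [0.00, 0.95]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)          # weight on the local tracking error
Rc = np.eye(1)          # weight on the control input
n, m = 2, 1

def svec(M):
    # Stack the upper triangle of a symmetric matrix, doubling off-diagonals,
    # so that z' H z = svec(z z') . h, where h holds the entries of H.
    i, j = np.triu_indices(M.shape[0])
    return np.where(i == j, 1.0, 2.0) * M[i, j]

K = np.zeros((m, n))    # initial gain (stabilizing here, since A is stable)

for it in range(20):    # policy iteration: evaluate the current K, then improve
    Phi, y = [], []
    d = rng.normal(size=n)                        # local tracking-error state
    for k in range(200):
        u = K @ d + 0.5 * rng.normal(size=m)      # behavior policy = target + exploration noise
        d_next = A @ d + B @ u
        z = np.concatenate([d, u])
        z_next = np.concatenate([d_next, K @ d_next])   # target-policy action (off-policy evaluation)
        # Bellman-equation regressor for the quadratic Q-function kernel H
        Phi.append(svec(np.outer(z, z)) - svec(np.outer(z_next, z_next)))
        y.append(d @ Qc @ d + u @ Rc @ u)
        d = d_next
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H = np.zeros((n + m, n + m))                  # rebuild symmetric H from its stacked entries
    H[np.triu_indices(n + m)] = h
    H = H + H.T - np.diag(np.diag(H))
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])    # greedy improvement: u = -Huu^{-1} Hux d

print("learned feedback gain K =", K)

Conceptually, the paper's setting replaces the single error state with each agent's neighborhood tracking error and performs such evaluations using neighboring interaction data; the sketch only conveys how a feedback gain can be improved without identifying the system model.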