
CLC number: TP273
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-04-21
Cited: 0
Clicked: 4604
Yu SHI, Yongzhao HUA, Jianglong YU, Xiwang DONG, Zhang REN. Multi-agent differential game based cooperative synchronization control using a data-driven method[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2200001
Multi-agent differential game based cooperative synchronization control using a data-driven method
1 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
2 Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
Abstract: This paper studies the multi-agent differential game problem and its application to cooperative synchronization control. A systematic approach to constructing and analyzing multi-agent differential games is proposed, together with a data-driven method based on reinforcement learning. First, it is shown that, because of the coupling inherent in network interactions, typical distributed controllers cannot fully guarantee a global Nash equilibrium of the differential game. Second, by defining the concept of a best response, the problem is decomposed into local differential game problems, and the local Nash equilibrium solutions are derived. An off-policy reinforcement learning algorithm requiring no system model information is constructed; it optimizes and updates the controllers using online neighbor interaction data, and the stability and robustness of the controllers are proved. Furthermore, a differential game model based on a modified coupled index function, along with an equivalent reinforcement learning solution, is proposed. Compared with existing studies, this model resolves the coupling of the information required by the agents and achieves a global Nash equilibrium and stable control within a distributed framework. An equivalent parallel reinforcement learning method corresponding to this Nash solution is constructed. Finally, simulation results verify the effectiveness of the learning process and the stability of the synchronization control.
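The abstract concerns synchronization of agents that interact only with their neighbors over a network. As background, the baseline distributed consensus law (not the paper's game-theoretic controller) drives each agent's state toward its neighbors' states via the graph Laplacian; a minimal sketch, assuming single-integrator agents on a hypothetical 4-agent ring topology:

```python
import numpy as np

# Hypothetical 4-agent ring topology (adjacency matrix); illustrative only,
# not the communication graph used in the paper.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

def consensus_step(x, A, dt=0.01):
    """One Euler step of the standard protocol u_i = -sum_j a_ij (x_i - x_j)."""
    L = np.diag(A.sum(axis=1)) - A   # graph Laplacian L = D - A
    return x - dt * (L @ x)          # x_dot = -L x, discretized

x = np.array([1.0, -2.0, 3.0, 0.5])  # arbitrary initial states
for _ in range(5000):
    x = consensus_step(x, A)

print(np.round(x, 3))  # for this undirected connected graph, all states
                       # converge to the initial average, 0.625
```

The point the paper makes is that such typical distributed controllers, while stabilizing, do not by themselves constitute a global Nash equilibrium of the underlying differential game, which motivates the local-game decomposition and the off-policy reinforcement learning design.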

