
CLC number: TP18
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-04-08
ORCID: https://orcid.org/0000-0003-4895-990X
Jian ZHAO, Youpeng ZHAO, Weixun WANG, Mingyu YANG, Xunhan HU, Wengang ZHOU, Jianye HAO, Houqiang LI. Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100594
Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents

1 School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
2 College of Intelligence and Computing, Tianjin University, Tianjin 300072, China

Abstract: Multi-agent reinforcement learning is difficult to apply in real-world scenarios, partly because of the gap between simulated and real environments. One cause of this gap is that simulated systems always assume that agents work normally, whereas in practice one or more agents may unexpectedly "crash" during cooperation because of inevitable hardware or software failures. Such crashes destroy the cooperation among agents and degrade the performance of the system. In this paper, we present a formal definition of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent during training to adjust the crash probability of the system. We design three coaching strategies for the coach agent, together with a re-sampling strategy. To the best of our knowledge, this is the first work to study unexpected crashes in multi-agent systems. Extensive experiments on grid-world environments and StarCraft micromanagement tasks show that the adaptive coaching strategy is more effective than strategies with a fixed crash probability or curriculum learning. An ablation study further demonstrates the effectiveness of the re-sampling strategy.
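The abstract describes a virtual coach that adjusts the agents' crash probability during training, with an adaptive strategy outperforming a fixed probability or a curriculum schedule. The sketch below illustrates this idea only in outline: the crash model (a crashed agent acts as a no-op), the class and function names, and the performance-based update rule are all illustrative assumptions, not the paper's actual algorithm.

```python
import random

NOOP = 0  # assumed no-op action taken by a crashed agent

def apply_crashes(actions, crash_prob, rng):
    """Replace each agent's action with a no-op with probability crash_prob."""
    return [NOOP if rng.random() < crash_prob else a for a in actions]

class AdaptiveCoach:
    """Hypothetical adaptive coach: raises the crash probability when the
    team copes well with the current level, and lowers it otherwise."""
    def __init__(self, p_init=0.0, p_max=0.5, step=0.05, target_return=0.8):
        self.p = p_init                  # current crash probability
        self.p_max = p_max               # cap on the crash probability
        self.step = step                 # adjustment per evaluation
        self.target_return = target_return

    def update(self, mean_episode_return):
        # If recent performance meets the target, make training harder;
        # otherwise ease off. (Assumed curriculum-style rule.)
        if mean_episode_return >= self.target_return:
            self.p = min(self.p_max, self.p + self.step)
        else:
            self.p = max(0.0, self.p - self.step)
        return self.p

# Usage sketch: one training-loop step with simulated crashes.
rng = random.Random(0)
coach = AdaptiveCoach()
p = coach.update(mean_episode_return=0.9)   # team did well, crash prob rises
actions = apply_crashes([3, 1, 2, 4], p, rng)
```

During training the environment would then execute `actions` (with crashed agents idle), so the learned policies are exposed to progressively harder crash scenarios as the team improves.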

