CLC number: TP18
On-line Access: 2022-07-21
Received: 2021-12-31
Revision Accepted: 2022-07-21
Crosschecked: 2022-04-08
Jian ZHAO, Youpeng ZHAO, Weixun WANG, Mingyu YANG, Xunhan HU, Wengang ZHOU, Jianye HAO, Houqiang LI. Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(7): 1032-1042.
Abstract: Multi-agent reinforcement learning is difficult to apply in practice, partly because of the gap between simulated and real-world scenarios. One reason for this gap is that simulated systems assume that agents work normally all the time, whereas in practice one or more agents may unexpectedly "crash" during coordination because of unavoidable hardware or software failures. Such crashes disrupt cooperation among agents and degrade performance. In this work, we present a formal conceptualization of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent to adjust the crash rate during training. We design three coaching strategies (fixed crash rate, curriculum learning, and adaptive crash rate) and a re-sampling strategy for the coach agent. To our knowledge, this work is the first to study unexpected crashes in a multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy over the fixed crash rate and curriculum learning strategies. An ablation study further illustrates the effectiveness of our re-sampling strategy.
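
The coaching idea summarized in the abstract can be pictured with a minimal sketch: a virtual coach picks the crash rate used when simulating agent crashes in each training episode. Only the three strategy names (fixed, curriculum, adaptive) come from the abstract; the CoachAgent class, the linear curriculum schedule, the 0.6 win-rate threshold, and the 0.01 step size below are illustrative assumptions, and the paper's actual update rules and re-sampling strategy are not reproduced here.

import random


class CoachAgent:
    """Sketch of a virtual coach that selects the crash rate used when
    simulating agent crashes during training. Strategy names follow the
    abstract; the concrete schedules and thresholds are assumptions."""

    def __init__(self, strategy="adaptive", init_rate=0.1, max_rate=0.5,
                 total_episodes=10_000):
        self.strategy = strategy
        self.max_rate = max_rate
        self.total_episodes = total_episodes
        self.rate = init_rate  # current crash rate (assumed starting value)

    def crash_rate(self, episode, recent_win_rate=None):
        """Return the crash rate to use for the given training episode."""
        if self.strategy == "fixed":
            return self.rate
        if self.strategy == "curriculum":
            # Assumed schedule: grow the crash rate linearly over training.
            return self.max_rate * min(1.0, episode / self.total_episodes)
        if self.strategy == "adaptive":
            # Hypothetical adaptive rule: raise the crash rate when the team
            # is coping well, lower it when performance drops.
            if recent_win_rate is not None:
                self.rate += 0.01 if recent_win_rate > 0.6 else -0.01
                self.rate = min(max(self.rate, 0.0), self.max_rate)
            return self.rate
        raise ValueError(f"unknown strategy: {self.strategy}")

    def sample_crashes(self, n_agents, episode, recent_win_rate=None):
        """Independently mark each agent as crashed for this episode."""
        p = self.crash_rate(episode, recent_win_rate)
        return [random.random() < p for _ in range(n_agents)]


# Example: decide which of 5 agents crash in episode 3000, given a recent
# win rate of 0.7 reported by the training loop (hypothetical numbers).
coach = CoachAgent(strategy="adaptive")
crashed = coach.sample_crashes(n_agents=5, episode=3000, recent_win_rate=0.7)

In this reading, the crashed flags would simply mask the corresponding agents' actions during the episode, so the remaining agents learn to cooperate despite missing teammates; how crashes are actually injected and re-sampled is specified in the paper itself.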