
CLC number: TP18
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-04-08
ORCID: https://orcid.org/0000-0003-4895-990X
Jian ZHAO, Youpeng ZHAO, Weixun WANG, Mingyu YANG, Xunhan HU, Wengang ZHOU, Jianye HAO, Houqiang LI. Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100594
Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents

1 School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
2 College of Intelligence and Computing, Tianjin University, Tianjin 300072, China

Abstract: Multi-agent reinforcement learning is difficult to apply in real-world scenarios, partly because of the gap between simulated and real environments. One cause of this gap is that simulated systems always assume that agents work normally, whereas in practice one or more agents may unexpectedly "crash" during cooperation because of inevitable hardware or software failures. Such crashes destroy the cooperation among agents and degrade the performance of the system. In this paper, we present a formal definition of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent during training to adjust the crash probability of the system. We design three coaching strategies for the coach agent, together with a re-sampling strategy. To the best of our knowledge, this is the first work to study unexpected crashes in multi-agent systems. Extensive experiments on grid-world environments and StarCraft micromanagement tasks show that the adaptive coaching strategy is more effective than strategies with a fixed crash probability or curriculum learning. An ablation study further demonstrates the effectiveness of the re-sampling strategy.
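The abstract describes a virtual coach that adjusts the agents' crash probability during training, with an adaptive strategy outperforming a fixed probability or a curriculum schedule. The sketch below illustrates this idea only in outline: the crash model (a crashed agent acts as a no-op), the class and function names, and the performance-based update rule are all illustrative assumptions, not the paper's actual algorithm.

```python
import random

NOOP = 0  # assumed no-op action taken by a crashed agent

def apply_crashes(actions, crash_prob, rng):
    """Replace each agent's action with a no-op with probability crash_prob."""
    return [NOOP if rng.random() < crash_prob else a for a in actions]

class AdaptiveCoach:
    """Hypothetical adaptive coach: raises the crash probability when the
    team copes well with the current level, and lowers it otherwise."""
    def __init__(self, p_init=0.0, p_max=0.5, step=0.05, target_return=0.8):
        self.p = p_init                  # current crash probability
        self.p_max = p_max               # cap on the crash probability
        self.step = step                 # adjustment per evaluation
        self.target_return = target_return

    def update(self, mean_episode_return):
        # If recent performance meets the target, make training harder;
        # otherwise ease off. (Assumed curriculum-style rule.)
        if mean_episode_return >= self.target_return:
            self.p = min(self.p_max, self.p + self.step)
        else:
            self.p = max(0.0, self.p - self.step)
        return self.p

# Usage sketch: one training-loop step with simulated crashes.
rng = random.Random(0)
coach = AdaptiveCoach()
p = coach.update(mean_episode_return=0.9)   # team did well, crash prob rises
actions = apply_crashes([3, 1, 2, 4], p, rng)
```

During training the environment would then execute `actions` (with crashed agents idle), so the learned policies are exposed to progressively harder crash scenarios as the team improves.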

