CLC number: TP181
On-line Access: 2023-01-21
Received: 2022-02-25
Revision Accepted: 2023-01-21
Crosschecked: 2022-08-11
https://orcid.org/0000-0003-3330-4978
Yixiang REN, Zhenhui YE, Yining CHEN, Xiaohong JIANG, Guanghua SONG. Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(1): 117-130.
@article{ren2023softhgrns,
title="Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments",
author="Yixiang REN and Zhenhui YE and Yining CHEN and Xiaohong JIANG and Guanghua SONG",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="24",
number="1",
pages="117-130",
year="2023",
publisher="Zhejiang University Press \& Springer",
issn="2095-9184",
doi="10.1631/FITEE.2200073"
}
Abstract: Recent progress in multi-agent deep reinforcement learning (MADRL) has made it more practical for real-world tasks, but its relatively poor scalability and the constraint of partial observability still pose challenges for its performance and deployment. Based on the intuitive observation that human society can be regarded as a large-scale partially observable environment, in which everyone can communicate with neighbors and remember their own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we model the multi-agent system as a graph, use a novel graph convolution structure to enable communication between heterogeneous neighboring agents, and adopt a recurrent unit so that agents can record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that learns stochastic policies with a configurable target action entropy. Building on these techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant called SAC-HGRN. Experimental results on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements over four MADRL baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model.
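The idea of a stochastic policy with a configurable target action entropy can be illustrated with a minimal sketch. It assumes a Boltzmann (softmax) policy over Q-values with a temperature `alpha`, and a simple update that raises or lowers `alpha` until the policy entropy matches the target. The function names, the update rule, the learning rate, and the example Q-values are illustrative assumptions; the abstract does not specify the update actually used in Soft-HGRN.

```python
import math


def soft_policy(q_values, alpha):
    """Boltzmann policy: pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tune_temperature(q_values, target_entropy, alpha=1.0, lr=0.1, steps=500):
    """Hypothetical tuning rule: nudge log(alpha) toward the target entropy.

    A larger alpha flattens the policy (higher entropy), so alpha is
    increased when the entropy is below the target and decreased otherwise.
    """
    log_alpha = math.log(alpha)
    for _ in range(steps):
        h = entropy(soft_policy(q_values, math.exp(log_alpha)))
        log_alpha += lr * (target_entropy - h)
    return math.exp(log_alpha)


# Example Q-values for four discrete actions (made up for illustration).
q = [1.0, 0.5, 0.0, -0.5]
alpha = tune_temperature(q, target_entropy=1.0)
policy = soft_policy(q, alpha)
```

In this sketch the target entropy must lie below the maximum achievable value, `math.log(len(q))`, which corresponds to a uniform policy; a lower target yields a greedier policy and a higher one encourages more exploration.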