JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Cooperative channel assignment for VANETs based on multiagent reinforcement learning

Author(s): Yun-peng Wang, Kun-xian Zheng, Da-xin Tian, Xu-ting Duan, Jian-shan Zhou
Affiliation(s): Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Transportation Science and Engineering, Beihang University, Beijing 100191, China
Corresponding email(s): ypwang@buaa.edu.cn, zhengkunxian@buaa.edu.cn, dtian@buaa.edu.cn, duanxuting@buaa.edu.cn
Key Words: Vehicular ad-hoc networks, Reinforcement learning, Dynamic channel assignment, Multichannel


Yun-peng Wang, Kun-xian Zheng, Da-xin Tian, Xu-ting Duan, Jian-shan Zhou. Cooperative channel assignment for VANETs based on multiagent reinforcement learning[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900308

@article{title="Cooperative channel assignment for VANETs based on multiagent reinforcement learning",
author="Yun-peng Wang, Kun-xian Zheng, Da-xin Tian, Xu-ting Duan, Jian-shan Zhou",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.1900308"
}

%0 Journal Article
%T Cooperative channel assignment for VANETs based on multiagent reinforcement learning
%A Yun-peng Wang
%A Kun-xian Zheng
%A Da-xin Tian
%A Xu-ting Duan
%A Jian-shan Zhou
%J Frontiers of Information Technology & Electronic Engineering
%P 1047-1058
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900308

TY - JOUR
T1 - Cooperative channel assignment for VANETs based on multiagent reinforcement learning
A1 - Yun-peng Wang
A1 - Kun-xian Zheng
A1 - Da-xin Tian
A1 - Xu-ting Duan
A1 - Jian-shan Zhou
JO - Frontiers of Information Technology & Electronic Engineering
SP - 1047
EP - 1058
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900308
ER -


Abstract: Dynamic channel assignment (DCA) plays a key role in extending vehicular ad-hoc network capacity and mitigating congestion. However, channel assignment in direct vehicle-to-vehicle communication scenarios faces several challenges, including the mutual influence of large numbers of nodes, the lack of centralized coordination, and unknown global state information. To address these challenges, a multiagent reinforcement learning (RL) based cooperative DCA (RL-CDCA) mechanism is proposed. Specifically, each vehicular node uses two cooperative RL models to learn proper strategies for channel selection and backoff adaptation from real-time channel state information (CSI). In addition, neural networks are constructed as nonlinear Q-function approximators, which facilitate the mapping of the continuously sensed input to the mixed policy output. Nodes are driven to share their individual rewards locally and to incorporate the rewards of nearby nodes, so that they optimize their policies in a distributed, collaborative manner. Simulation results show that, compared with four existing mechanisms, the proposed multiagent RL-CDCA reduces the one-hop packet delay by at least 73.73% and improves the packet delivery ratio by at least 12.66% on average in highly dense scenarios, while also improving the fairness of global network resource allocation.
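
The reward-sharing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the neural-network Q-function approximator for a small Q-table, assumes a discretized channel state, and assumes that a node simply averages its own reward with the rewards reported by its one-hop neighbors. The names `shared_reward`, `q_update`, and `select_channel`, and the values of `alpha`, `gamma`, and `epsilon`, are hypothetical.

```python
import random
from typing import List


def shared_reward(own: float, neighbor_rewards: List[float]) -> float:
    """Blend a node's own reward with rewards reported by nearby nodes.

    Assumption: a plain average; the abstract does not state the weighting.
    """
    rewards = [own] + list(neighbor_rewards)
    return sum(rewards) / len(rewards)


def q_update(q: List[List[float]], state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard one-step Q-learning update on a table q[state][channel]."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def select_channel(q: List[List[float]], state: int, epsilon: float,
                   rng: random.Random) -> int:
    """Epsilon-greedy channel choice: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: 2 discretized channel states, 2 service channels.
    q = [[0.0, 0.0], [0.0, 0.0]]
    channel = select_channel(q, state=0, epsilon=0.1, rng=rng)
    # Own reward 1.0 (e.g. a delivered packet); neighbors report 0.0 and 0.5.
    r = shared_reward(1.0, [0.0, 0.5])
    q_update(q, state=0, action=channel, reward=r, next_state=1)
    print(channel, q)
```

Averaging in neighbor rewards means a node that gains throughput by crowding a channel sees that gain offset by its neighbors' losses, which is one plausible way to encourage the fairness the abstract reports.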

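Cooperative channel assignment for vehicular ad-hoc networks based on multiagent reinforcement learning

Yun-peng Wang, Kun-xian Zheng, Da-xin Tian, Xu-ting Duan, Jian-shan Zhou
School of Transportation Science and Engineering, Beihang University; Beijing Advanced Innovation Center for Big Data and Brain Computing; Beijing 100191, China

Summary: Dynamic channel assignment (DCA) plays a key role in extending the capacity of vehicular ad-hoc networks and relieving their congestion. However, in direct vehicle-to-vehicle communication scenarios, channel assignment faces the mutual influence of large numbers of nodes, the lack of centralized coordination, unknown global network state information, and other challenges. To solve this problem, a cooperative dynamic channel assignment mechanism based on multiagent reinforcement learning (RL), called RL-CDCA, is proposed. Specifically, using two mutually cooperating RL models, each vehicular node can learn proper strategies for channel selection and adaptive channel-access backoff from real-time channel state information. In addition, neural networks are constructed as nonlinear Q-function approximators, which helps map the continuously sensed input values to the mixed policy output. Multiagent RL-CDCA drives nodes to share their local rewards and to incorporate the individual rewards of other nodes in the area, so that they can optimize their policies in a distributed, cooperative manner. Simulation results show that, compared with four existing mechanisms, the proposed multiagent RL-CDCA algorithm reduces the one-hop packet transmission delay by no less than 73.73% and raises the average packet delivery ratio by no less than 12.66%, even when vehicles on the road network are highly dense, and better guarantees the fairness of network resource allocation.

Key words: Vehicular ad-hoc networks; Reinforcement learning; Dynamic channel assignment; Multichannel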


