CLC number: TP183; TP393.1

On-line Access: 2021-05-17

Received: 2019-12-19

Revision Accepted: 2020-06-27

Crosschecked: 2020-10-20


 ORCID:

Wei Li

https://orcid.org/0000-0001-5102-8597

Bowei Yang

https://orcid.org/0000-0001-8581-3817


Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


Dynamic value iteration networks for the planning of rapidly changing UAV swarms


Author(s):  Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang

Affiliation(s):  School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):  li2ui2@zju.edu.cn, boweiy@zju.edu.cn, ghsong@zju.edu.cn, jiangxh@zju.edu.cn

Key Words:  Dynamic value iteration networks, Episodic Q-learning, Unmanned aerial vehicle (UAV) ad-hoc network, Non-dominated sorting genetic algorithm II (NSGA-II), Path planning



Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang. Dynamic value iteration networks for the planning of rapidly changing UAV swarms[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900712

@article{title="Dynamic value iteration networks for the planning of rapidly changing UAV swarms",
author="Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.1900712"
}

%0 Journal Article
%T Dynamic value iteration networks for the planning of rapidly changing UAV swarms
%A Wei Li
%A Bowei Yang
%A Guanghua Song
%A Xiaohong Jiang
%J Frontiers of Information Technology & Electronic Engineering
%P 687-696
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
doi="https://doi.org/10.1631/FITEE.1900712"

TY - JOUR
T1 - Dynamic value iteration networks for the planning of rapidly changing UAV swarms
A1 - Wei Li
A1 - Bowei Yang
A1 - Guanghua Song
A1 - Xiaohong Jiang
JO - Frontiers of Information Technology & Electronic Engineering
SP - 687
EP - 696
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - https://doi.org/10.1631/FITEE.1900712
ER -


Abstract: 
In an unmanned aerial vehicle ad-hoc network (UANET), sparse and rapidly moving UAVs/nodes can dynamically change the UANET topology, which may degrade UANET service performance. In this study, to plan rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model trained with the episodic Q-learning method on the connection information of UANETs to generate a state-value spread function, which enables UAVs/nodes to adapt to new physical locations. We then evaluate the performance of the DVIN model and compare it with the non-dominated sorting genetic algorithm II (NSGA-II) and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning while maintaining a high average success rate.
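For intuition, the following minimal NumPy sketch illustrates the value-iteration mechanism that VIN-style models such as the DVIN learn to perform: repeated Bellman backups spread a state value outward across a connectivity map, and a greedy walk over the resulting values yields a path. This is not the authors' implementation; the grid size, reward layout (one goal cell plus a band of poorly connected cells), discount factor, and 4-connected action set are illustrative assumptions.

import numpy as np

# Illustrative assumptions (not from the paper): an 8x8 "connectivity
# map" with a small per-step cost, one goal cell, and a band of
# poorly connected cells modeled as strongly negative reward.
H, W = 8, 8
gamma = 0.95                      # assumed discount factor
reward = np.full((H, W), -0.04)   # small step cost everywhere
reward[7, 7] = 1.0                # assumed goal cell
reward[3, 2:6] = -1.0             # assumed poor-connectivity band

value = np.zeros((H, W))
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connected actions

# Value iteration: each sweep gives every cell the best discounted
# neighbor value plus its local reward. A VIN realizes this backup
# as convolution plus max-pooling with learned weights instead of
# the hand-coded shifts below.
for _ in range(50):
    best = np.full((H, W), -np.inf)
    for dy, dx in moves:
        shifted = np.full((H, W), -np.inf)
        dst_y = slice(max(0, -dy), H - max(0, dy))
        dst_x = slice(max(0, -dx), W - max(0, dx))
        src_y = slice(max(0, dy), H - max(0, -dy))
        src_x = slice(max(0, dx), W - max(0, -dx))
        shifted[dst_y, dst_x] = value[src_y, src_x]   # neighbor values
        best = np.maximum(best, shifted)
    value = reward + gamma * best                     # Bellman backup

# Path extraction: greedily climb the spread value function.
pos = (0, 0)
path = [pos]
for _ in range(2 * (H + W)):
    y, x = pos
    neighbors = [(y + dy, x + dx) for dy, dx in moves
                 if 0 <= y + dy < H and 0 <= x + dx < W]
    pos = max(neighbors, key=lambda p: value[p])
    path.append(pos)
    if pos == (7, 7):
        break
print(path)

In the DVIN described above, the hand-coded backup is instead performed by learned weights applied for a fixed number of iterations and trained with episodic Q-learning, so the reward map and transitions are inferred from the UANET connection information rather than specified by hand.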

Dynamic value iteration networks for the planning of rapidly changing UAV swarms

Wei Li^1, Bowei Yang^1, Guanghua Song^1, Xiaohong Jiang^2
^1 School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China
^2 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Abstract: In a UAV ad-hoc network (UANET), sparse and fast-moving UAV nodes dynamically change the network topology, which may cause UANET service performance problems. To plan rapidly changing UAV swarms, this paper proposes a dynamic value iteration network (DVIN) model that uses the connection information of the UANET, is trained with the episodic Q-learning method, and generates a state-value spread function that enables UAV nodes to adapt to new physical locations. The performance of the DVIN model is then evaluated and compared with the non-dominated sorting genetic algorithm NSGA-II and the exhaustive method. Simulation results show that the DVIN model significantly shortens the decision time of UAV node path planning and achieves a higher average success rate.

Key words: Dynamic value iteration networks; Episodic Q-learning; UAV ad-hoc network; NSGA-II; Path planning


