CLC number: TP183; TP393.1
On-line Access: 2021-05-17
Received: 2019-12-19
Revision Accepted: 2020-06-27
Crosschecked: 2020-10-20
Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang. Dynamic value iteration networks for the planning of rapidly changing UAV swarms[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 687-696.
@article{Li2021DVIN,
title="Dynamic value iteration networks for the planning of rapidly changing UAV swarms",
author="Wei Li and Bowei Yang and Guanghua Song and Xiaohong Jiang",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="22",
number="5",
pages="687-696",
year="2021",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1900712"
}
%0 Journal Article
%T Dynamic value iteration networks for the planning of rapidly changing UAV swarms
%A Wei Li
%A Bowei Yang
%A Guanghua Song
%A Xiaohong Jiang
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 5
%P 687-696
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900712
TY - JOUR
T1 - Dynamic value iteration networks for the planning of rapidly changing UAV swarms
A1 - Wei Li
A1 - Bowei Yang
A1 - Guanghua Song
A1 - Xiaohong Jiang
JO - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 5
SP - 687
EP - 696
SN - 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900712
ER -
Abstract: In an unmanned aerial vehicle ad-hoc network (UANET), sparse and fast-moving unmanned aerial vehicles (UAVs)/nodes can change the UANET topology dynamically, which may degrade UANET service performance. To plan rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model. The model is trained with the episodic Q-learning method on the connection information of UANETs to generate a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We evaluate the DVIN model and compare it with the non-dominated sorting genetic algorithm II (NSGA-II) and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning while maintaining a high average success rate.
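The idea of spreading state values over a connectivity graph can be illustrated with a minimal sketch. This is not the DVIN model described in the abstract: it is plain tabular value iteration, with no learned parameters and no Q-learning step. The five-node topology, the function names `value_iteration` and `greedy_path`, the reward of 1.0 at the goal, and the discount factor `gamma=0.9` are all assumptions chosen for illustration.

```python
def value_iteration(adjacency, goal, gamma=0.9, sweeps=50):
    """Spread state values outward from `goal` over an undirected graph."""
    values = {node: 0.0 for node in adjacency}
    for _ in range(sweeps):
        updated = {}
        for node, neighbors in adjacency.items():
            if node == goal:
                # Assumption: the goal node holds the only reward.
                updated[node] = 1.0
            else:
                # Bellman backup: discounted value of the best neighbor.
                best = max((values[n] for n in neighbors), default=0.0)
                updated[node] = gamma * best
        values = updated
    return values


def greedy_path(adjacency, values, start, goal, max_steps=100):
    """Follow the highest-valued neighbor from `start` until `goal`."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        neighbors = adjacency[path[-1]]
        if not neighbors:
            break  # isolated node: no route available
        path.append(max(neighbors, key=values.get))
    return path


# Hypothetical connectivity snapshot of a five-node network.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

values = value_iteration(adjacency, goal="E")
path = greedy_path(adjacency, values, start="A", goal="E")
print(path)
```

When the topology changes, the sketch simply reruns `value_iteration` on the new adjacency map. The abstract indicates that the DVIN model instead learns a value spread function from connection information.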