CLC number: TP242
On-line Access: 2019-05-14
Received: 2018-09-15
Revision Accepted: 2018-11-27
Crosschecked: 2019-04-28
Li-dong Zhang, Ban Wang, Zhi-xiang Liu, You-min Zhang, Jian-liang Ai. Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1800571
References
[1] Ballard BW, 1983. The *-minimax search procedure for trees containing chance nodes. Artif Intell, 21(3):327-350.
[2] Bellman R, 1952. On the theory of dynamic programming. Proc Nat Acad Sci, 38(8):716-719.
[3] Bertsekas DP, 1971. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, Massachusetts, USA.
[4] Bertsekas DP, 2007. Dynamic Programming and Optimal Control (3rd Ed.). Athena Scientific, Belmont, Massachusetts, USA.
[5] Bertsekas DP, 2011. Temporal difference methods for general projected equations. IEEE Trans Autom Contr, 56(9):2128-2139.
[6] Bertsekas DP, 2012. Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Sra S, Nowozin S, Wright SJ (Eds.), Optimization for Machine Learning. MIT Press, Massachusetts, USA.
[7] Bertsekas DP, 2015. Lambda-policy iteration: a review and a new implementation. https://arxiv.org/abs/1507.01029
[8] Bertsekas DP, Tsitsiklis JN, 2000. Gradient convergence in gradient methods with errors. SIAM J Optim, 10(3):627-642.
[9] Buşoniu L, Ernst D, de Schutter B, et al., 2010. Online least-squares policy iteration for reinforcement learning control. Proc American Control Conf, p.486-491.
[10] Efroni Y, Dalal G, Scherrer B, et al., 2018a. Beyond the one step greedy approach in reinforcement learning. https://arxiv.org/abs/1802.03654
[11] Efroni Y, Dalal G, Scherrer B, et al., 2018b. Multiple-step greedy policies in online and approximate reinforcement learning. https://arxiv.org/abs/1805.07956
[12] Fang J, Zhang LM, Fang W, et al., 2016. Approximate dynamic programming for CGF air combat maneuvering decision. 2nd IEEE Int Conf on Computer and Communications, p.1386-1390.
[13] Ghamry KA, Dong YQ, Kamel MA, et al., 2016. Real-time autonomous take-off, tracking and landing of UAV on a moving UGV platform. 24th Mediterranean Conf on Control and Automation, p.1236-1241.
[14] Hastie T, Tibshirani R, Friedman J, 2001. The Elements of Statistical Learning. Springer, New York, USA.
[15] Hauk T, Buro M, Schaeffer J, 2004. Rediscovering *-minimax search. Int Conf on Computers and Games, p.35-50.
[16] Liu ZX, Zhang YM, Yu X, et al., 2016. Unmanned surface vehicles: an overview of developments and challenges. Ann Rev Contr, 41:71-93.
[17] Ma YF, Ma XL, Song X, 2014. A case study on air combat decision using approximated dynamic programming. Math Probl Eng, 2014:183401.
[18] McGrew JS, 2008. Real-Time Maneuvering Decisions for Autonomous Air Combat. MS Thesis, Massachusetts Institute of Technology, Massachusetts, USA.
[19] McGrew JS, How JP, Williams B, et al., 2010. Air-combat strategy using approximate dynamic programming. J Guid Contr Dynam, 33(5):1641-1654.
[20] Powell WB, 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, New Jersey, USA.
[21] Russell SJ, Norvig P, 2010. Artificial Intelligence: a Modern Approach (3rd Ed.). Prentice Hall, New Jersey, USA.
[22] Sharifi F, Chamseddine A, Mahboubi H, et al., 2016. A distributed deployment strategy for a network of cooperative autonomous vehicles. IEEE Trans Contr Syst Technol, 23(2):737-745.
[23] Sutton RS, Barto AG, 1998. Reinforcement Learning: an Introduction. MIT Press, Massachusetts, USA.
[24] Thiery C, Scherrer B, 2010. Least-squares λ policy iteration: bias-variance trade-off in control problems. Proc 27th Int Conf on Machine Learning, p.1071-1078.
[25] Wang B, Zhang YM, 2018. An adaptive fault-tolerant sliding mode control allocation scheme for multirotor helicopter subject to simultaneous actuator faults. IEEE Trans Ind Electron, 65(5):4227-4236.
[26] Wang B, Yu X, Mu LX, et al., 2019. Disturbance observer-based adaptive fault-tolerant control for a quadrotor helicopter subject to parametric uncertainties and external disturbances. Mech Syst Signal Process, 120:727-743.
[27] Yu HZ, 2010. Convergence of least squares temporal difference methods under general conditions. 27th Int Conf on Machine Learning, p.1207-1214.
[28] Yu HZ, 2012. Least squares temporal difference methods: an analysis under general conditions. SIAM J Contr Optim, 50(6):3310-3343.
[29] Yuan C, Zhang YM, Liu ZX, 2015. A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques. Can J Forest Res, 45(7):783-792.