CLC number: TP242
On-line Access: 2019-05-14
Received: 2018-09-15
Revision Accepted: 2018-11-27
Crosschecked: 2019-04-28
Li-dong Zhang, Ban Wang, Zhi-xiang Liu, You-min Zhang, Jian-liang Ai. Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1800571
References
[1] Ballard BW, 1983. The *-minimax search procedure for trees containing chance nodes. Artif Intell, 21(3):327-350.
[2] Bellman R, 1952. On the theory of dynamic programming. Proc Nat Acad Sci, 38(8):716-719.
[3] Bertsekas DP, 1971. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, Massachusetts, USA.
[4] Bertsekas DP, 2007. Dynamic Programming and Optimal Control (3rd Ed.). Athena Scientific, Belmont, Massachusetts, USA.
[5] Bertsekas DP, 2011. Temporal difference methods for general projected equations. IEEE Trans Autom Contr, 56(9):2128-2139.
[6] Bertsekas DP, 2012. Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Sra S, Nowozin S, Wright SJ (Eds.), Optimization for Machine Learning. MIT Press, Massachusetts, USA.
[7] Bertsekas DP, 2015. Lambda-policy iteration: a review and a new implementation. https://arxiv.org/abs/1507.01029
[8] Bertsekas DP, Tsitsiklis JN, 2000. Gradient convergence in gradient methods with errors. SIAM J Optim, 10(3):627-642.
[9] Buşoniu L, Ernst D, de Schutter B, et al., 2010. Online least-squares policy iteration for reinforcement learning control. Proc American Control Conf, p.486-491.
[10] Efroni Y, Dalal G, Scherrer B, et al., 2018a. Beyond the one step greedy approach in reinforcement learning. https://arxiv.org/abs/1802.03654
[11] Efroni Y, Dalal G, Scherrer B, et al., 2018b. Multiple-step greedy policies in online and approximate reinforcement learning. https://arxiv.org/abs/1805.07956
[12] Fang J, Zhang LM, Fang W, et al., 2016. Approximate dynamic programming for CGF air combat maneuvering decision. 2nd IEEE Int Conf on Computer and Communications, p.1386-1390.
[13] Ghamry KA, Dong YQ, Kamel MA, et al., 2016. Real-time autonomous take-off, tracking and landing of UAV on a moving UGV platform. 24th Mediterranean Conf on Control and Automation, p.1236-1241.
[14] Hastie T, Tibshirani R, Friedman J, 2001. The Elements of Statistical Learning. Springer, New York, USA.
[15] Hauk T, Buro M, Schaeffer J, 2004. Rediscovering *-minimax search. Int Conf on Computers and Games, p.35-50.
[16] Liu ZX, Zhang YM, Yu X, et al., 2016. Unmanned surface vehicles: an overview of developments and challenges. Ann Rev Contr, 41:71-93.
[17] Ma YF, Ma XL, Song X, 2014. A case study on air combat decision using approximated dynamic programming. Math Probl Eng, 2014:183401.
[18] McGrew JS, 2008. Real-Time Maneuvering Decisions for Autonomous Air Combat. MS Thesis, Massachusetts Institute of Technology, Massachusetts, USA.
[19] McGrew JS, How JP, Williams B, et al., 2010. Air-combat strategy using approximate dynamic programming. J Guid Contr Dynam, 33(5):1641-1654.
[20] Powell WB, 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, New Jersey, USA.
[21] Russell SJ, Norvig P, 2010. Artificial Intelligence: a Modern Approach (3rd Ed.). Prentice Hall, New Jersey, USA.
[22] Sharifi F, Chamseddine A, Mahboubi H, et al., 2016. A distributed deployment strategy for a network of cooperative autonomous vehicles. IEEE Trans Contr Syst Technol, 23(2):737-745.
[23] Sutton RS, Barto AG, 1998. Reinforcement Learning: an Introduction. MIT Press, Massachusetts, USA.
[24] Thiery C, Scherrer B, 2010. Least-squares λ policy iteration: bias-variance trade-off in control problems. Proc 27th Int Conf on Machine Learning, p.1071-1078.
[25] Wang B, Zhang YM, 2018. An adaptive fault-tolerant sliding mode control allocation scheme for multirotor helicopter subject to simultaneous actuator faults. IEEE Trans Ind Electron, 65(5):4227-4236.
[26] Wang B, Yu X, Mu LX, et al., 2019. Disturbance observer-based adaptive fault-tolerant control for a quadrotor helicopter subject to parametric uncertainties and external disturbances. Mech Syst Signal Process, 120:727-743.
[27] Yu HZ, 2010. Convergence of least squares temporal difference methods under general conditions. 27th Int Conf on Machine Learning, p.1207-1214.
[28] Yu HZ, 2012. Least squares temporal difference methods: an analysis under general conditions. SIAM J Contr Optim, 50(6):3310-3343.
[29] Yuan C, Zhang YM, Liu ZX, 2015. A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques. Can J Forest Res, 45(7):783-792.