CLC number: TP183; TP273

On-line Access: 2020-05-18

Received: 2019-11-22

Revision Accepted: 2020-02-24

Crosschecked: 2020-04-27

Frontiers of Information Technology & Electronic Engineering  2020 Vol.21 No.5 P.777-795


Proximal policy optimization with an integral compensator for quadrotor control

Author(s):  Huan Hu, Qing-ling Wang

Affiliation(s):  School of Automation, Southeast University, Nanjing 210096, China

Corresponding email(s):   qlwang@seu.edu.cn

Key Words:  Reinforcement learning, Proximal policy optimization, Quadrotor control, Neural network

Huan Hu, Qing-ling Wang. Proximal policy optimization with an integral compensator for quadrotor control[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(5): 777-795.

@article{title="Proximal policy optimization with an integral compensator for quadrotor control",
author="Huan Hu and Qing-ling Wang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="5",
pages="777-795",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900641"
}

%0 Journal Article
%T Proximal policy optimization with an integral compensator for quadrotor control
%A Huan Hu
%A Qing-ling Wang
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 5
%P 777-795
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900641

TY - JOUR
T1 - Proximal policy optimization with an integral compensator for quadrotor control
A1 - Huan Hu
A1 - Qing-ling Wang
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 5
SP - 777
EP - 795
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900641
ER -

We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control policy that achieves “model-free” speed control of a quadrotor. The quadrotor model is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end manner. Introducing an integral compensator into the actor-critic framework greatly improves speed-tracking accuracy and robustness. In addition, a two-phase learning scheme that combines offline and online learning is developed for practical use: a model with strong generalization ability is learned in the offline phase, and its flight policy is then continuously optimized in the online phase. Finally, the performance of the proposed algorithm is compared with that of a traditional PID controller.
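The two ingredients named in the abstract, the PPO objective and an integral compensator on the speed-tracking error, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function computes the standard clipped surrogate objective of PPO; the clipping parameter `clip_eps=0.2` is an assumed default. The class `IntegralCompensator`, its anti-windup `limit`, the time step `dt`, and the layout of the augmented state are hypothetical choices made only for illustration.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    ratio:     new-policy / old-policy action probabilities, per sample
    advantage: advantage estimates, per sample
    clip_eps:  clipping range (assumed value, not taken from the paper)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # The element-wise minimum removes the incentive to move the ratio
    # outside [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))


class IntegralCompensator:
    """Hypothetical helper: integrates the velocity tracking error and
    appends the error and its integral to the state fed to the networks."""

    def __init__(self, dim=3, dt=0.01, limit=1.0):
        self.dt = dt          # integration step in seconds (assumed)
        self.limit = limit    # clamp on the integral, simple anti-windup (assumed)
        self.integral = np.zeros(dim)

    def reset(self):
        self.integral[:] = 0.0

    def augment(self, state, v_ref, v):
        error = np.asarray(v_ref, dtype=float) - np.asarray(v, dtype=float)
        # Rectangular integration of the error, clamped to +/- limit.
        self.integral = np.clip(self.integral + error * self.dt,
                                -self.limit, self.limit)
        return np.concatenate(
            [np.asarray(state, dtype=float), error, self.integral])
```

In such a setup, `augment` would be called once per control step before the state is passed to the actor and critic networks, and `reset` at the start of each episode, so that a persistent speed offset shows up as a growing integral term the policy can react to.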








