JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Indirect adaptive fuzzy-regulated optimal control for unknown continuous-time nonlinear systems

Author(s): Haiyun Zhang, Deyuan Meng, Jin Wang, Guodong Lu
Affiliation(s): State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): gray_sun@zju.edu.cn, tinydreams@126.com, dwjcom@zju.edu.cn, lugd@zju.edu.cn
Key Words: Indirect adaptive optimal control, Hamilton-Jacobi-Bellman equation, Fuzzy-regulated critic, Adaptive optimal control actor, Actor-critic structure, Unknown nonlinear systems

Haiyun Zhang, Deyuan Meng, Jin Wang, Guodong Lu. Indirect adaptive fuzzy-regulated optimal control for unknown continuous-time nonlinear systems[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900610

@article{title="Indirect adaptive fuzzy-regulated optimal control for unknown continuous-time nonlinear systems",
author="Haiyun Zhang, Deyuan Meng, Jin Wang, Guodong Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.1900610"
}

%0 Journal Article
%T Indirect adaptive fuzzy-regulated optimal control for unknown continuous-time nonlinear systems
%A Haiyun Zhang
%A Deyuan Meng
%A Jin Wang
%A Guodong Lu
%J Frontiers of Information Technology & Electronic Engineering
%P 155-169
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900610

TY - JOUR
T1 - Indirect adaptive fuzzy-regulated optimal control for unknown continuous-time nonlinear systems
A1 - Haiyun Zhang
A1 - Deyuan Meng
A1 - Jin Wang
A1 - Guodong Lu
JO - Frontiers of Information Technology & Electronic Engineering
SP - 155
EP - 169
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900610
ER -

Abstract: We present a novel indirect adaptive fuzzy-regulated optimal control scheme for continuous-time nonlinear systems with unknown dynamics, mismatches, and disturbances. First, the Hamilton-Jacobi-Bellman (HJB) equation associated with the performance function is derived for the original nonlinear system. Unlike existing adaptive dynamic programming (ADP) approaches, this scheme uses a special non-quadratic variable performance function as the reinforcement medium in the actor-critic architecture. An adaptive fuzzy-regulated critic structure is then constructed to configure the weighting matrix of the performance function so as to approximate and balance the HJB equation. A concurrent self-organizing learning technique is designed to adaptively update the critic weights. Based on this critic, an adaptive optimal feedback controller is developed as the actor, with a new form of augmented Riccati equation, to optimize the fuzzy-regulated variable performance function in real time. The result is an online indirect adaptive optimal control mechanism implemented as an actor-critic structure, which involves continuous-time adaptation of both the optimal cost and the optimal control policy. The convergence and closed-loop stability of the proposed system are proved. Simulation examples and comparisons show the effectiveness and advantages of the proposed method.
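
For orientation, the HJB equation mentioned in the abstract can be written out in its standard textbook form. The sketch below assumes an input-affine system and a cost that is quadratic in the control; the symbols f, g, Q, R, and V are illustrative assumptions and do not reproduce the paper's non-quadratic, fuzzy-regulated performance function or its augmented Riccati equation.

```latex
% Assumed input-affine dynamics and a cost that is quadratic in the control
% (R is assumed symmetric positive definite, Q(x) positive definite)
\dot{x} = f(x) + g(x)\,u, \qquad
V(x(t)) = \int_t^{\infty} \bigl( Q(x(\tau)) + u(\tau)^{\top} R\, u(\tau) \bigr)\, d\tau

% HJB equation for the optimal cost V^*
0 = \min_{u} \Bigl[ Q(x) + u^{\top} R\, u + (\nabla V^*)^{\top} \bigl( f(x) + g(x)\,u \bigr) \Bigr]

% Stationarity in u gives the optimal policy (the target of the actor)
u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^*(x)

% Substituting u^* back gives the equation the critic must satisfy
0 = Q(x) + (\nabla V^*)^{\top} f(x)
    - \tfrac{1}{4} (\nabla V^*)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V^*(x)
```

In the linear-quadratic special case, with f(x) = Ax, g(x) = B, state cost x^T Q x, and V*(x) = x^T P x, the last line reduces to the algebraic Riccati equation A^T P + P A + Q - P B R^{-1} B^T P = 0, which is the classical counterpart of the Riccati-type equation the abstract refers to.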

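Indirect adaptive fuzzy-regulated optimal control method for unknown continuous-time nonlinear systems

Haiyun Zhang¹,², Deyuan Meng², Jin Wang¹, Guodong Lu¹
¹State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
²Department of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

Summary: A new indirect adaptive fuzzy-regulated optimal control scheme is proposed for continuous-time nonlinear systems subject to unknown dynamics, mismatches, and disturbances. First, the Hamilton-Jacobi-Bellman (HJB) equation of the nonlinear system and its matching performance function are established. Unlike existing adaptive dynamic programming (ADP) methods, the proposed scheme uses a special non-quadratic variable performance function as the reinforcement medium within the actor-critic architecture. An adaptive fuzzy-regulated critic structure is constructed to configure the weighting matrix of the performance function so as to approximate and balance the nonlinear HJB equation. In addition, a concurrent self-organizing learning technique is designed to adaptively update the critic weights. On this basis, an adaptive optimal feedback controller with a new form of augmented Riccati equation is proposed as the actor to optimize the fuzzy-regulated performance function in real time. The resulting actor-critic architecture yields an online indirect adaptive optimal control mechanism that adapts both the optimal cost function and the optimal control policy continuously in real time. The convergence and closed-loop stability of the method are proved. Finally, simulations and comparisons demonstrate the effectiveness and reliability of the proposed scheme.

Key words: Indirect adaptive optimal control; Hamilton-Jacobi-Bellman equation; Fuzzy-regulated critic; Adaptive optimal control actor; Actor-critic architecture; Unknown nonlinear systems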
