CLC number: TP13
On-line Access: 2022-03-22
Received: 2020-08-31
Revision Accepted: 2022-04-22
Crosschecked: 2021-01-10
https://orcid.org/0000-0001-6264-2955
Xinxing LI, Lele XI, Wenzhong ZHA, Zhihong PENG. Minimax Q-learning design for H∞ control of linear discrete-time systems[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2000446
Minimax Q-learning design for H∞ control of linear discrete-time systems

1 Information Science Academy, China Electronics Technology Group Corporation, Beijing 100086, China
2 School of Automation, Beijing Institute of Technology, Beijing 100081, China
3 Peng Cheng Laboratory, Shenzhen 518052, China

Abstract: H∞ control is an effective way to attenuate system disturbances, but an H∞ controller is often difficult to obtain, even for linear systems, because a nonlinear Hamilton–Jacobi–Isaacs equation must be solved. This paper considers the H∞ controller design problem for linear discrete-time systems. To solve the associated game algebraic Riccati equation (GARE), a novel model-free minimax Q-learning algorithm is developed on the basis of an offline policy iteration algorithm, and the offline policy iteration algorithm is shown to be Newton's method for solving the GARE. The proposed minimax Q-learning algorithm employs off-policy reinforcement learning: the optimal control policy and the worst-case disturbance policy can be learned online using only the system state data generated by a behavior policy. Different from existing Q-learning algorithms, a gradient-based policy improvement scheme is proposed. We prove that, under a certain persistence-of-excitation condition, the proposed minimax Q-learning algorithm converges to the saddle-point policies for an initially admissible control policy combined with a suitable learning rate. Moreover, the persistence-of-excitation condition required for convergence can be satisfied by choosing a suitable behavior policy containing certain excitation noises, without causing any excitation noise bias. The proposed minimax Q-learning algorithm is applied to the H∞ load-frequency controller design for a power system subject to load disturbances. Simulation results show that the resulting H∞ load-frequency controller achieves good disturbance-rejection performance.
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn. Copyright © 2000–2024 Journal of Zhejiang University-SCIENCE