JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE C 2011 Vol.12 No.1 P.17-24

Convergence analysis of an incremental approach to online inverse reinforcement learning

Author(s): Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu
Affiliation(s): School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): jinzhuojun@zju.edu.cn, qianhui@zju.edu.cn
Key Words: Incremental approach, Reward recovering, Online learning, Inverse reinforcement learning, Markov decision process


Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu. Convergence analysis of an incremental approach to online inverse reinforcement learning[J]. Journal of Zhejiang University Science C, 2011, 12(1): 17-24.

@article{jzus.C1010010,
title="Convergence analysis of an incremental approach to online inverse reinforcement learning",
author="Zhuo-jun Jin and Hui Qian and Shen-yi Chen and Miao-liang Zhu",
journal="Journal of Zhejiang University Science C",
volume="12",
number="1",
pages="17-24",
year="2011",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1010010"
}

%0 Journal Article
%T Convergence analysis of an incremental approach to online inverse reinforcement learning
%A Zhuo-jun Jin
%A Hui Qian
%A Shen-yi Chen
%A Miao-liang Zhu
%J Journal of Zhejiang University SCIENCE C
%V 12
%N 1
%P 17-24
%@ 1869-1951
%D 2011
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C1010010

TY  - JOUR
T1  - Convergence analysis of an incremental approach to online inverse reinforcement learning
A1  - Zhuo-jun Jin
A1  - Hui Qian
A1  - Shen-yi Chen
A1  - Miao-liang Zhu
JO  - Journal of Zhejiang University Science C
VL  - 12
IS  - 1
SP  - 17
EP  - 24
SN  - 1869-1951
Y1  - 2011
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C1010010
ER  -


Abstract: Interest has recently grown in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during learning and the regret were established with detailed proofs. Then an online algorithm based on incremental error correction was derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time the learner's action does not match the expert's. This yields an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was found to recover an adequate reward function efficiently.
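The abstract describes the update only at a high level: add an increment to the reward estimate whenever the learner's action disagrees with the expert's. The Python sketch below illustrates one way such an error-correcting update can work; it is not the authors' algorithm. It assumes a known tabular MDP, a reward linear in state features (r(s) = w · φ(s)), an expert that reveals one (state, action) pair at a time, and a perceptron-style increment equal to the difference between the discounted feature expectations of the expert's action and the learner's action under the learner's current greedy policy. The step size `eta`, all helper names, and the five-state chain used as a demonstration are hypothetical.

```python
import numpy as np

# Hypothetical sketch: perceptron-style online IRL on a known tabular MDP.
# P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a); phi has shape (S, k).


def greedy_policy(P, reward, gamma, iters=500):
    """Value iteration for a state reward of shape (S,); returns one action per state."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = reward[None, :] + gamma * (P @ V)  # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)  # ties go to the lowest action index


def feature_expectations(P, phi, policy, gamma):
    """mu[a, s] = discounted feature counts for taking a in s, then following policy."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S), :]  # P_pi[s, t] = P[policy[s], s, t]
    # mu_pi solves mu_pi = phi + gamma * P_pi @ mu_pi.
    mu_pi = np.linalg.solve(np.eye(S) - gamma * P_pi, phi)
    return phi[None, :, :] + gamma * np.einsum("ast,tk->ask", P, mu_pi)


def online_irl(P, phi, demos, gamma=0.9, eta=1.0):
    """Hypothetical online learner: update w only when actions mismatch."""
    w = np.zeros(phi.shape[1])
    mistakes = 0
    for s, a_expert in demos:  # expert (state, action) pairs arrive one at a time
        policy = greedy_policy(P, phi @ w, gamma)
        a_learner = policy[s]
        if a_learner != a_expert:
            mistakes += 1
            mu = feature_expectations(P, phi, policy, gamma)
            # Increment: push the expert action's Q-value above the learner's.
            w = w + eta * (mu[a_expert, s] - mu[a_learner, s])
    return w, mistakes


def chain_mdp(n):
    """Deterministic n-state chain: action 0 moves left, action 1 moves right."""
    P = np.zeros((2, n, n))
    for s in range(n):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, n - 1)] = 1.0
    return P


# Usage: an expert that always moves right, shown every state ten times.
P = chain_mdp(5)
phi = np.eye(5)  # one-hot state features
demos = [(s, 1) for _ in range(10) for s in range(5)]
w, mistakes = online_irl(P, phi, demos)
print("learned w:", np.round(w, 3))
print("mistakes:", mistakes)
print("policy:", greedy_policy(P, phi @ w, 0.9))
```

Holding the policy (and hence μ) fixed, Q_w(s, a) = w · μ(s, a), so each correction raises the expert action's Q-value relative to the learner's by eta times the squared norm of the increment, the same mechanism a perceptron applies to a misclassified example. Recomputing the greedy policy after every step is costly but keeps the sketch short.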
