CLC number: TP181; U495
On-line Access: 2021-05-17
Received: 2019-11-20
Revision Accepted: 2020-12-29
Crosschecked: 2021-02-03
Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou. Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900637
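Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
Abstract: Autonomous driving systems designed around hand-crafted rules can become increasingly complex as large numbers of mutually coupled rules accumulate, so many researchers are exploring learning-based solutions. Reinforcement learning (RL) has been applied to autonomous driving system design because of its strong performance on a variety of sequential control problems. However, the main obstacle to deploying RL-based autonomous driving systems in practice is their poor initial performance. RL training needs a large amount of training data before the model reaches reasonable performance, which makes RL-based models unsuitable for real-world environments, especially when data are expensive. This paper proposes an asynchronous supervised learning (ASL) method for RL-based end-to-end autonomous driving models, to address the poor initial performance of RL-based models trained in real environments. Specifically, prior knowledge is introduced in the ASL pre-training stage by running multiple supervised learning processes in parallel and asynchronously on multiple driving demonstration datasets. After pre-training, the model is deployed on a real vehicle for further RL training, so that it adapts to the real environment and keeps pushing its performance limit. The proposed pre-training method is evaluated on the racing simulator TORCS (The Open Racing Car Simulator) to verify that it reliably improves the initial performance and convergence speed of the end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in real-vehicle deployment. Simulation results show that using some demonstrations in the supervised pre-training stage significantly improves the initial performance and convergence speed in the RL training stage.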
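The asynchronous pre-training idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear steering policy, the synthetic demonstration generator `make_demo_dataset`, the "expert" rule inside it, and all hyperparameters are hypothetical stand-ins chosen so the example runs with the Python standard library alone. Each worker thread performs supervised learning on its own demonstration dataset and pushes gradient updates to one shared policy without waiting for the other workers.

```python
import random
import threading


def make_demo_dataset(seed, n=200):
    """Hypothetical stand-in for one driving demonstration dataset.

    Each sample pairs a 2-D state (track offset, heading error) with the
    steering command of an assumed expert rule.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        offset, heading = rng.uniform(-1, 1), rng.uniform(-1, 1)
        steer = -0.5 * offset - 1.0 * heading  # assumed expert behaviour
        data.append(((offset, heading), steer))
    return data


class SharedPolicy:
    """Linear steering policy whose weights are shared by all workers."""

    def __init__(self):
        self.w = [0.0, 0.0]
        self.lock = threading.Lock()

    def predict(self, state):
        # Read without the lock: a worker may see slightly stale weights.
        return self.w[0] * state[0] + self.w[1] * state[1]

    def apply_gradient(self, grad, lr):
        # Only the write is serialized; there is no barrier between workers.
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= lr * g


def worker(policy, dataset, epochs, lr, seed):
    """One supervised learning process on one demonstration dataset."""
    rng = random.Random(seed)
    for _ in range(epochs):
        samples = dataset[:]
        rng.shuffle(samples)
        for state, target in samples:
            err = policy.predict(state) - target
            # Gradient of the squared error with respect to the weights.
            grad = [2 * err * state[0], 2 * err * state[1]]
            policy.apply_gradient(grad, lr)


def pretrain(num_workers=4, epochs=20, lr=0.05):
    """Run all workers concurrently and return the pre-trained policy."""
    policy = SharedPolicy()
    datasets = [make_demo_dataset(seed=k) for k in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(policy, d, epochs, lr, 100 + k))
        for k, d in enumerate(datasets)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return policy


if __name__ == "__main__":
    print(pretrain().w)
```

In this toy setting the shared weights approach the assumed expert rule, and the resulting policy would then serve as the starting point for RL training rather than a randomly initialized one. A real system would replace the linear policy with the end-to-end driving network and the synthetic samples with recorded demonstrations.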
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE