JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE A

Accepted manuscript available online (unedited version)

A learning-based control pipeline for generic motor skills for quadruped robots

Author(s): Yecheng SHAO, Yongbin JIN, Zhilong HUANG, Hongtao WANG, Wei YANG
Affiliation(s): Center for X-Mechanics, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): htw@zju.edu.cn
Key Words: Quadruped robot; Reinforcement learning (RL); Motion synthesis; Control


Yecheng SHAO, Yongbin JIN, Zhilong HUANG, Hongtao WANG, Wei YANG. A learning-based control pipeline for generic motor skills for quadruped robots[J]. Journal of Zhejiang University Science A, in press. https://doi.org/10.1631/jzus.A2300128

@article{jzus.A2300128,
title="A learning-based control pipeline for generic motor skills for quadruped robots",
author="Yecheng SHAO and Yongbin JIN and Zhilong HUANG and Hongtao WANG and Wei YANG",
journal="Journal of Zhejiang University Science A",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/jzus.A2300128"
}

%0 Journal Article
%T A learning-based control pipeline for generic motor skills for quadruped robots
%A Yecheng SHAO
%A Yongbin JIN
%A Zhilong HUANG
%A Hongtao WANG
%A Wei YANG
%J Journal of Zhejiang University SCIENCE A
%P 443-454
%@ 1673-565X
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/jzus.A2300128

TY  - JOUR
T1  - A learning-based control pipeline for generic motor skills for quadruped robots
A1  - Yecheng SHAO
A1  - Yongbin JIN
A1  - Zhilong HUANG
A1  - Hongtao WANG
A1  - Wei YANG
JO  - Journal of Zhejiang University Science A
SP  - 443
EP  - 454
SN  - 1673-565X
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.A2300128
ER  -


Abstract: Performing diverse motor skills with a universal controller has been a longstanding challenge for legged robots. While motion imitation-based reinforcement learning (RL) has shown remarkable performance in reproducing designed motor skills, the trained controller is only suitable for one specific type of motion. Motion synthesis has been well developed to generate a variety of different motions for character animation, but those motions only contain kinematic information and cannot be used for control. In this study, we introduce a control pipeline combining motion synthesis and motion imitation-based RL for generic motor skills. We design an animation state machine to synthesize motion from various sources and feed the generated kinematic reference trajectory to the RL controller as part of the input. With the proposed method, we show that a single policy is able to learn various motor skills simultaneously. Further, we notice the ability of the policy to uncover the correlations lurking behind the reference motions to improve control performance. We analyze this ability based on the predictability of the reference trajectory and use the quantified measurements to optimize the design of the controller. To demonstrate the effectiveness of our method, we deploy the trained policy on hardware and, with a single control policy, the quadruped robot can perform various learned skills, including automatic gait transitions, high kick, and forward jump.
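As a rough illustration of feeding the kinematic reference trajectory to the policy "as part of the input", the sketch below concatenates a robot state vector with a short window of upcoming reference frames. It is not the authors' implementation: the names `reference_window` and `build_observation`, the `horizon` parameter, the content of a frame, and the choice to hold the last frame at the end of a clip are all assumptions made for the example.

```python
from typing import List, Sequence

Frame = List[float]  # one kinematic reference frame (assumed: e.g. target joint angles)


def reference_window(trajectory: Sequence[Frame], step: int, horizon: int) -> List[Frame]:
    """Return `horizon` frames starting at `step`, holding the last frame past the end."""
    last = len(trajectory) - 1
    return [list(trajectory[min(step + k, last)]) for k in range(horizon)]


def build_observation(robot_state: Sequence[float],
                      trajectory: Sequence[Frame],
                      step: int,
                      horizon: int) -> List[float]:
    """Concatenate the robot state with a flattened window of reference frames."""
    obs = list(robot_state)
    for frame in reference_window(trajectory, step, horizon):
        obs.extend(frame)
    return obs


# Usage: a 3-frame reference with 2 values per frame, observed from step 1.
trajectory = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
obs = build_observation([1.0], trajectory, step=1, horizon=3)
print(obs)
```

In this sketch, `horizon` plays the role of the reference trajectory length seen by the controller: a larger value exposes more of the upcoming motion at the cost of a larger input vector.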

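A learning-based control method for generic skills of quadruped robots

Authors: Yecheng SHAO^1,2, Yongbin JIN^1,2, Zhilong HUANG^4, Hongtao WANG^1,2,3, Wei YANG^1,2
Affiliations: ^1 Center for X-Mechanics, Zhejiang University, Hangzhou 310027, China; ^2 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311200, China; ^3 State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310058, China; ^4 Institute of Applied Mechanics, Zhejiang University, Hangzhou 310027, China
Objective: To control a quadruped robot so that it performs multiple motions in a continuous and controllable manner.
Innovations: 1. Motion synthesis is combined with motion imitation-based reinforcement learning so that a single controller tracks different kinematic trajectories, achieving gait transitions, high kick, and jumping on a physical robot. 2. The concept of reference trajectory predictability is proposed. The reinforcement learning controller is able to uncover the intrinsic correlations within the reference trajectories, which reveals the mechanism by which the length of the reference trajectory fed to the controller affects its performance.
Methods: 1. A motion trajectory database is built using motion capture, sketching, and trajectory optimization. 2. A controller is trained in simulation with motion imitation-based reinforcement learning to imitate the motions in the database. 3. An animation state machine is designed around the controller; it generates controllable motion trajectories in real time from user commands and feeds them to the controller to control the physical robot. 4. The concept of reference trajectory predictability is proposed and used to analyze how the reference trajectory length affects controller performance.
Conclusions: 1. The proposed control method achieves control of multiple skills on a physical robot. 2. The reference trajectory length affects controller performance through predictability; for motions with low predictability, controller performance can be improved by extending the reference trajectory.

Keywords: Quadruped robot; Reinforcement learning; Motion synthesis; Control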
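
The third method step, an animation state machine that turns user commands into a reference trajectory in real time, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the design used in the paper: the class `AnimationStateMachine`, the clip names, the transition table, the looping of clips, and the instantaneous switch between clips (a real system would likely blend across the transition) are all hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[float]  # one kinematic reference frame (contents are assumed)


class AnimationStateMachine:
    """Toy state machine in which each state is a motion clip played frame by frame."""

    def __init__(self,
                 clips: Dict[str, List[Frame]],
                 transitions: Dict[Tuple[str, str], str],
                 start: str) -> None:
        self.clips = clips
        self.transitions = transitions  # (current clip, command) -> next clip
        self.state = start
        self.index = 0

    def step(self, command: str = "") -> Frame:
        """Apply a command if a transition is defined, then emit the next frame."""
        nxt = self.transitions.get((self.state, command))
        if nxt is not None and nxt != self.state:
            self.state, self.index = nxt, 0  # restart playback in the new clip
        clip = self.clips[self.state]
        frame = clip[self.index % len(clip)]  # loop the clip (assumes cyclic motion)
        self.index += 1
        return frame


# Usage with two hypothetical one-dimensional clips.
clips = {"trot": [[0.0], [1.0]], "jump": [[5.0], [6.0], [7.0]]}
transitions = {("trot", "jump"): "jump", ("jump", "trot"): "trot"}
asm = AnimationStateMachine(clips, transitions, start="trot")
frames = [asm.step(), asm.step(), asm.step("jump"), asm.step(), asm.step("trot")]
print(frames)
```

Each emitted frame would then be passed to the trained controller as its reference input, so the same policy follows whichever clip the state machine is currently playing.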


