
CLC number: TP242.6;TP18
On-line Access: 2025-10-13
Received: 2024-12-16
Revision Accepted: 2025-05-12
Crosschecked: 2025-10-13
Zhicheng WANG, Xin ZHAO, Meng Yee (Michael) CHUAH, Zhibin LI, Jun WU, Qiuguo ZHU. Efficient learning of robust multigait quadruped locomotion for minimizing the cost of transport[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2401070
Efficient learning of robust multigait quadruped locomotion for minimizing the cost of transport

1 Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China
2 Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138642, Singapore
3 Huaishuling National Key Laboratory, China North Vehicle Research Institute, Beijing 100072, China
4 Department of Computer Science, University College London, London, UK
5 State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China

Abstract: Quadruped robots can locomote with a variety of gait rhythms, each offering a different trade-off between terrain traversability and energy efficiency. By actively switching and adjusting gaits across environments, a quadruped robot can realize an energy-saving and highly adaptive locomotion strategy. This paper investigates the energy efficiency and terrain traversal capability of reinforcement-learning-based quadruped locomotion policies under different gait parameters, and proposes a train-and-consolidate framework that merges the learned single-gait locomotion policies into one efficient multigait policy. The resulting controller achieves low-cost gait switching and controllable gaits. Experimental results show that the multi-skill policy maintains energy optimality while achieving smooth and safe gait transitions.

Key words:
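The train-and-consolidate idea described in the abstract can be illustrated with a toy sketch: several gait-specific "expert" policies are queried to label observations, and a single gait-conditioned "student" policy is fit to imitate all of them at once. Everything below is hypothetical and greatly simplified: the experts are linear maps standing in for the paper's learned neural-network policies, and plain least-squares behavior cloning stands in for the paper's actual consolidation procedure; the dimensions and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, N_GAITS = 8, 4, 3  # hypothetical sizes

# Hypothetical single-gait "expert" policies: one linear map per gait,
# standing in for separately trained RL policies (e.g., trot, pace, bound).
experts = [rng.normal(size=(ACT_DIM, OBS_DIM)) for _ in range(N_GAITS)]

def expert_action(gait, obs):
    return experts[gait] @ obs

# Collect expert-labeled data: the student sees gait-conditioned features
# (outer product of one-hot gait command and observation), so one linear
# student can represent all per-gait experts exactly.
X, Y = [], []
for gait in range(N_GAITS):
    one_hot = np.eye(N_GAITS)[gait]
    for _ in range(200):
        obs = rng.normal(size=OBS_DIM)
        X.append(np.kron(one_hot, obs))
        Y.append(expert_action(gait, obs))
X, Y = np.asarray(X), np.asarray(Y)

# Consolidate via least-squares behavior cloning.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (N_GAITS*OBS_DIM, ACT_DIM)

def student_action(gait, obs):
    return np.kron(np.eye(N_GAITS)[gait], obs) @ W

# The consolidated student reproduces each expert on a held-out observation.
obs = rng.normal(size=OBS_DIM)
err = max(np.linalg.norm(student_action(g, obs) - expert_action(g, obs))
          for g in range(N_GAITS))
print(f"max imitation error: {err:.2e}")
```

In this linear toy the student matches the experts up to numerical precision; with neural-network policies the same scheme needs iterative supervision (e.g., DAgger-style rollouts under the student) to keep the imitation error low along the states the student actually visits.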

