Reinforcement Learning for Quadrupedal Locomotion

Training robust walking policies for the Unitree Go2 robot using Proximal Policy Optimization (PPO)


Ubuntu, Isaac Lab, PyTorch, wandb


Link to GitHub Repo

Demo

walking

Description

  • Developed a locomotion policy for the Unitree Go2 in Isaac Lab, using Proximal Policy Optimization (PPO) to train robust gait controllers
  • Implemented a custom actuator friction model that adds static and viscous friction terms to the low-level PD control loop, narrowing the sim-to-real gap by explicitly accounting for hardware nonlinearities
  • Integrated Raibert-heuristic gait shaping and dynamic foot clearance constraints into the reward formulation, yielding stable trotting gaits that track planar velocity commands under stochastic joint disturbances
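
The friction and gait-shaping terms described above can be sketched roughly as follows. This is a minimal scalar illustration, not the project's code. It assumes a friction torque of the form `f_static * tanh(qd / v_act) + mu_viscous * qd` subtracted from the PD torque, and the classic Raibert touchdown offset `0.5 * stance_time * v + k_v * (v - v_cmd)`. All function names, parameter names, and default values here are hypothetical. A training setup would apply the same arithmetic to batched PyTorch tensors across parallel environments.

```python
import math


def pd_torque_with_friction(q_des, q, qd, kp, kd, f_static, mu_viscous,
                            v_act=0.1, tau_limit=None):
    """Joint torque from a PD law minus an assumed friction torque.

    q_des, q    : desired and measured joint position (rad)
    qd          : measured joint velocity (rad/s)
    f_static    : static friction magnitude (N*m), hypothetical parameter
    mu_viscous  : viscous friction coefficient (N*m*s/rad), hypothetical
    v_act       : velocity scale that smooths the static term near zero
    tau_limit   : optional symmetric torque clamp (N*m)
    """
    # Standard PD tracking torque.
    tau_pd = kp * (q_des - q) - kd * qd
    # Smooth static friction (tanh instead of sign) plus viscous friction.
    tau_friction = f_static * math.tanh(qd / v_act) + mu_viscous * qd
    # Friction opposes motion, so it is subtracted from the commanded torque.
    tau = tau_pd - tau_friction
    if tau_limit is not None:
        tau = max(-tau_limit, min(tau_limit, tau))
    return tau


def raibert_foot_offset(v, v_cmd, stance_time, k_v=0.03):
    """Raibert-style touchdown offset along one axis (m).

    The offset is measured from the nominal foot position under the hip.
    k_v is a hypothetical feedback gain on the velocity error.
    """
    return 0.5 * stance_time * v + k_v * (v - v_cmd)


def raibert_tracking_penalty(foot_xy, nominal_xy, v_xy, v_cmd_xy,
                             stance_time, k_v=0.03):
    """Squared planar distance between a foot and its Raibert target."""
    err = 0.0
    for foot, nominal, v, v_cmd in zip(foot_xy, nominal_xy, v_xy, v_cmd_xy):
        target = nominal + raibert_foot_offset(v, v_cmd, stance_time, k_v)
        err += (foot - target) ** 2
    return err
```

In a reward function, a penalty like `raibert_tracking_penalty` would typically enter with a negative weight, summed over the four feet. The `tanh` smoothing is one common way to avoid the discontinuity of a pure sign-based static friction term at zero velocity.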