RL Walking¶

For learned walks, the RLWalkingEngine runs neural networks that generate the walking patterns. The current learned walks were trained with the pipeline of Booster Robotics¹.

Network Structure Input¶

The neural network expects the following input:

3D vector (0, 0, -1) rotated by the current torso orientation
- Sequence of x, y, z
Gyro values of the IMU
- Sequence of x, y, z
Walking speed
- Sequence of x, y, yaw
- When standing, these values should be 0
Phase input
- Sequence cos, sin
- The trained frequency is in the range [1, 2]. Using the middle value of 1.5 is recommended.
The measured joint positions
- Sequence for Booster robots is (waistYaw), lHipPitch, lHipRoll, lHipYaw, lKneePitch, lAnklePitch, lAnkleRoll, rHipPitch, rHipRoll, rHipYaw, rKneePitch, rAnklePitch, rAnkleRoll
- Sequence for NAO robots is lHipYawPitch, lHipRoll, lHipPitch, lKneePitch, lAnklePitch, lAnkleRoll, rHipYawPitch, rHipRoll, rHipPitch, rKneePitch, rAnklePitch, rAnkleRoll
- A position offset is subtracted beforehand
The measured joint velocities
- Sequence the same as joint positions
- Multiplied by factor 0.1
The requested joint positions
- Sequence the same as joint positions
- A position offset is subtracted beforehand
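The base input listed above can be assembled as in the following minimal sketch. It is not the RLWalkingEngine code. The function names, the quaternion convention (w, x, y, z), the choice to express the gravity direction in the torso frame (rotation by the inverse torso orientation), and the phase being a value in [0, 1) are all assumptions made for illustration.

```python
import math

JOINT_VELOCITY_SCALE = 0.1  # factor from the list above


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate_inverse(quat, vec):
    """Rotate vec by the inverse of the unit quaternion quat = (w, x, y, z)."""
    w = quat[0]
    u = (-quat[1], -quat[2], -quat[3])  # conjugate equals inverse for unit quaternions
    t = tuple(2.0 * c for c in cross(u, vec))
    u_cross_t = cross(u, t)
    return tuple(vec[i] + w * t[i] + u_cross_t[i] for i in range(3))


def build_observation(torso_quat, gyro, walk_speed, phase,
                      joint_pos, joint_vel, requested_pos, pos_offset):
    """Concatenate the base inputs in the documented order."""
    obs = []
    # Assumption: gravity direction expressed in the torso frame.
    obs += rotate_inverse(torso_quat, (0.0, 0.0, -1.0))
    obs += gyro                      # x, y, z
    obs += walk_speed                # x, y, yaw (all 0 when standing)
    # Assumption: phase is in [0, 1), so one gait cycle spans 2*pi.
    obs += (math.cos(2.0 * math.pi * phase), math.sin(2.0 * math.pi * phase))
    obs += [p - o for p, o in zip(joint_pos, pos_offset)]
    obs += [JOINT_VELOCITY_SCALE * v for v in joint_vel]
    obs += [r - o for r, o in zip(requested_pos, pos_offset)]
    return obs
```

With 12 leg joints this yields 3 + 3 + 3 + 2 + 3 · 12 = 47 values; including waistYaw on a Booster robot would give 50.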

If a frequency offset was trained, then there is an additional input:

Frequency offset from last output

If a neural network that can play the ball is used, there are further additional inputs:

3D ball position
- Sequence x, y, z
- In meters
- Multiplied by factor 0.1
3D ball velocity
- Sequence x, y, z
- In meters per second
- Multiplied by factor 0.1
Kick direction
- Sequence sin(direction), cos(direction)
Kick range
- In meters
- Multiplied by factor 0.1
Ball-playing flag
- 0 if false, 1 if true
Walking flag
- 0 if false, 1 if true
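The additional ball-playing inputs could be appended as in this short sketch. The function name and the assumption that the kick direction is given in radians are illustrative and not taken from the engine.

```python
import math

BALL_SCALE = 0.1  # factor from the list above


def ball_inputs(ball_pos, ball_vel, kick_direction, kick_range,
                playing_ball, walking):
    """Return the extra inputs for a ball-playing network in the documented order."""
    extra = []
    extra += [BALL_SCALE * p for p in ball_pos]   # x, y, z in meters
    extra += [BALL_SCALE * v for v in ball_vel]   # x, y, z in meters per second
    # Assumption: kick_direction is an angle in radians.
    extra += [math.sin(kick_direction), math.cos(kick_direction)]
    extra.append(BALL_SCALE * kick_range)         # meters
    extra.append(1.0 if playing_ball else 0.0)
    extra.append(1.0 if walking else 0.0)
    return extra
```

This adds 11 values to the observation vector.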

Neural Network Output¶

The neural network produces the following output:

The requested joint positions
- Same sequence as the joint position input

If a frequency offset was trained, then there is an additional output:

Frequency offset

The joint positions are expected to be clipped, after which a position offset is added. The frequency offset is also clipped and is relative to the frequency 1.5. The raw offset (clipped to a wider range) is used as the input in the next cycle.
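The post-processing described above can be sketched as follows. All clip limits are hypothetical placeholders; only the base frequency of 1.5 comes from the text. The narrow limit of 0.5 was chosen so that the resulting frequency stays within the trained range [1, 2], which is an assumption.

```python
BASE_FREQUENCY = 1.5       # from the text
ACTION_CLIP = 1.0          # hypothetical joint output limit
FREQ_CLIP = 0.5            # hypothetical narrow limit (keeps frequency in [1, 2])
FREQ_FEEDBACK_CLIP = 1.0   # hypothetical wider limit for the fed-back offset


def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def process_output(raw_output, pos_offset, has_freq_offset):
    """Split the raw network output into joint requests and frequency values."""
    n = len(pos_offset)
    joints = [clip(raw_output[i], -ACTION_CLIP, ACTION_CLIP) + pos_offset[i]
              for i in range(n)]
    if not has_freq_offset:
        return joints, BASE_FREQUENCY, None
    raw_freq = raw_output[n]
    # The narrowly clipped offset sets the gait frequency for this cycle.
    frequency = BASE_FREQUENCY + clip(raw_freq, -FREQ_CLIP, FREQ_CLIP)
    # The widely clipped offset becomes the network input in the next cycle.
    feedback = clip(raw_freq, -FREQ_FEEDBACK_CLIP, FREQ_FEEDBACK_CLIP)
    return joints, frequency, feedback
```

Keeping the fed-back value in a wider range than the applied one means the network still sees what it asked for even when the applied frequency saturates.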

Behavior¶

As the walk is generated by a neural network, some safety checks are included to prevent damage to the robot and its environment. When transitioning from PlayDead, the leg joint targets are set to 0 and the joints are interpolated toward them. When Standing, the joint request is frozen after two seconds and if the state of the robot is ManualPenalized. When transitioning from Walking to Standing, the robot is forced to walk in place for a short moment. In tests we observed that the real robot would otherwise sometimes fall over or oscillate in the joints, which this approach reduces.
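Two of these checks can be illustrated with a minimal sketch. The function names, the linear interpolation, and the interpolation duration are hypothetical. The freeze condition is read here as "frozen after two seconds of standing, or whenever the state is ManualPenalized", which is one possible interpretation of the paragraph above and not confirmed by the engine.

```python
STAND_FREEZE_DELAY_S = 2.0  # two seconds, from the text


def interpolate_to_zero(start_angles, elapsed_s, duration_s):
    """Linearly blend from the starting leg angles toward the zero target."""
    ratio = min(max(elapsed_s / duration_s, 0.0), 1.0)
    return [(1.0 - ratio) * angle for angle in start_angles]


def should_freeze(standing_time_s, manual_penalized):
    """Assumed reading: either condition freezes the joint request."""
    return manual_penalized or standing_time_s >= STAND_FREEZE_DELAY_S


def next_request(policy_request, last_request, standing_time_s, manual_penalized):
    """Keep the previous request while frozen, otherwise follow the policy."""
    if should_freeze(standing_time_s, manual_penalized):
        return list(last_request)
    return list(policy_request)
```

For example, `interpolate_to_zero([1.0, -2.0], 0.5, 1.0)` is halfway through the transition and returns `[0.5, -1.0]`.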

Note

Note that using a GameController to set the robot state prevents setting the ManualPenalized state. We recommend pressing the emergency button to unstiffen the robot when carrying it over short distances. Otherwise the walk policy will make the robot walk while it is being carried, which is very dangerous!

Learnings From Training¶

Below is a short list of our current learnings and observations from training neural networks and testing them on the real robot:

Learning an offset for the frequency results in a more stable walk with more stretched-out legs.
Setting simulation parameters such as armature, damping and frictionloss results in a less stomping walk.
Ensuring the entropy loss does not become too negative during training helps the policy avoid getting stuck in local minima.
Implementing a symmetry loss helps produce more symmetric motions.
The initial training can be done with a lower number of robots (such as 1024).
...
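To illustrate the idea behind a symmetry loss, here is a generic sketch in plain Python. It is not how the training pipeline implements it: a real implementation would run on batched tensors inside the training framework, and the permutations and sign flips that swap left and right joints depend on the robot's joint conventions. All names below are hypothetical.

```python
def mirror(values, permutation, signs):
    """Swap left/right entries via permutation and flip lateral quantities via signs."""
    return [signs[i] * values[permutation[i]] for i in range(len(values))]


def symmetry_loss(policy, obs, obs_perm, obs_signs, act_perm, act_signs):
    """Mean squared difference between policy(mirror(obs)) and mirror(policy(obs))."""
    action = policy(obs)
    action_for_mirrored_obs = policy(mirror(obs, obs_perm, obs_signs))
    expected = mirror(action, act_perm, act_signs)
    squared = [(a - e) ** 2 for a, e in zip(action_for_mirrored_obs, expected)]
    return sum(squared) / len(squared)
```

A policy that treats left and right identically gets a loss of 0, while any left/right bias produces a positive penalty that can be added to the training objective.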

Current TODO¶

The current motion framework calculates in walking steps, but the neural network is called every 20 ms and needs walking speeds.
The robot cannot stop all its movement within one step. This often forces the robot to overshoot its target position.
Kicks are not fully supported yet.
The WalkKickEngine does not work with the RLWalkingEngine, because a new step is started every 20 ms, which confuses the motion framework.
On the Booster T1, the motor commands are sent from the perception board, which results in a round-trip delay of 14 ms. We currently assume this could cause some minor oscillation effects.

¹ Booster Robotics - https://github.com/BoosterRobotics/booster_gym/tree/main

Last update: October 13, 2025