RL Walking
For learned walks, the RLWalkingEngine runs neural networks that generate the walking patterns. The walks learned so far were trained with the Booster Robotics pipeline [1].
Network Structure Input
The neural network expects the following input:
- 3D vector (0, 0, -1) rotated according to the current torso orientation
  - Sequence of x, y, z
- Gyro values of the IMU
  - Sequence of x, y, z
- Walking speed
  - Sequence of x, y, yaw
  - When standing, these values should be 0
- Phase input
  - Sequence of cos, sin
  - The trained frequency range is [1, 2]. We recommend using the middle value of 1.5
- The measured joint positions
  - Sequence for Booster robots: (waistYaw), lHipPitch, lHipRoll, lHipYaw, lKneePitch, lAnklePitch, lAnkleRoll, rHipPitch, rHipRoll, rHipYaw, rKneePitch, rAnklePitch, rAnkleRoll
  - Sequence for NAO robots: lHipYawPitch, lHipRoll, lHipPitch, lKneePitch, lAnklePitch, lAnkleRoll, rHipYawPitch, rHipRoll, rHipPitch, rKneePitch, rAnklePitch, rAnkleRoll
  - A position offset is subtracted beforehand
- The measured joint velocities
  - Same sequence as the joint positions
  - Multiplied by a factor of 0.1
- The requested joint positions
  - Same sequence as the joint positions
  - A position offset is subtracted beforehand
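A minimal sketch of how the base inputs above could be concatenated into one input vector. The helper names and argument layout are hypothetical, and the gravity vector is assumed to be (0, 0, -1) expressed in the torso frame (i.e. rotated by the transpose of the torso rotation matrix), as is common in legged-robot RL pipelines. It is an illustration, not the RLWalkingEngine implementation.

```python
import math
from typing import List, Sequence


def projected_gravity(torso_rotation: Sequence[Sequence[float]]) -> List[float]:
    """Return (0, 0, -1) expressed in the torso frame, i.e. R^T * (0, 0, -1).

    ASSUMPTION: torso_rotation is a 3x3 matrix mapping torso-frame vectors into
    the world frame, so its transpose maps world-frame vectors into the torso frame.
    """
    down = (0.0, 0.0, -1.0)
    return [sum(torso_rotation[r][c] * down[r] for r in range(3)) for c in range(3)]


def build_base_observation(torso_rotation: Sequence[Sequence[float]],
                           gyro: Sequence[float],
                           walk_speed: Sequence[float],
                           phase_angle: float,
                           joint_positions: Sequence[float],
                           joint_velocities: Sequence[float],
                           requested_positions: Sequence[float],
                           position_offset: Sequence[float]) -> List[float]:
    """Hypothetical helper: concatenate the base inputs in the listed order."""
    obs: List[float] = []
    obs += projected_gravity(torso_rotation)                      # x, y, z
    obs += list(gyro)                                             # x, y, z
    obs += list(walk_speed)                                       # x, y, yaw; zeros when standing
    obs += [math.cos(phase_angle), math.sin(phase_angle)]         # cos, sin (angle in radians, assumed)
    obs += [p - o for p, o in zip(joint_positions, position_offset)]
    obs += [0.1 * v for v in joint_velocities]                    # velocities scaled by 0.1
    obs += [p - o for p, o in zip(requested_positions, position_offset)]
    return obs
```

For a NAO with its 12 leg joints, this yields 3 + 3 + 3 + 2 + 3 × 12 = 47 values. The sketch assumes the same position offset is used for measured and requested positions.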
If a frequency offset was trained, there is one additional input:
- The frequency offset from the previous output
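The phase input and the optional frequency offset are coupled: the offset, which is relative to the frequency 1.5, changes how fast the phase advances. A minimal sketch, assuming the phase is tracked in gait cycles in [0, 1), the frequency is in cycles per second, and the (cos, sin) pair encodes the angle 2π · phase. The class `PhaseClock` is hypothetical.

```python
import math
from typing import List

BASE_FREQUENCY = 1.5  # recommended frequency, middle of the trained range [1, 2]
CYCLE_TIME = 0.02     # the network is called every 20 ms


class PhaseClock:
    """Hypothetical gait phase clock; the phase is kept in [0, 1) gait cycles."""

    def __init__(self) -> None:
        self.phase = 0.0

    def advance(self, frequency_offset: float = 0.0) -> None:
        # The offset is relative to BASE_FREQUENCY; pass 0 if no offset was trained.
        frequency = BASE_FREQUENCY + frequency_offset
        self.phase = (self.phase + frequency * CYCLE_TIME) % 1.0

    def phase_input(self) -> List[float]:
        # Encode the phase as (cos, sin) of the angle 2 * pi * phase.
        angle = 2.0 * math.pi * self.phase
        return [math.cos(angle), math.sin(angle)]
```

At 1.5 and a 20 ms cycle, the phase advances by 0.03 per network call, so one gait cycle takes about 33 calls.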
If a neural network that can play the ball is used, there are further inputs:
- 3D ball position
  - Sequence of x, y, z
  - In meters
  - Multiplied by a factor of 0.1
- 3D ball velocity
  - Sequence of x, y, z
  - In meters per second
  - Multiplied by a factor of 0.1
- Kick direction
  - Sequence of sin(direction), cos(direction)
- Kick range
  - In meters
  - Multiplied by a factor of 0.1
- Ball-playing flag
  - 0 if false, 1 if true
- Walking flag
  - 0 if false, 1 if true
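A sketch of the additional ball-playing inputs in the listed order. It assumes the kick direction is given in radians and that the ball quantities are already robot-relative; `build_ball_observation` is a hypothetical helper, and where these values sit in the full input vector is an assumption (here: appended after the other inputs).

```python
import math
from typing import List, Sequence


def build_ball_observation(ball_position: Sequence[float],
                           ball_velocity: Sequence[float],
                           kick_direction: float,
                           kick_range: float,
                           play_ball: bool,
                           walking: bool) -> List[float]:
    """Hypothetical helper for the ball-playing inputs."""
    obs = [0.1 * p for p in ball_position]                       # x, y, z in m, scaled by 0.1
    obs += [0.1 * v for v in ball_velocity]                      # x, y, z in m/s, scaled by 0.1
    obs += [math.sin(kick_direction), math.cos(kick_direction)]  # radians (assumed)
    obs.append(0.1 * kick_range)                                 # m, scaled by 0.1
    obs.append(1.0 if play_ball else 0.0)                        # ball-playing flag
    obs.append(1.0 if walking else 0.0)                          # walking flag
    return obs
```

This adds 11 values to the input vector.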
Neural Network Output
The neural network produces the following output:
- The requested joint positions
  - Same sequence as the joint position input
If a frequency offset was trained, there is one additional output:
- Frequency offset
The joint position outputs are clipped, and the position offset is then added. The frequency offset is also clipped and is relative to the base frequency of 1.5. The raw offset, clipped only into a wider range, is used as input in the next cycle.
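A sketch of this output post-processing. The clipping bounds (`action_clip`, `freq_clip`, `freq_feedback_clip`) and their symmetric form are hypothetical placeholders, since the actual limits are not given here; the frequency offset is assumed to follow the joint outputs in the output vector.

```python
from typing import List, Sequence, Tuple

BASE_FREQUENCY = 1.5  # the frequency offset is relative to this value


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def postprocess_output(raw_output: Sequence[float],
                       position_offset: Sequence[float],
                       action_clip: float,
                       freq_clip: float,
                       freq_feedback_clip: float,
                       has_frequency_offset: bool) -> Tuple[List[float], float, float]:
    """Return (joint request, gait frequency, frequency offset input for the next cycle)."""
    n = len(position_offset)
    # Clip each joint output, then add the position offset back.
    joint_request = [clip(a, -action_clip, action_clip) + o
                     for a, o in zip(raw_output[:n], position_offset)]
    if not has_frequency_offset:
        return joint_request, BASE_FREQUENCY, 0.0
    raw_offset = raw_output[n]
    # The applied offset uses the narrow clip range ...
    frequency = BASE_FREQUENCY + clip(raw_offset, -freq_clip, freq_clip)
    # ... while the value fed back as input uses a wider one.
    next_offset_input = clip(raw_offset, -freq_feedback_clip, freq_feedback_clip)
    return joint_request, frequency, next_offset_input
```

If the real limits differ per joint, `action_clip` would become a sequence instead of a scalar.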
Behavior
As the walk is generated by a neural network, some safety checks are included to prevent damage to the robot and its environment. When transitioning from PlayDead, the leg joint targets are set to 0 and the joints are interpolated towards them. While standing, the joint request is frozen after two seconds, and also whenever the robot is in the ManualPenalized state. When transitioning from Walking to Standing, the robot is forced to walk in place for a short moment. In tests, the real robot would otherwise sometimes fall over or its joints would oscillate; this approach prevents both.
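A simplified per-cycle sketch of these safety checks. Only the 20 ms cycle and the two-second freeze come from the text; the blend and walk-in-place durations, the motion names and the `SafetyFilter` class are hypothetical. The sketch reads the freeze rule as applying after two seconds of standing or whenever the robot is ManualPenalized.

```python
from typing import List, Optional, Sequence, Tuple

STAND_FREEZE_CYCLES = 100     # 2 s at one network call per 20 ms (from the text)
WALK_IN_PLACE_CYCLES = 20     # HYPOTHETICAL: "a short moment" (0.4 s)
PLAY_DEAD_BLEND_CYCLES = 50   # HYPOTHETICAL: 1 s interpolation to zero


class SafetyFilter:
    """Hypothetical filter around the policy output; motion is 'playDead', 'stand' or 'walk'."""

    def __init__(self) -> None:
        self.last_motion = "playDead"
        self.blend_start: List[float] = []
        self.blend_cycles = PLAY_DEAD_BLEND_CYCLES  # no blend active
        self.stand_cycles = 0
        self.walk_in_place_cycles = 0
        self.frozen: Optional[List[float]] = None

    def update(self, motion: str, penalized: bool, measured: Sequence[float],
               policy_request: Sequence[float]) -> Tuple[List[float], bool]:
        """Return (leg joint request, force_walk_in_place)."""
        # Bookkeeping on motion transitions.
        if self.last_motion == "playDead" and motion != "playDead":
            self.blend_start = list(measured)  # interpolate from here towards 0
            self.blend_cycles = 0
        if self.last_motion == "walk" and motion == "stand":
            self.walk_in_place_cycles = WALK_IN_PLACE_CYCLES
        if motion != "stand":
            self.stand_cycles = 0
            self.frozen = None
        self.last_motion = motion

        if motion == "playDead":
            return list(measured), False  # engine idle in this sketch
        if self.blend_cycles < PLAY_DEAD_BLEND_CYCLES:
            self.blend_cycles += 1
            t = self.blend_cycles / PLAY_DEAD_BLEND_CYCLES
            return [(1.0 - t) * a for a in self.blend_start], False
        if self.walk_in_place_cycles > 0:
            self.walk_in_place_cycles -= 1
            return list(policy_request), True  # keep stepping with zero speed
        if motion == "stand":
            self.stand_cycles += 1
            if self.frozen is None and (penalized or self.stand_cycles >= STAND_FREEZE_CYCLES):
                self.frozen = list(policy_request)
            if self.frozen is not None:
                return list(self.frozen), False
        return list(policy_request), False
```

The returned flag would tell the caller to keep the gait running with zero walking speed instead of switching to the standing inputs.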
Learnings From Training
Below is a short list of our current learnings and observations from training neural networks and testing them on the real robot:
- Learning an offset for the frequency results in a more stable walk with more stretched-out legs.
- Setting simulation parameters such as armature, damping and frictionloss results in a less stomping walk.
- Using lower D gains for the torque control during training than on the real robot reduces the occurrence of oscillating joints.
- Keeping the entropy loss from becoming too negative during training helps prevent the policy from getting stuck in local minima.
- Implementing a symmetry loss leads to more symmetric motions.
- The initial training can be done with a lower number of robots (e.g. 1024).
- ...
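One common way to formulate a symmetry loss is to penalize the difference between the action for a mirrored observation and the mirrored action, i.e. the mean of (π(M_o(o)) - M_a(π(o)))². The sketch below uses the Booster leg joint order (without the optional waistYaw) and assumes that mirroring swaps left and right joints while flipping the sign of roll and yaw joints; the correct signs depend on the robot's joint axis definitions. `mirror_obs` must be supplied by the caller, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

# Booster leg joint order from the input description (without the optional waistYaw).
BOOSTER_LEG_JOINTS = [
    "lHipPitch", "lHipRoll", "lHipYaw", "lKneePitch", "lAnklePitch", "lAnkleRoll",
    "rHipPitch", "rHipRoll", "rHipYaw", "rKneePitch", "rAnklePitch", "rAnkleRoll",
]


def mirror_joints(values: Sequence[float]) -> List[float]:
    """Mirror leg joint values across the sagittal plane.

    ASSUMPTION: pitch joints keep their sign; roll and yaw joints flip sign.
    """
    mirrored = []
    for name in BOOSTER_LEG_JOINTS:
        other = ("r" if name[0] == "l" else "l") + name[1:]
        value = values[BOOSTER_LEG_JOINTS.index(other)]
        flip = name.endswith("Roll") or name.endswith("Yaw")
        mirrored.append(-value if flip else value)
    return mirrored


def symmetry_loss(policy: Callable[[Sequence[float]], List[float]],
                  observations: Sequence[Sequence[float]],
                  mirror_obs: Callable[[Sequence[float]], List[float]],
                  mirror_act: Callable[[Sequence[float]], List[float]] = mirror_joints) -> float:
    """Mean squared difference between policy(mirror_obs(o)) and mirror_act(policy(o))."""
    total, count = 0.0, 0
    for obs in observations:
        action_of_mirrored = policy(mirror_obs(obs))
        mirrored_action = mirror_act(policy(obs))
        for x, y in zip(action_of_mirrored, mirrored_action):
            total += (x - y) ** 2
            count += 1
    return total / count
```

In training, the same expression would be computed on batched tensors and added to the policy loss with a weighting factor; this plain-Python version only illustrates the definition.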
Current TODO
- The current motion framework works in walking steps, but the neural network is called every 20 ms and requires walking speeds.
- The robot cannot stop all of its movement within one step, so it often overshoots its target position.
- Kicks are not fully supported yet.
- The WalkKickEngine does not work with the RLWalkingEngine, because a new step is started every 20 ms, which confuses the motion framework.
- On the Booster T1, the motor commands are sent from the perception board, which results in a round-trip delay of 14 ms. We currently assume this could cause some minor oscillation effects.
[1] Booster Robotics - https://github.com/BoosterRobotics/booster_gym/tree/main