Walking

The goal of this training environment is to walk forward as fast as possible without falling over. It is based on the environment introduced by Tassa, Erez and Todorov in “Synthesis and stabilization of complex behaviors through online trajectory optimization”. The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and arms, and a pair of tendons connecting the hips to the knees. The legs each consist of three body parts (thigh, shin, foot), and the arms consist of two body parts (upper arm, forearm).
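
As a quick orientation, the sketch below shows a minimal interaction loop with this environment, assuming a Gymnasium-style API; the environment id "Humanoid-v5" and the random policy are illustrative assumptions, not part of this document.

```python
import gymnasium as gym

# Placeholder environment id (assumption); substitute the id under which
# this walking environment is actually registered.
env = gym.make("Humanoid-v5")
observation, info = env.reset(seed=0)

for _ in range(1000):
    action = env.action_space.sample()  # random actions, for illustration only
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # e.g. the robot has fallen over
        observation, info = env.reset()

env.close()
```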

Rewards

The total reward is: reward = healthy_reward + forward_reward - ctrl_cost - contact_cost. The individual terms are described below, and a sketch of how they are computed and combined follows the list.

  • healthy_reward: A fixed reward of value healthy_reward received for every timestep the agent is alive.

  • forward_reward: A reward for moving forward; it is positive when the agent moves forward.

  • ctrl_cost: A cost (subtracted from the reward) that penalizes the agent for taking actions that are too large.

  • contact_cost: A cost (subtracted from the reward) that penalizes the agent when the external contact forces are too large.
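
The sketch below shows one plausible way these terms could be computed and combined, using the default weights from the Arguments table; the quantities x_velocity, action, contact_forces, and is_healthy are assumed inputs describing the simulator state, not guaranteed names from this environment.

```python
import numpy as np

def total_reward(x_velocity, action, contact_forces, is_healthy,
                 forward_reward_weight=1.25, ctrl_cost_weight=0.1,
                 contact_cost_weight=5e-7, contact_cost_range=(-np.inf, 10.0),
                 healthy_reward=5.0):
    """Illustrative reward computation; names and exact formulas are assumptions."""
    # healthy_reward: fixed bonus for every timestep the agent is alive
    healthy = healthy_reward if is_healthy else 0.0

    # forward_reward: positive when the agent moves forward, scaled by its weight
    forward = forward_reward_weight * x_velocity

    # ctrl_cost: penalizes large actions (quadratic penalty assumed)
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))

    # contact_cost: penalizes large external contact forces,
    # clamped to contact_cost_range
    contact_cost = contact_cost_weight * float(np.sum(np.square(contact_forces)))
    contact_cost = float(np.clip(contact_cost, *contact_cost_range))

    return healthy + forward - ctrl_cost - contact_cost
```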

Arguments

A range of parameters is available to modify the observation space, reward function, initial state, and termination condition.

| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | ? | How fast the agent updates its understanding. |
| clip_range | ? | How much the agent’s policy is allowed to change each update. |
| entropy coefficient | ? | Controls how much the agent explores. |
| forward_reward_weight | 1.25 | Weight for the forward_reward term (see Rewards section). |
| ctrl_cost_weight | 0.1 | Weight for the ctrl_cost term (see Rewards section). |
| contact_cost_weight | 5e-7 | Weight for the contact_cost term (see Rewards section). |
| contact_cost_range | (-np.inf, 10.0) | Clamps the contact_cost term (see Rewards section). |
| healthy_reward | 5.0 | Weight for the healthy_reward term (see Rewards section). |
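
The environment-side parameters in the table might be supplied at construction time, as in the sketch below; the environment id and the keyword-argument interface are assumptions based on a Gymnasium-style gym.make. The first three rows (learning_rate, clip_range, entropy coefficient) configure the learning algorithm rather than the environment, so they do not appear here.

```python
import numpy as np
import gymnasium as gym

# Placeholder environment id (assumption); the keyword arguments mirror the
# default values listed in the table above.
env = gym.make(
    "Humanoid-v5",
    forward_reward_weight=1.25,
    ctrl_cost_weight=0.1,
    contact_cost_weight=5e-7,
    contact_cost_range=(-np.inf, 10.0),
    healthy_reward=5.0,
)
```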
