Walking
The goal of this training environment is to walk forward as fast as possible without falling over. It is based on the environment introduced by Tassa, Erez and Todorov in “Synthesis and stabilization of complex behaviors through online trajectory optimization”. The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and arms, and a pair of tendons connecting the hips to the knees. The legs each consist of three body parts (thigh, shin, foot), and the arms consist of two body parts (upper arm, forearm).
Rewards
The total reward is: reward = healthy_reward + forward_reward - ctrl_cost - contact_cost
healthy_reward: Every timestep that the agent is alive, it receives a fixed reward of value healthy_reward.
forward_reward: A reward for moving forward; it is positive when the agent moves forward.
ctrl_cost: A negative reward that penalizes the agent for taking actions that are too large.
contact_cost: A negative reward that penalizes the agent when the external contact forces are too large.
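To make the combination concrete, here is a minimal Python sketch of how the total reward might be computed. The weight values come from the Arguments table below; the use of forward velocity, squared actions, and squared contact forces for the individual terms is an assumption based on the usual MuJoCo humanoid convention, not something stated in this document.

```python
import numpy as np

# Assumed defaults, taken from the Arguments table below.
FORWARD_REWARD_WEIGHT = 1.25
CTRL_COST_WEIGHT = 0.1
CONTACT_COST_WEIGHT = 5e-7
CONTACT_COST_RANGE = (-np.inf, 10.0)
HEALTHY_REWARD = 5.0

def total_reward(x_velocity, action, contact_forces, is_healthy):
    # reward = healthy_reward + forward_reward - ctrl_cost - contact_cost
    healthy_reward = HEALTHY_REWARD if is_healthy else 0.0
    forward_reward = FORWARD_REWARD_WEIGHT * x_velocity            # positive when moving forward
    ctrl_cost = CTRL_COST_WEIGHT * np.sum(np.square(action))       # penalizes large actions
    contact_cost = CONTACT_COST_WEIGHT * np.sum(np.square(contact_forces))
    contact_cost = float(np.clip(contact_cost, *CONTACT_COST_RANGE))  # clamped by contact_cost_range
    return healthy_reward + forward_reward - ctrl_cost - contact_cost
```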
Arguments
The agent and environment provide a range of parameters to modify the training behavior, observation space, reward function, initial state, and termination condition. A usage sketch follows the table below.
Parameter               Default           Description
learning_rate           ?                 How fast the agent updates its understanding.
clip_range              ?                 How much the agent's policy is allowed to change each update.
entropy coefficient     ?                 Controls how much the agent explores.
forward_reward_weight   1.25              Weight for the forward_reward term (see Rewards section).
ctrl_cost_weight        0.1               Weight for the ctrl_cost term (see Rewards section).
contact_cost_weight     5e-7              Weight for the contact_cost term (see Rewards section).
contact_cost_range      (-np.inf, 10.0)   Clamps the contact_cost term (see Rewards section).
healthy_reward          5.0               Weight for the healthy_reward term (see Rewards section).
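As a usage sketch, the reward-related arguments could be passed when the environment is created and the training hyperparameters handed to the learner. This assumes a Gymnasium-style MuJoCo humanoid environment and a Stable-Baselines3 PPO trainer; the environment id "Humanoid-v5", the ent_coef name, and the hyperparameter values are illustrative placeholders, not taken from this document.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Assumption: the environment follows the Gymnasium MuJoCo Humanoid interface.
env = gym.make(
    "Humanoid-v5",
    forward_reward_weight=1.25,
    ctrl_cost_weight=0.1,
    contact_cost_weight=5e-7,
    healthy_reward=5.0,
)

# learning_rate, clip_range, and the entropy coefficient (ent_coef here) correspond to
# the training hyperparameters listed above; the values below are placeholders.
model = PPO("MlpPolicy", env, learning_rate=3e-4, clip_range=0.2, ent_coef=0.01)
model.learn(total_timesteps=1_000_000)
```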