Walking


The goal of this training environment is to walk forward as fast as possible without falling over. It is based on the environment introduced by Tassa, Erez and Todorov in “Synthesis and stabilization of complex behaviors through online trajectory optimization” (see the reference at the end of this page). The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a pair of legs and arms, and a pair of tendons connecting the hips to the knees. Each leg consists of three body parts (thigh, shin, foot), and each arm consists of two body parts (upper arm, forearm).
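
The environment described above corresponds closely to the classic MuJoCo Humanoid task, so as a minimal sketch of interacting with it (assuming Gymnasium's Humanoid-v5 implementation as a stand-in; AILIVE's own training stack may differ):

```python
import gymnasium as gym

# Assumption: Gymnasium's MuJoCo Humanoid-v5 task as a stand-in for the
# Walking environment described above.
env = gym.make("Humanoid-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Random joint torques; a trained walking policy would go here.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # the robot fell over or the episode ended
        observation, info = env.reset()
env.close()
```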

Rewards

The total reward is: reward = healthy_reward + forward_reward - ctrl_cost - contact_cost

  • healthy_reward: For every timestep that the agent is alive, it receives a fixed reward of value healthy_reward.

  • forward_reward: A reward for moving forward; it is positive when the agent moves forward and grows with its forward speed.

  • ctrl_cost: A negative reward to penalize the agent for taking actions that are too large.

  • contact_cost: A negative reward to penalize the agent if the external contact forces are too large.
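
As a sketch of how these four terms combine into the formula above, the function below recomputes the total reward from its components using the default weights from the Arguments table; the function name and its inputs are illustrative, since the real values come from inside the simulator:

```python
import numpy as np

# Illustrative sketch: forward_velocity, action, and contact_forces would
# be read from the simulator state; defaults mirror the Arguments table.
def total_reward(forward_velocity, action, contact_forces,
                 healthy_reward=5.0, forward_reward_weight=1.25,
                 ctrl_cost_weight=0.1, contact_cost_weight=5e-7,
                 contact_cost_range=(-np.inf, 10.0)):
    forward_reward = forward_reward_weight * forward_velocity
    ctrl_cost = ctrl_cost_weight * np.sum(np.square(action))
    contact_cost = contact_cost_weight * np.sum(np.square(contact_forces))
    # contact_cost_range clamps the contact penalty so a single hard
    # impact cannot dominate the reward signal.
    contact_cost = float(np.clip(contact_cost, *contact_cost_range))
    return healthy_reward + forward_reward - ctrl_cost - contact_cost
```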

Arguments

The environment provides a range of parameters to modify the observation space, reward function, initial state, and termination condition.

| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | ? | How fast the agent updates its understanding. |
| clip_range | ? | How much the agent’s policy is allowed to change each update. |
| entropy coefficient | ? | Controls how much the agent explores. |
| forward_reward_weight | 1.25 | Weight for the forward_reward term (see Rewards section). |
| ctrl_cost_weight | 0.1 | Weight for the ctrl_cost term (see Rewards section). |
| contact_cost_weight | 5e-7 | Weight for the contact_cost term (see Rewards section). |
| contact_cost_range | (-np.inf, 10.0) | Clamps the contact_cost term (see Rewards section). |
| healthy_reward | 5.0 | Weight for the healthy_reward term (see Rewards section). |

Tassa, Y., Erez, T., & Todorov, E. (2012). “Synthesis and stabilization of complex behaviors through online trajectory optimization.” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).