Standing Up

The goal of this training environment is to teach the agent to stand upright and hold its balance. Standing up is a critical intermediate skill that bridges basic crawling and more advanced locomotion tasks such as walking and running. This environment focuses on stability, energy efficiency, and postural control, laying the foundation for dynamic motion.

The standing agent consists of a torso (representing the body) and two legs, each with joints at the hips, knees, and ankles. These joints must work together to counteract gravity and stabilize the agent in an upright position. The environment replicates the biomechanical challenges of standing, including shifts in center of gravity and external perturbations.


Rewards

The total reward encourages the agent to stand stably while penalizing inefficient or unstable behavior. It is the sum of two bonus terms minus two cost terms:

  reward = healthy_reward + balance_reward - ctrl_cost - contact_cost

  • healthy_reward: A fixed reward for each timestep the agent maintains an upright posture.

  • balance_reward: A positive reward proportional to the agent’s ability to minimize sway and remain stable.

  • ctrl_cost: A penalty for large or inefficient joint movements that waste energy.

  • contact_cost: A penalty for excessive or abrupt forces exerted on the ground, promoting smooth adjustments.
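The four terms above can be combined as in the following minimal sketch. The default weights are taken from the parameter table in this document, but the shape of each term is an assumption: the document does not define how sway is measured or how the costs are computed, so the sketch uses a linear falloff in tilt angle for balance_reward and quadratic penalties for ctrl_cost and contact_cost. The function name and its arguments (tilt_deg, action, contact_forces) are hypothetical.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def standing_reward(
    tilt_deg,
    action,
    contact_forces,
    healthy_reward=10.0,
    balance_reward_weight=1.0,
    ctrl_cost_weight=0.03,
    contact_cost_weight=1e-6,
    contact_cost_range=(-math.inf, 5.0),  # written as (-np.inf, 5.0) in the table
    posture_tolerance_deg=5.0,
):
    """Return (reward, terms) for one timestep.

    ASSUMPTION: the functional form of every term below is illustrative;
    only the names and default weights come from the document.
    """
    is_upright = abs(tilt_deg) <= posture_tolerance_deg

    # Fixed bonus for each timestep spent upright.
    healthy = healthy_reward if is_upright else 0.0

    # ASSUMPTION: balance bonus falls linearly from its full weight at
    # zero tilt to zero at the posture tolerance.
    balance = balance_reward_weight * max(
        0.0, 1.0 - abs(tilt_deg) / posture_tolerance_deg
    )

    # ASSUMPTION: quadratic penalty on the joint commands.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)

    # ASSUMPTION: quadratic penalty on ground forces, then clamped to
    # contact_cost_range so a single hard impact cannot dominate.
    raw_contact = contact_cost_weight * sum(f * f for f in contact_forces)
    contact_cost = clamp(raw_contact, *contact_cost_range)

    reward = healthy + balance - ctrl_cost - contact_cost
    terms = {
        "healthy_reward": healthy,
        "balance_reward": balance,
        "ctrl_cost": ctrl_cost,
        "contact_cost": contact_cost,
    }
    return reward, terms
```

Returning the individual terms alongside the total makes it easier to log which component drives the reward during training.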


Challenges

  1. Maintaining Balance: The agent must coordinate multiple joints to stay upright without tipping over.

  2. Energy Efficiency: Penalizing unnecessary movements encourages the agent to maintain posture with minimal effort.

  3. Initial Transition: Moving from a crouched or unstable starting position to standing upright introduces dynamic stabilization challenges.
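To make the third challenge concrete, here is a small sketch of how a randomized start might be sampled. It assumes that initial_state_range (±0.2 in the parameter table) means independent uniform noise added to each coordinate of a nominal pose; the document does not state the distribution or the units. The function name, the crouched pose, and the joint ordering are hypothetical.

```python
import random


def sample_initial_state(nominal_pose, noise_range=0.2, seed=None):
    """Return a copy of nominal_pose with uniform noise added to each entry.

    ASSUMPTION: noise is drawn independently per coordinate from
    [-noise_range, +noise_range].
    """
    rng = random.Random(seed)
    return [q + rng.uniform(-noise_range, noise_range) for q in nominal_pose]


# Hypothetical crouched pose: (hip, knee, ankle) angles for each leg,
# assumed here to be in radians.
crouched = [-0.6, 1.2, -0.6, -0.6, 1.2, -0.6]
start = sample_initial_state(crouched, seed=0)
```

Passing a seed makes a given start reproducible, which helps when comparing policies on the same set of initial conditions.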

| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | 3e-4 | How quickly the agent updates its policy to improve standing stability. |
| clip_range | 0.2 | Limits the magnitude of policy updates to avoid destabilizing changes during training. |
| entropy_coefficient | 0.01 | Encourages exploration of different standing strategies to find the most stable posture. |
| balance_reward_weight | 1.0 | Weight for the balance_reward, encouraging the agent to maintain an upright position. |
| stability_penalty_weight | 0.05 | Penalizes excessive sway or instability in the agent’s posture. |
| ctrl_cost_weight | 0.03 | Penalizes unnecessary or excessive joint movements while attempting to stand still. |
| contact_cost_weight | 1e-6 | Penalizes excessive forces applied to the ground, encouraging smoother transitions. |
| contact_cost_range | (-np.inf, 5.0) | Clamps the contact_cost term to prevent large penalties from disrupting learning. |
| healthy_reward | 10.0 | Fixed reward for every timestep the agent stands upright without tipping over. |
| initial_state_range | ±0.2 | Randomizes the initial position and orientation to challenge the agent’s ability to adapt. |
| ground_friction | 0.9 | Coefficient of friction of the ground; a high value simulates a stable standing surface. |
| termination_penalty | -20.0 | Penalty for falling or failing to maintain an upright position for a designated period. |
| reward_decay | 0.99 | Discount factor for future rewards, promoting long-term balance over quick fixes. |
| standing_time_goal | 10 seconds | Time the agent must remain standing to complete the task. |
| posture_tolerance | 5 degrees | Maximum tilt angle (in any direction) at which the agent is still considered upright. |
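The way posture_tolerance and standing_time_goal interact can be sketched as a small success tracker. The defaults come from the table above, but several details are assumptions, because the document does not specify them: the simulation timestep dt, the rule that leaving the tolerance band resets the timer, and the class and field names, which are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingTaskConfig:
    """Task-level defaults copied from the parameter table."""
    posture_tolerance_deg: float = 5.0
    standing_time_goal_s: float = 10.0
    termination_penalty: float = -20.0


class StandingTracker:
    """Counts consecutive upright timesteps and reports task success."""

    def __init__(self, config, dt):
        self.config = config
        # ASSUMPTION: dt is the simulation timestep in seconds; the
        # document does not give a value.
        self.goal_steps = round(config.standing_time_goal_s / dt)
        self.upright_steps = 0

    def update(self, tilt_deg):
        """Record one timestep; return True once the time goal is met."""
        if abs(tilt_deg) <= self.config.posture_tolerance_deg:
            self.upright_steps += 1
        else:
            # ASSUMPTION: exceeding the tolerance restarts the count.
            self.upright_steps = 0
        return self.upright_steps >= self.goal_steps


tracker = StandingTracker(StandingTaskConfig(), dt=0.5)
```

Counting whole timesteps rather than accumulating floating-point seconds avoids missing the goal by a rounding error. How and when termination_penalty is applied is left to the caller, since the document does not define what counts as a fall.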
