Skating

The goal of this training environment is to teach the agent to ride a skateboard efficiently and perform basic maneuvers without falling. Skating is a highly dynamic task that combines balance, coordination, and precision. This environment builds on the agent's foundational locomotion skills and introduces the added complexity of maintaining stability on a moving platform (the skateboard).

The skating agent is a bipedal humanoid with a torso, legs, and arms, riding a simulated skateboard. The agent must learn to shift its weight, control its limbs, and manage the skateboard's motion to stay balanced and move forward.


Rewards

The total reward encourages effective skating techniques while penalizing unsafe or inefficient behavior. It is computed as:

  reward = healthy_reward + forward_reward - ctrl_cost - contact_cost - balance_penalty - trick_penalty

  • healthy_reward: A fixed reward for each timestep the agent remains on the skateboard without falling.

  • forward_reward: A positive reward proportional to the skateboard's forward velocity, encouraging efficient movement.

  • ctrl_cost: A penalty for excessive or wasteful limb movements while controlling the skateboard.

  • contact_cost: A penalty for excessive force during foot or hand contact with the ground or skateboard.

  • balance_penalty: A penalty for tipping or wobbling excessively while skating.

  • trick_penalty: A penalty for failed tricks or unnecessary risky maneuvers.
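The way the six terms combine can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual implementation: the RewardTerms container and the total_reward function are hypothetical names introduced here, and each term is assumed to be already computed and already weighted for the current timestep.

```python
from dataclasses import dataclass


@dataclass
class RewardTerms:
    """Per-timestep reward components (hypothetical container, already weighted)."""
    healthy_reward: float
    forward_reward: float
    ctrl_cost: float
    contact_cost: float
    balance_penalty: float
    trick_penalty: float


def total_reward(terms: RewardTerms) -> float:
    """Combine the terms with the signs given in the reward formula."""
    bonuses = terms.healthy_reward + terms.forward_reward
    penalties = (
        terms.ctrl_cost
        + terms.contact_cost
        + terms.balance_penalty
        + terms.trick_penalty
    )
    return bonuses - penalties


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from a real rollout.
    step = RewardTerms(
        healthy_reward=10.0,
        forward_reward=4.0,
        ctrl_cost=0.5,
        contact_cost=0.1,
        balance_penalty=0.2,
        trick_penalty=0.0,
    )
    print(total_reward(step))  # approximately 13.2
```

Keeping the terms separate until the final sum makes it easy to log each component and see which penalty dominates during training.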


Challenges

  1. Dynamic Balance: The agent must manage its center of gravity to stay upright on a moving skateboard.

  2. Weight Shifting: The agent must learn to shift weight between its legs and adjust its posture for steering and acceleration.

  3. Obstacle Navigation: The agent must avoid or maneuver around obstacles while maintaining speed and balance.


Arguments

| Parameter | Default | Description |
|---|---|---|
| learning_rate | 3e-4 | Determines how quickly the agent updates its policy during skating training. |
| clip_range | 0.2 | Limits the magnitude of policy updates to ensure stable learning. |
| entropy_coefficient | 0.02 | Encourages exploration of new skating techniques and movements. |
| forward_reward_weight | 2.0 | Weight for the forward_reward, incentivizing faster skating speeds. |
| ctrl_cost_weight | 0.05 | Penalizes inefficient or erratic limb movements during skating. |
| contact_cost_weight | 1e-6 | Penalizes abrupt or heavy contact with the ground or skateboard. |
| contact_cost_range | (-np.inf, 15.0) | Clamps the contact_cost term to prevent runaway penalties. |
| healthy_reward | 10.0 | Fixed reward for maintaining balance and avoiding falls. |
| balance_penalty_weight | 0.1 | Penalizes excessive wobbling or instability while riding the skateboard. |
| trick_penalty_weight | 0.05 | Penalizes failed or unnecessary tricks during the learning phase. |
| skateboard_friction | 0.7 | Coefficient of friction between the skateboard and ground, affecting speed and stability. |
| obstacle_density | low | Determines the frequency of obstacles in the environment (e.g., low, medium, high). |
| target_speed | 3.0 m/s | Goal speed for the agent to reach and sustain during skating. |
| termination_penalty | -30.0 | Penalty for falling off the skateboard or losing forward motion. |
| terrain_type | flat | Defines the type of surface the skateboard rolls on (e.g., flat, inclined, uneven). |
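To make the role of the weights and of contact_cost_range concrete, the sketch below applies the defaults listed above to raw per-step quantities. It rests on several assumptions not stated in this document: the SkatingConfig class and the helper functions are hypothetical names, ctrl_cost and contact_cost are taken to be weighted sums of squares (a form common in MuJoCo-style humanoid environments), forward_reward is taken to be the weight times forward velocity, and math.inf stands in for np.inf so the example needs only the standard library.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SkatingConfig:
    """Hypothetical config holding the reward-related defaults from the table."""
    forward_reward_weight: float = 2.0
    ctrl_cost_weight: float = 0.05
    contact_cost_weight: float = 1e-6
    contact_cost_range: Tuple[float, float] = (-math.inf, 15.0)
    healthy_reward: float = 10.0
    balance_penalty_weight: float = 0.1
    trick_penalty_weight: float = 0.05


def forward_reward(forward_velocity: float, cfg: SkatingConfig) -> float:
    # Assumption: reward scales linearly with forward velocity (m/s).
    return cfg.forward_reward_weight * forward_velocity


def ctrl_cost(action: Sequence[float], cfg: SkatingConfig) -> float:
    # Assumption: quadratic penalty on the action (actuator command) vector.
    return cfg.ctrl_cost_weight * sum(a * a for a in action)


def contact_cost(contact_forces: Sequence[float], cfg: SkatingConfig) -> float:
    # Assumption: quadratic penalty on contact forces, then clamped to
    # contact_cost_range so a single hard impact cannot dominate the reward.
    raw = cfg.contact_cost_weight * sum(f * f for f in contact_forces)
    low, high = cfg.contact_cost_range
    return min(max(raw, low), high)


if __name__ == "__main__":
    cfg = SkatingConfig()
    print(forward_reward(3.0, cfg))            # velocity at target_speed
    print(ctrl_cost([1.0, -1.0], cfg))         # small two-actuator command
    print(contact_cost([1000.0, 2000.0], cfg)) # below the clamp
    print(contact_cost([1e4], cfg))            # raw cost 100.0, clamped to 15.0
```

With the default contact_cost_weight of 1e-6, the clamp only takes effect for very large forces, which suggests it acts as a safety bound rather than a routine part of the penalty.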
