Running
The goal of this training environment is to teach the agent to run forward as fast and efficiently as possible without falling. Running is a dynamic and high-energy task that requires precise coordination, balance, and rapid decision-making. This environment builds on the foundational skills of standing and walking, challenging the agent to master high-speed locomotion while maintaining stability.
The running agent simulates a bipedal robot with a torso, arms, and legs. The legs are articulated at the hips, knees, and ankles. The environment emphasizes fluid motion, energy efficiency, and adaptability to changes in terrain and speed demands.
Rewards
The total reward encourages the agent to run effectively while penalizing inefficient or unsafe behavior. It is computed at each timestep as:
reward = healthy_reward + forward_reward - ctrl_cost - contact_cost - balance_penalty
healthy_reward: A fixed reward for each timestep the agent remains in a healthy, upright state.
forward_reward: A positive reward proportional to the agent's forward velocity, encouraging faster running speeds.
ctrl_cost: A penalty for excessive or wasteful limb movements, promoting energy efficiency.
contact_cost: A penalty for large external forces during ground contact, encouraging smooth strides.
balance_penalty: A penalty for excessive swaying or tipping, ensuring stability at high speeds.
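The reward formula above can be sketched in Python as follows. This is a minimal illustration, not the environment's actual implementation: the weights are the defaults from the Arguments table, but the function name `compute_reward`, its inputs, and the quadratic form of each cost term are assumptions made for the example.

```python
import numpy as np

# Defaults copied from the Arguments table.
FORWARD_REWARD_WEIGHT = 2.0
CTRL_COST_WEIGHT = 0.1
CONTACT_COST_WEIGHT = 1e-6
CONTACT_COST_RANGE = (-np.inf, 15.0)
HEALTHY_REWARD = 5.0
BALANCE_PENALTY_WEIGHT = 0.05


def compute_reward(forward_velocity, action, contact_forces, torso_tilt, is_healthy=True):
    """Hypothetical per-step reward; the form of each term is assumed."""
    healthy_reward = HEALTHY_REWARD if is_healthy else 0.0
    forward_reward = FORWARD_REWARD_WEIGHT * forward_velocity
    # Assumption: quadratic penalty on the action vector.
    ctrl_cost = CTRL_COST_WEIGHT * float(np.sum(np.square(action)))
    # Assumption: quadratic penalty on contact forces, clamped to contact_cost_range.
    raw_contact = CONTACT_COST_WEIGHT * float(np.sum(np.square(contact_forces)))
    contact_cost = float(np.clip(raw_contact, *CONTACT_COST_RANGE))
    # Assumption: quadratic penalty on torso tilt, in radians from vertical.
    balance_penalty = BALANCE_PENALTY_WEIGHT * torso_tilt ** 2
    return healthy_reward + forward_reward - ctrl_cost - contact_cost - balance_penalty


# Upright agent moving at 3 m/s with no control effort or contact force:
# 5.0 + 2.0 * 3.0 = 11.0
print(compute_reward(3.0, np.zeros(4), np.zeros(6), 0.0))
```

With these defaults, the healthy and forward terms dominate at moderate speeds, while the clamp on contact_cost caps that single penalty at 15.0 per step.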
Challenges
High-Speed Coordination: The agent must synchronize limb movements to maintain balance while achieving maximum velocity.
Energy Optimization: The agent must use joint movements efficiently to minimize energy expenditure while running.
Terrain Adaptability: The agent must adjust its stride patterns to maintain speed and stability on varying surfaces.
Arguments
| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | 3e-4 | Determines how quickly the agent updates its running policy during training. |
| clip_range | 0.2 | Limits the magnitude of policy updates to ensure stability during training. |
| entropy_coefficient | 0.01 | Encourages exploration of different running styles and patterns. |
| forward_reward_weight | 2.0 | Weight for the forward_reward, incentivizing faster running speeds. |
| ctrl_cost_weight | 0.1 | Penalizes excessive or inefficient control actions, encouraging smooth motion. |
| contact_cost_weight | 1e-6 | Penalizes abrupt or heavy ground contact to promote fluid running strides. |
| contact_cost_range | (-np.inf, 15.0) | Clamps the contact_cost term to prevent runaway penalties. |
| healthy_reward | 5.0 | Fixed reward for maintaining a healthy, upright state. |
| balance_penalty_weight | 0.05 | Penalizes excessive tipping or instability while running. |
| stride_length_limit | 1.5 meters | Maximum allowable stride length to encourage natural running motions. |
| terrain_type | flat | Defines the type of surface the agent runs on (e.g., flat, inclined, uneven). |
| initial_speed | 1.0 m/s | Starting speed for the agent, gradually increasing during training. |
| termination_penalty | -30.0 | Penalty for falling or failing to maintain forward motion. |
| reward_decay | 0.99 | Discount factor for future rewards, promoting consistent forward motion. |
| target_speed | 5.0 m/s | Goal speed for the agent to reach and sustain during the task. |
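One way to keep these defaults together in code is a small configuration object. The sketch below is an assumption for illustration only: the class name `RunningConfig` and the `speed_at` helper are hypothetical, and the linear ramp from initial_speed to target_speed is just one possible reading of "gradually increasing during training". It uses `math.inf` in place of `np.inf` so the example needs only the standard library.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunningConfig:
    """Hypothetical container for the documented defaults."""
    learning_rate: float = 3e-4
    clip_range: float = 0.2
    entropy_coefficient: float = 0.01
    forward_reward_weight: float = 2.0
    ctrl_cost_weight: float = 0.1
    contact_cost_weight: float = 1e-6
    contact_cost_range: Tuple[float, float] = (-math.inf, 15.0)
    healthy_reward: float = 5.0
    balance_penalty_weight: float = 0.05
    stride_length_limit: float = 1.5   # meters
    terrain_type: str = "flat"         # e.g. "flat", "inclined", "uneven"
    initial_speed: float = 1.0         # m/s
    termination_penalty: float = -30.0
    reward_decay: float = 0.99
    target_speed: float = 5.0          # m/s

    def speed_at(self, progress: float) -> float:
        """Assumed linear ramp; progress is the fraction of training done, in [0, 1]."""
        p = min(max(progress, 0.0), 1.0)
        return self.initial_speed + p * (self.target_speed - self.initial_speed)


# Override a single default and query the assumed speed schedule.
cfg = RunningConfig(terrain_type="inclined")
print(cfg.terrain_type, cfg.speed_at(0.5))  # inclined 3.0
```

Freezing the dataclass keeps a configuration from being changed accidentally mid-run; a variant is created by constructing a new instance with the overridden fields.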