Crawling
The goal of this training environment is to teach the agent to crawl forward efficiently while maintaining balance and minimizing energy expenditure. Crawling is the foundational movement skill for more advanced locomotion tasks, such as walking and running. This environment emphasizes coordination, stability, and controlled movements using the agent's limbs and body.
The crawling agent consists of a torso (representing the abdomen) and four limbs (representing arms and legs). Each limb is made up of two body parts (upper and lower segments) connected by joints. This simplified anatomy mimics the movement mechanics of a quadruped crawl.
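The body layout described above can be sketched as a simple data structure. This is purely illustrative; the segment and joint names below are hypothetical, not the environment's actual identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class Limb:
    upper: str  # upper segment body name (illustrative)
    lower: str  # lower segment body name (illustrative)

@dataclass
class CrawlerBody:
    # One torso plus four two-segment limbs, as described above.
    torso: str = "torso"
    limbs: list = field(default_factory=lambda: [
        Limb("front_left_upper", "front_left_lower"),
        Limb("front_right_upper", "front_right_lower"),
        Limb("back_left_upper", "back_left_lower"),
        Limb("back_right_upper", "back_right_lower"),
    ])

body = CrawlerBody()
```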
Rewards
The total reward function ensures that the agent learns to crawl effectively while avoiding inefficient or erratic behaviors:

reward = healthy_reward + forward_reward - ctrl_cost - contact_cost
healthy_reward: A fixed reward for every timestep the agent maintains a healthy state (not falling or behaving unnaturally).
forward_reward: A positive reward proportional to the agent's forward velocity, encouraging efficient crawling motion.
ctrl_cost: A penalty to discourage excessive or energy-wasting limb movements.
contact_cost: A penalty for generating large external forces (e.g., slamming limbs into the ground), promoting smooth and controlled crawling.
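The four terms above can be combined in code as follows. This is a minimal sketch, not the environment's actual implementation; the weight constants mirror the defaults listed in the Arguments section below, and the quadratic form of the control and contact costs is an assumption (it is a common convention in MuJoCo-style environments).

```python
import numpy as np

# Weights taken from the Arguments table; quadratic cost forms are assumed.
FORWARD_REWARD_WEIGHT = 1.0
CTRL_COST_WEIGHT = 0.05
CONTACT_COST_WEIGHT = 1e-6
CONTACT_COST_RANGE = (-np.inf, 10.0)
HEALTHY_REWARD = 5.0

def crawl_reward(forward_velocity, action, contact_forces, is_healthy):
    """Sketch of: reward = healthy_reward + forward_reward - ctrl_cost - contact_cost."""
    healthy_reward = HEALTHY_REWARD if is_healthy else 0.0
    forward_reward = FORWARD_REWARD_WEIGHT * forward_velocity
    # Penalize large actuator commands (energy-wasting limb movements).
    ctrl_cost = CTRL_COST_WEIGHT * float(np.sum(np.square(action)))
    # Penalize large external forces, clamped to avoid runaway penalties.
    contact_cost = CONTACT_COST_WEIGHT * float(np.sum(np.square(contact_forces)))
    contact_cost = float(np.clip(contact_cost, *CONTACT_COST_RANGE))
    return healthy_reward + forward_reward - ctrl_cost - contact_cost
```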
Challenges
Balance Control: The agent must coordinate all four limbs and its torso to avoid tipping over while crawling.
Energy Efficiency: Penalties on unnecessary or erratic movements ensure the agent learns to crawl with minimal energy use.
Ground Interaction: The agent must learn to apply optimal force to the ground for propulsion without causing instability or excess friction.
Arguments
The environment exposes several parameters that can be adjusted to tailor the crawling task:
| Parameter | Default | Description |
| --- | --- | --- |
| `learning_rate` | `3e-4` | How fast the agent updates its policy using gradient descent during training. |
| `clip_range` | `0.2` | Limits the magnitude of policy updates to ensure stable learning. |
| `entropy_coefficient` | `0.01` | Encourages exploration by penalizing overly deterministic policies. |
| `forward_reward_weight` | `1.0` | Weight for the `forward_reward` term, which rewards the agent for moving forward. |
| `ctrl_cost_weight` | `0.05` | Weight for the `ctrl_cost` term, penalizing large or inefficient control actions. |
| `contact_cost_weight` | `1e-6` | Weight for the `contact_cost` term, penalizing excessive external forces during movement. |
| `contact_cost_range` | `(-np.inf, 10.0)` | Clamps the `contact_cost` term within this range to prevent runaway penalties. |
| `healthy_reward` | `5.0` | Fixed reward for each timestep the agent remains in a "healthy" state (e.g., not falling). |
| `initial_state_range` | `(±0.1)` | Range for randomizing the initial positions and velocities of the agent's limbs. |
| `ground_friction` | `0.8` | Coefficient of friction between the agent and the ground, affecting grip and movement efficiency. |
| `termination_penalty` | `-10.0` | Penalty applied if the agent tips over or fails to meet forward movement criteria. |
| `reward_decay` | `0.99` | Discount factor for future rewards, balancing short-term gains and long-term planning. |
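To illustrate how `reward_decay` balances short-term gains against long-term planning, the sketch below computes a discounted return in plain Python. This is not part of the environment's API, just a demonstration of the standard discounting formula.

```python
def discounted_return(rewards, reward_decay=0.99):
    """Sum of rewards discounted by reward_decay (gamma), computed back-to-front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + reward_decay * total
    return total

# A steady crawl earning 1.0 per step for 100 steps: later steps contribute
# less, so the return is well below the undiscounted sum of 100.
steady = discounted_return([1.0] * 100, reward_decay=0.99)
```

With `reward_decay` close to 1.0 the agent values distant rewards almost as much as immediate ones, which suits a task like crawling where sustained forward progress matters more than any single step.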
Crawling is a critical stage in the agent's motor skill development, laying the groundwork for mastering more complex movements like standing, walking, and running. By starting with crawling, the agent builds the coordination and stability necessary to progress through AILIVE’s dynamic training environments.