v1.5.0: Refactor RL environments #143
stephane-caron announced in Announcements
This release starts rolling out changes to RL environments, along with quality-of-life improvements to the startup and build processes. One of them: agents can now retry connecting to the spine several times at startup, getting rid of clunky timeouts 😉
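For illustration, the retry pattern looks like the following sketch; the `connect` callable and its failure mode are stand-ins, not upkie's actual API:

```python
import time


def connect_with_retries(connect, retries: int = 10, delay: float = 1.0):
    """Attempt a connection several times before giving up.

    ``connect`` is any callable that raises while the spine is not up
    yet; the names here are illustrative, not upkie's actual API.
    """
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(delay)  # give the spine time to come up
```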
RL changes and migration notes
RL environments now work with $R(o, a)$ reward functions. The refactoring adds an intermediate abstract class for environments that control Upkie as a wheeled inverted pendulum, and additional acceleration limits to the `UpkieGroundVelocity` environment (formerly `UpkieWheelsEnv`). The refactoring also introduces a `regulate_frequency` boolean argument: the proper way not to regulate frequency is now `regulate_frequency=False` rather than `frequency=None`. Enjoy these changes, and chime in on the Discussions page if you have feedback 😃
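For example, here is a minimal rollout using the new kwarg. The environment id and version suffix are assumptions on my part; check `upkie.envs` for the exact registered names:

```python
import gymnasium as gym
import upkie.envs

upkie.envs.register()  # registers Upkie environments with Gymnasium

# "UpkieGroundVelocity-v1" is an assumed id; ``regulate_frequency`` is the
# new boolean kwarg from this release.
env = gym.make("UpkieGroundVelocity-v1", regulate_frequency=False)
observation, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # replace with your agent's policy
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```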
Migration notes
- Rename `UpkieServosEnv` to `UpkieServos`
- Rename `UpkieWheelsEnv` to `UpkieGroundVelocity`
- Pass `regulate_frequency=False` instead of `frequency=None` to disable frequency regulation
- Rename the `ROBOT` environment variable to `UPKIE_NAME`
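In code, the class renames boil down to import changes along these lines (a sketch; double-check the exact exports in `upkie.envs`):

```python
# Before v1.5.0, environment classes carried an "Env" suffix:
# from upkie.envs import UpkieServosEnv, UpkieWheelsEnv

# From v1.5.0 on, the suffix is gone and UpkieWheelsEnv became
# UpkieGroundVelocity:
from upkie.envs import UpkieGroundVelocity, UpkieServos
```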
Changelog
Added
- `env.rate` for logging purposes
- `InitRandomization` dataclass to describe initial state randomization
- `UpkieGroundVelocity` can include a velocity low-pass filter
- `UpkieGroundVelocity` can limit ground acceleration as well
- `UpkieGroundVelocity` low-pass filter can also be randomized
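To give an idea of what the new `UpkieGroundVelocity` options do, here is a rough sketch of a first-order low-pass filter and a per-step acceleration clamp; this is illustrative, not upkie's actual implementation:

```python
def low_pass_filter(prev_output: float, cutoff_period: float, new_input: float, dt: float) -> float:
    """First-order low-pass filter, assuming dt is small before cutoff_period."""
    alpha = dt / cutoff_period
    return prev_output + alpha * (new_input - prev_output)


def clamp_acceleration(prev_velocity: float, new_velocity: float, max_accel: float, dt: float) -> float:
    """Limit the velocity change per step so that acceleration stays bounded."""
    max_step = max_accel * dt
    step = min(max(new_velocity - prev_velocity, -max_step), max_step)
    return prev_velocity + step
```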
Changed
- `UpkieServosEnv` to `UpkieServos`
- `UpkieWheelsEnv` to `UpkieGroundVelocity`
- `regulate_frequency` env kwarg instead of `frequency=None`
- `hostname`
- `ROBOT` environment variable to `UPKIE_NAME`
- `/tmp/ppo_balancer`
- `effective_time_horizon` to `discounted_horizon_duration`
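Regarding the last rename: if `discounted_horizon_duration` follows the usual correspondence between a discount factor and its discounted horizon (an assumption on my part), the relation is $T = \Delta t / (1 - \gamma)$, i.e.:

```python
def discount_factor(discounted_horizon_duration: float, dt: float) -> float:
    # A discount factor gamma weighs roughly 1 / (1 - gamma) steps,
    # spanning dt / (1 - gamma) seconds; solve that relation for gamma.
    # This correspondence is assumed, not taken from upkie's code.
    return 1.0 - dt / discounted_horizon_duration
```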
Fixed
Removed
- `get_range` from rewards as it is deprecated from Gymnasium

This discussion was created from the release v1.5.0.