v1.5.0: Refactor RL environments #143
stephane-caron announced in Announcements
This release starts rolling out changes to RL environments, along with quality-of-life improvements to the startup and build processes. One of them: agents can now retry connecting to the spine several times at startup, getting rid of clunky timeouts 😉
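For illustration, the retry pattern looks like the following sketch; the `connect` callable and its failure mode are stand-ins, not upkie's actual API:

```python
import time


def connect_with_retries(connect, retries: int = 10, delay: float = 1.0):
    """Attempt a connection several times before giving up.

    ``connect`` is any callable that raises while the spine is not up
    yet; the names here are illustrative, not upkie's actual API.
    """
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(delay)  # give the spine time to come up
```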
RL changes and migration notes
RL environments now work with $R(o, a)$ reward functions. The refactoring adds an intermediate abstract class for environments that control Upkie as a wheeled inverted pendulum, and additional acceleration limits to the `UpkieGroundVelocity` environment (formerly `UpkieWheelsEnv`). The refactoring also introduces a `regulate_frequency` boolean argument: the proper way not to regulate frequency is now `regulate_frequency=False` rather than `frequency=None`. Enjoy these changes, and chime in on the Discussions page if you have feedback 😃
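For example, here is a minimal rollout using the new kwarg. The environment id and version suffix are assumptions on my part; check `upkie.envs` for the exact registered names:

```python
import gymnasium as gym
import upkie.envs

upkie.envs.register()  # registers Upkie environments with Gymnasium

# "UpkieGroundVelocity-v1" is an assumed id; ``regulate_frequency`` is the
# new boolean kwarg from this release.
env = gym.make("UpkieGroundVelocity-v1", regulate_frequency=False)
observation, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # replace with your agent's policy
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```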
Migration notes
- Rename `UpkieServosEnv` to `UpkieServos`
- Rename `UpkieWheelsEnv` to `UpkieGroundVelocity`
- Pass `regulate_frequency=False` instead of `frequency=None` to disable frequency regulation
- Rename the `ROBOT` environment variable to `UPKIE_NAME`
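In code, the class renames boil down to import changes along these lines (a sketch; double-check the exact exports in `upkie.envs`):

```python
# Before v1.5.0, environment classes carried an "Env" suffix:
# from upkie.envs import UpkieServosEnv, UpkieWheelsEnv

# From v1.5.0 on, the suffix is gone and UpkieWheelsEnv became
# UpkieGroundVelocity:
from upkie.envs import UpkieGroundVelocity, UpkieServos
```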
Changelog
Added
- `env.rate` for logging purposes
- `InitRandomization` dataclass to describe initial state randomization
- `UpkieGroundVelocity` can include a velocity low-pass filter
- `UpkieGroundVelocity` can limit ground acceleration as well
- `UpkieGroundVelocity` low-pass filter can also be randomized
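To give an idea of what the new `UpkieGroundVelocity` options do, here is a rough sketch of a first-order low-pass filter and a per-step acceleration clamp; this is illustrative, not upkie's actual implementation:

```python
def low_pass_filter(prev_output: float, cutoff_period: float, new_input: float, dt: float) -> float:
    """First-order low-pass filter, assuming dt is small before cutoff_period."""
    alpha = dt / cutoff_period
    return prev_output + alpha * (new_input - prev_output)


def clamp_acceleration(prev_velocity: float, new_velocity: float, max_accel: float, dt: float) -> float:
    """Limit the velocity change per step so that acceleration stays bounded."""
    max_step = max_accel * dt
    step = min(max(new_velocity - prev_velocity, -max_step), max_step)
    return prev_velocity + step
```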
Changed
- `UpkieServosEnv` to `UpkieServos`
- `UpkieWheelsEnv` to `UpkieGroundVelocity`
- `regulate_frequency` env kwarg instead of `frequency=None`
- `hostname`
- `ROBOT` environment variable to `UPKIE_NAME`
- `/tmp/ppo_balancer`
- `effective_time_horizon` to `discounted_horizon_duration`
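Regarding the last rename: if `discounted_horizon_duration` follows the usual correspondence between a discount factor and its discounted horizon (an assumption on my part), the relation is $T = \Delta t / (1 - \gamma)$, i.e.:

```python
def discount_factor(discounted_horizon_duration: float, dt: float) -> float:
    # A discount factor gamma weighs roughly 1 / (1 - gamma) steps,
    # spanning dt / (1 - gamma) seconds; solve that relation for gamma.
    # This correspondence is assumed, not taken from upkie's code.
    return 1.0 - dt / discounted_horizon_duration
```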
Fixed
Removed
- `get_range` from rewards as it is deprecated from Gymnasium

This discussion was created from the release v1.5.0.