This approach uses a prior strategy to unroll the game up to a certain point in time `t`, then lets the exploration strategy being trained take over. `t` is then gradually reduced as the exploration strategy improves.
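The schedule for `t` could look something like the following minimal sketch. The function name, win-rate threshold, and step size are illustrative assumptions, not part of the proposal:

```python
# Hypothetical curriculum schedule: shrink the prior-strategy rollout
# horizon t as the exploration strategy's win rate improves.
# `next_horizon`, the threshold, and the step size are assumptions.

def next_horizon(t: int, win_rate: float,
                 threshold: float = 0.6, step: int = 1) -> int:
    """Reduce t once the exploration strategy wins often enough."""
    if win_rate >= threshold and t > 0:
        return max(0, t - step)
    return t
```

One simple design choice here is to only shrink `t` on sufficient evidence of improvement, so the exploration strategy is never handed a starting point it cannot yet handle.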
To generate `game_state(t)`s, we intend to perform the following steps:

1. Choose a prior strategy that can be configured to play deterministically or non-deterministically.
2. Play the non-deterministic version of the prior strategy either against itself or against a deterministic version of itself up to time `t`.
3. Determine which player is favoured according to the deterministic prior strategy.
4. Play the exploration strategy against the deterministic prior strategy, taking the favoured player's side so that it is guaranteed a chance of winning.
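Steps 1-3 above could be sketched as follows, assuming a minimal toy game interface; `Game`, `prior_move`, and `generate_game_state` are hypothetical names, not an existing API:

```python
import random

# Hypothetical sketch of generating game_state(t): a non-deterministic
# prior strategy plays against a deterministic copy of itself up to
# time t, then the favoured player is determined from the state.

class Game:
    """Toy two-player game: players alternately add to their score."""
    def __init__(self):
        self.scores = [0, 0]
        self.turn = 0

    def play(self, move: int):
        self.scores[self.turn] += move
        self.turn = 1 - self.turn

def prior_move(game: Game, deterministic: bool, rng: random.Random) -> int:
    # The deterministic configuration always plays the same move;
    # the non-deterministic configuration samples a move.
    return 2 if deterministic else rng.choice([1, 2, 3])

def generate_game_state(t: int, seed: int = 0):
    """Unroll the game with the prior strategy up to time t and
    return the resulting state plus the favoured player (steps 1-3)."""
    rng = random.Random(seed)
    game = Game()
    for _ in range(t):
        # Non-deterministic prior (player 0) vs. deterministic copy (player 1).
        deterministic = (game.turn == 1)
        game.play(prior_move(game, deterministic, rng))
    # Step 3: in this toy game, the favoured player is whoever leads on score.
    favoured = 0 if game.scores[0] >= game.scores[1] else 1
    return game, favoured
```

In step 4, the exploration strategy would then resume from the returned state playing as `favoured`, with the deterministic prior strategy as its opponent.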
Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1