This approach uses a prior strategy to unroll the game up to a certain point in time `t`, then lets the exploration strategy being trained take over. `t` is then gradually reduced as the exploration strategy improves.
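The schedule for `t` could look something like the following minimal sketch. The function name, win-rate threshold, and step size are illustrative assumptions, not part of the proposal:

```python
# Hypothetical curriculum schedule: shrink the prior-strategy rollout
# horizon t as the exploration strategy's win rate improves.
# `next_horizon`, the threshold, and the step size are assumptions.

def next_horizon(t: int, win_rate: float,
                 threshold: float = 0.6, step: int = 1) -> int:
    """Reduce t once the exploration strategy wins often enough."""
    if win_rate >= threshold and t > 0:
        return max(0, t - step)
    return t
```

One simple design choice here is to only shrink `t` on sufficient evidence of improvement, so the exploration strategy is never handed a starting point it cannot yet handle.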
To generate `game_state(t)`s, we intend to perform the following steps:

1. Choose a prior strategy that can be configured to play deterministically or non-deterministically.
2. Play the non-deterministic version of the prior strategy either against itself or against a deterministic version of itself up to time `t`.
3. Determine which player is favoured according to the deterministic prior strategy.
4. Play the exploration strategy against the deterministic prior strategy, taking the favoured player's side so that it is guaranteed a chance of winning.
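Steps 1-3 above could be sketched as follows, assuming a minimal toy game interface; `Game`, `prior_move`, and `generate_game_state` are hypothetical names, not an existing API:

```python
import random

# Hypothetical sketch of generating game_state(t): a non-deterministic
# prior strategy plays against a deterministic copy of itself up to
# time t, then the favoured player is determined from the state.

class Game:
    """Toy two-player game: players alternately add to their score."""
    def __init__(self):
        self.scores = [0, 0]
        self.turn = 0

    def play(self, move: int):
        self.scores[self.turn] += move
        self.turn = 1 - self.turn

def prior_move(game: Game, deterministic: bool, rng: random.Random) -> int:
    # The deterministic configuration always plays the same move;
    # the non-deterministic configuration samples a move.
    return 2 if deterministic else rng.choice([1, 2, 3])

def generate_game_state(t: int, seed: int = 0):
    """Unroll the game with the prior strategy up to time t and
    return the resulting state plus the favoured player (steps 1-3)."""
    rng = random.Random(seed)
    game = Game()
    for _ in range(t):
        # Non-deterministic prior (player 0) vs. deterministic copy (player 1).
        deterministic = (game.turn == 1)
        game.play(prior_move(game, deterministic, rng))
    # Step 3: in this toy game, the favoured player is whoever leads on score.
    favoured = 0 if game.scores[0] >= game.scores[1] else 1
    return game, favoured
```

In step 4, the exploration strategy would then resume from the returned state playing as `favoured`, with the deterministic prior strategy as its opponent.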
Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1