Interface for exploration policy #10

What would be a good interface for specifying the exploration policy? 

It is implemented differently here and in `DeepQLearning.jl`. 

- What is implemented here: only a limited set of policies is allowed (e.g. `EpsGreedyPolicy`), and the solver uses the internals of that policy to access the Q-values. I think this is pretty bad: `EpsGreedyPolicy` should be agnostic to the type of policy used for the greedy part (right now I think it assumes a tabular policy), and if we improve `EpsGreedyPolicy`, the code here will break.
- In `DeepQLearning.jl`, the user must pass in a function `f`, and `f(policy, env, obs, global_step, rng)` is called to return the action. I took inspiration from MCTS.jl for this. However, it is not very convenient to define a decaying epsilon schedule with this approach.
- A suggestion is to use a function `action(::ExplorationPolicy, current_policy, env, obs, rng)`. Dispatching on the type of `ExplorationPolicy` and having users implement their own type seems more Julian than passing a function. The drawback is that this `action` method is not fully consistent with the rest of the POMDPs.jl interface, since it takes the current policy and the environment as input.
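
To make the third option concrete, here is a minimal sketch of what a user-defined exploration type could look like. Everything other than the `action(::ExplorationPolicy, current_policy, env, obs, rng)` signature is an assumption made up for illustration: `LinearDecayEpsGreedy`, `current_eps`, `ToyEnv`, `ToyGreedyPolicy`, `possible_actions` and `greedy_action` are hypothetical names, not part of POMDPs.jl or `DeepQLearning.jl`. Since the proposed signature has no `global_step` argument, the sketch assumes the exploration policy keeps its own step counter.

```julia
using Random

# Hypothetical abstract type; users subtype it to define their own exploration strategy.
abstract type ExplorationPolicy end

# Hypothetical example: epsilon-greedy with a linearly decaying epsilon.
mutable struct LinearDecayEpsGreedy <: ExplorationPolicy
    eps_start::Float64
    eps_end::Float64
    decay_steps::Int
    step::Int  # number of actions selected so far
end
LinearDecayEpsGreedy(eps_start, eps_end, decay_steps) =
    LinearDecayEpsGreedy(eps_start, eps_end, decay_steps, 0)

# Epsilon is interpolated linearly from eps_start to eps_end, then held constant.
function current_eps(p::LinearDecayEpsGreedy)
    frac = min(p.step / p.decay_steps, 1.0)
    return p.eps_start + frac * (p.eps_end - p.eps_start)
end

# Toy stand-ins for the environment and the current (greedy) policy.
# A real solver would supply its own types here.
struct ToyEnv
    actions::Vector{Symbol}
end
possible_actions(env::ToyEnv) = env.actions

struct ToyGreedyPolicy
    best::Symbol
end
greedy_action(policy::ToyGreedyPolicy, obs) = policy.best

# The solver only calls `action`; dispatch on the first argument selects the strategy.
# The exploration policy never touches the internals of `current_policy`.
function action(p::LinearDecayEpsGreedy, current_policy, env, obs, rng::AbstractRNG)
    eps = current_eps(p)
    p.step += 1
    if rand(rng) < eps
        return rand(rng, possible_actions(env))     # explore
    else
        return greedy_action(current_policy, obs)   # exploit
    end
end
```

With this shape, the solver would call something like `action(LinearDecayEpsGreedy(1.0, 0.01, 10_000), policy, env, obs, rng)` at every step, and the decay schedule lives entirely in the user's type. Whether the step counter should live in the exploration policy or be passed in by the solver is an open design choice.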

Any thoughts?




