
Interface for exploration policy #10

@MaximeBouton


What would be a good interface for specifying the exploration policy?

It is implemented differently here and in DeepQLearning.jl.

  • What is implemented here: only a limited set of policies is allowed (e.g. `EpsGreedyPolicy`), and the code uses the internals of that policy to access the Q-values. I think this is pretty bad. `EpsGreedyPolicy` should be agnostic to the type of policy used for the greedy part (right now I think it assumes a tabular policy), and if we improve `EpsGreedyPolicy`, the code here will break.
  • In DeepQLearning.jl, the user must pass in a function `f`, and `f(policy, env, obs, global_step, rng)` is called to return the action. I took inspiration from MCTS.jl for this. However, it is not very convenient to define a decaying epsilon schedule with this approach.
  • One suggestion is to use a function `action(::ExplorationPolicy, current_policy, env, obs, rng)`. Dispatching on the type of `ExplorationPolicy` and having users implement their own subtypes seems more Julian than passing a function. However, this `action` method is not very consistent with the rest of the POMDPs.jl interface, since it takes the current policy and the environment as input.
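A minimal sketch of the dispatch-based option, in plain Julia with no package dependencies. Every name here is hypothetical and for illustration only: `ExplorationPolicy`, `LinearDecayEpsGreedy`, `ToyEnv`, `TableQPolicy`, `greedy_action`, and `available_actions` are not taken from POMDPs.jl or DeepQLearning.jl, and the local `action` function only mirrors the proposed signature; it does not extend `POMDPs.action`. The sketch shows two things under these assumptions: the decaying epsilon schedule can live inside the exploration type (here it simply counts its own calls), and the greedy branch only calls a generic `greedy_action`, so the exploration code never touches the internals of a particular policy type.

```julia
using Random

# --- Hypothetical stand-ins, for illustration only ---
struct ToyEnv
    actions::Vector{Symbol}
end
available_actions(env::ToyEnv) = env.actions

# A toy tabular policy. Any policy type would work as long as it
# implements greedy_action; the exploration code never reads its fields.
struct TableQPolicy
    q::Dict{Int,Vector{Float64}}   # obs => one Q-value per action
end
greedy_action(p::TableQPolicy, env, obs) = available_actions(env)[argmax(p.q[obs])]

# --- Sketch of the proposed interface ---
abstract type ExplorationPolicy end

# Epsilon-greedy exploration with a linear decay from eps_start to eps_end
# over decay_steps calls. Assumption: the type tracks its own step count
# instead of receiving a global_step argument.
mutable struct LinearDecayEpsGreedy <: ExplorationPolicy
    eps_start::Float64
    eps_end::Float64
    decay_steps::Int
    step::Int   # number of times action has been called
end
LinearDecayEpsGreedy(eps_start, eps_end, decay_steps) =
    LinearDecayEpsGreedy(eps_start, eps_end, decay_steps, 0)

# Current epsilon under the linear schedule (clamped once decay_steps is reached).
function epsilon(p::LinearDecayEpsGreedy)
    frac = min(p.step / p.decay_steps, 1.0)
    return p.eps_start + frac * (p.eps_end - p.eps_start)
end

# Dispatches on the exploration type; current_policy and env are left untyped.
function action(p::LinearDecayEpsGreedy, current_policy, env, obs, rng::AbstractRNG)
    eps_now = epsilon(p)
    p.step += 1
    if rand(rng) < eps_now
        return rand(rng, available_actions(env))      # explore: uniform random action
    else
        return greedy_action(current_policy, env, obs)  # exploit: delegate to the policy
    end
end
```

With this design, a user who wants a different schedule (e.g. exponential decay) would define another subtype of `ExplorationPolicy` with its own `action` method. The trade-off is the mutable step counter, which would need resetting between training runs; passing `global_step` explicitly, as DeepQLearning.jl does, avoids that state.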

Any thoughts?
