Sampling action when using deep policy #101
-
Hi, thank you for this package. I'm really enjoying learning about active inference, and I deeply appreciate the contributors. I have a question that came up while implementing the explore-exploit task from Smith et al., "A step-by-step tutorial on active inference and its application to empirical data" (https://doi.org/10.1016/j.jmp.2021.102632), which is already implemented in MATLAB and in "pymdp". I tried to run a loop of active inference with a deep policy (two time-steps), following the "complete recipe for active inference" in the "pymdp" tutorial notebook. However, I found that the "sample_action" method of the "Agent" class only samples an action from the first timestep of each policy (each policy has shape (2, 2), where the first dimension is the number of timesteps and the second is the number of factors), via the "control.sample_policy" function, as below (lines 674-675 of control.py):
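(The snippet I pasted did not survive here; the lines I mean in "control.sample_policy" are roughly the following, paraphrased from memory rather than copied verbatim, so variable names may differ slightly:)

```python
# paraphrase of pymdp/control.py, sample_policy (around lines 674-675); not verbatim
for factor_i in range(num_factors):
    # note: always indexes timestep 0 of the selected policy
    selected_policy[factor_i] = policies[policy_idx][0, factor_i]
```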
My setup of the "Agent" class was:
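(The exact constructor call from my script did not carry over; it was along these lines, with a two-step policy horizon for the explore-exploit task. The generative-model arrays A, B, C, D are built as in the tutorial and are not shown:)

```python
from pymdp.agent import Agent

# A, B, C, D are the generative-model arrays built as in the Smith et al.
# explore-exploit tutorial (not shown here); policy_len=2 yields two-step policies
my_agent = Agent(A=A, B=B, C=C, D=D, policy_len=2)
```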
In my understanding, to sample the action of the other timestep in each policy, line 675 would be better changed like this:
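(Reconstructing my proposal, since the snippet was also lost here: "t" below is a hypothetical index of the current timestep within the policy, not an existing variable in control.py:)

```python
# proposed sketch: index the current within-policy timestep "t"
# rather than always taking timestep 0
selected_policy[factor_i] = policies[policy_idx][t, factor_i]
```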
If I have not understood this package well, then please let me know how to correct it. Thank you!
-
Hi @sunghwan87,

Thanks for your comment / question. This is a good and important point to make; I hope my explanation below is clarifying. Let me know if not.

The reason the agent only samples its action from the first timestep of the currently evaluated policy posterior (which, as you note, may extend multiple timesteps into the future) is that the agent can only take one action per timestep. It cannot "pre-determine" its future actions without first gathering observations.

The loop is action --> observation --> inference (of both state and policy) --> take another action. So we must first sample another observation, and thus re-estimate our policy posterior, before we can take another action. Your "forward horizon" of policy estimation therefore constantly re-relativizes itself to the current timestep, in an online fashion, as you gather new observations one timestep at a time. Does that make sense?

In other words, your idea to "sample the action of the other timestep in each policy" doesn't really make sense, because we can't sample an action in the future until we've reached the future (and the future is now the present).
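To make that loop concrete, here is a minimal sketch of the cycle using the "Agent" API. The environment object "env", its "reset"/"step" methods, and the horizon "T" are placeholders for whatever task wrapper you are using, and A, B, C, D are your generative-model arrays; the point is just that only one action is sampled per pass, and the policy posterior is re-inferred after every new observation:

```python
from pymdp.agent import Agent

# A, B, C, D: generative-model arrays; policy_len=2 means each policy spans
# two future timesteps, but only one action is emitted per timestep
agent = Agent(A=A, B=B, C=C, D=D, policy_len=2)

obs = env.reset()  # "env" is a placeholder for your task wrapper

for t in range(T):
    # 1. infer hidden states from the latest observation
    qs = agent.infer_states(obs)

    # 2. re-evaluate the posterior over (two-step) policies, relative to "now"
    q_pi, neg_efe = agent.infer_policies()

    # 3. sample a single action, i.e. the first step of the selected policy;
    #    the remaining step is re-planned after the next observation
    action = agent.sample_action()

    # 4. act, observe, and go around the loop again
    obs = env.step(action)
```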
-
Hi @sunghwan87, going to move this into a discussion and close.