POMDP Tutorial/Example? #388

apashea · 2024-12-13T16:41:01Z

apashea
Dec 13, 2024

My following email extracts are from recent correspondence with Dmitry Bagaev, who advised I post here:

"RxInfer.jl does not cease to amaze me with its flexibility and clarity in the graphs. I'm still learning the essentials (@Constraints, @initialization, @model and nesting models, etc.). An exemplar model in the discrete state space in Active Inference is the POMDP, such as the T-maze problem. So far, I have been trying to follow the two-part LAIF papers scripts to understand how to build a POMDP, but there is a lot of code/logic... This is fair, as there are various scripts for different kinds of free energy computation, etc., for research papers rather than a tutorial thus necessitating a mini library so to speak. However, it would be great if you all could make a step-by-step tutorial for this problem which is as simple as possible, such as a POMDP which simply takes one observation (or multiple) at each timestep and infers states and does policy inference to commit an action which then impacts the environment, maybe building up to implementing learning the likelihood and/or transition model. In the LAIF script, everything is packed in .jl files elsewhere; the goal observation logic seems separate and a bit confusing (I suppose this is what allows generalized FE computation); inference(), execute(), observe(), are confusing as they are defined earlier inside of various functions; the A/B/C/D construction is a bit confusing as it seems there is one big multivariate matrix as opposed to submatrices as is found in MATLAB or pymdp (i.e. A[1], A[2], etc., as opposed to the large all-encompassing A in the LAIF script), ... So more clarity on this in a step-by-step from scratch way would be great.

I would especially like to see this incorporated with the 'Streamline Inference' RxInfer user guide documentation logic where inference is done as the data comes in instead of being run over static datasets. More specifically, I personally would like to be able to have the model/agent run state and policy inference in a 'per timestep' loop, where an initial observation for the agent is generated by the environment and the subsequent observations are generated based on the agent's action. On the user side, I think this would be useful both for model testing (more transparent environment logic, we often get overwhelmed when we are defining environments / loops separately and lodging them in functions elsewhere) and for applications where data is not necessarily available yet (which is often the case when the environment's generated observations are dependent on the agent's actions, and in production settings). I understand the utility of doing inference on static datasets, especially when developing models for applications using real data for fitting etc., but it seems constricting for us in this case. We like being able to modify the action-perception loop as we go, and at least for relatively small discrete action/control spaces, it helps with code transparency to just write out the if-else logic to determine the environment. And then if we have built our model and things are going well, if we intend to move it to production, we can just copy-paste as needed and have it ready for streamline inference as real data comes in instead of having to unwrap/re-define things again.

Pseudo-code for jupyter notebook cells (sorry, it's a little multilingual / Pythonic)

{cell 1}
# define agent

{cell 2}
# initial step
obs = [initial observation(s) for agent]
# agent receives observation, does state inference
# agent infers policies, commits initial discrete action(s) -> agent_action

# begin action-perception loop 
# (when put into production, it would be simplified by just saying `obs = [next real data observation` and then agent infers states/policies and executes actions etc)
T = 100
for t in 1:T
         if agent_action = 0
                 obs = ...
         elif agent_action = 1
                 obs = ...
# agent receives observation, does state inference
# agent infers policies, commits initial discrete action(s) -> agent_action
# agent learns A (likelihood) / B (transitions conditioned on actions/policies) / D (priors over initial states) / E (priors over actions/policies); whereas C is related to goal prior, following fixed/homeostasis logic

{cell 3}
# visualizations of results, etc.

....

Like, I get the idea of trying to have a nice concise cell at the end where the inference is actually done, as I see in many RxInfer examples, but it's just sooo much to scroll through when we actually want to experiment with it beyond simply running the example as-is. Does this make sense? Again, don't want to overload you and if your PhD student is already deep into it then I don't want to distract from what you have. This is all kind of the ideal, though. The big thing for us at the Institute has been that, if we cannot step-by-step visibly see code we might want to modify to adapt it for our use-case or for experimentation, then suddenly we have to unwrap everything, go back into all the include(file.jl) files, and figure out what we are able to change without breaking everything.... I have attached a notebook, a somewhat messy "flattened" version (where all necessary code is included, avoiding all 'include(file.jl)` calls, aside from importing established packages like RxInfer, Distributions, etc.) of the script for the LAIF T-maze (the generalized free energy one) to try to illustrate what I mean... we're trying to parse this out but it's very overwhelming... I've had some contact with Thijs van de Laar about it who was very helpful by updating the scripts to be compatible with more recent RxInfer updates, and a couple Institute members have reached out as well.. "

We've also looked many times at the "How to train your hidden Markov model" example, which seems like a nice, clear tutorial for building HMMs... so another thought is to extend this tutorial or make a similar one which simply adds preferences and policies to make it a POMDP...

Link to my "flattened" version of the LAIF Part 2 Generalized Free Energy script: https://github.com/apashea/julia_actinf_pomdp/blob/main/scripts/T-maze_Generalized_flattened_llm_markdown.ipynb

I can further clarify this as needed. Dmitry has noted a RxInfer POMDP tutorial is in progress now, so perhaps waiting for that is the right move!

wouterwln · 2024-12-13T18:03:02Z

wouterwln
Dec 13, 2024
Maintainer

Hey @apashea , thanks for the feedback! You are right, we are currently working on generalizing inference for Categorical transitions/Contingency tables, which is why this isn't available in base RxInfer yet. I implemented the TransitionMixture node required for POMDPs in this PR if you are really really eager to get started, but me and @Raphael-Tresor are working on a node which would model $p(y | \mathbf{x})$ when $x$ is a collection of Categorical random variables (which for POMDPs would mean we could model $p(x_t | x_{t-1}, u_t)$, but also $p(x_t | x_{t-1}, z_t, u_t)$ or transitions with more Categorical interfaces). This means that we don't really want to push TransitionMixture or MatrixDirichletMixture out there yet, because it will probably get outphased pretty soon by a general Transition interface.

As for the tutorial, we will start working on a tutorial to do AIF in RxInfer when the general Transition node is out, and we're likely to make this a POMDP tutorial indeed. Thank you for the suggestions, and we hope to be able to get back to you soon!

1 reply

apashea Dec 13, 2024
Author

Hello Wouter, amazing -- thank you so much for your timely update!! I will (eagerly!) await the Transition interface, to avoid writing obsolete code. Cheers!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReactiveBayes

POMDP Tutorial/Example? #388

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

ReactiveBayes

POMDP Tutorial/Example? #388

apashea Dec 13, 2024

Replies: 1 comment · 1 reply

wouterwln Dec 13, 2024 Maintainer

apashea Dec 13, 2024 Author

apashea
Dec 13, 2024

Replies: 1 comment 1 reply

wouterwln
Dec 13, 2024
Maintainer

apashea Dec 13, 2024
Author