Here is my implementation of the model described in the paper *Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems*.
The algorithm makes the Learner achieve the same control matrix as the Expert, while the Learner's state-reward weight converges to a value different from the Expert's.
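This outcome (identical control matrix, different state-reward weight) reflects the well-known non-uniqueness of the reward in inverse reinforcement learning. The sketch below is *not* the paper's algorithm; it only illustrates this property for a discrete-time LQR setting, using a standard Riccati value iteration. The dynamics `A`, `B` and the weights `Qe`, `Re` are hypothetical placeholders, not values from the paper.

```julia
using LinearAlgebra

# Fixed-point iteration on the discrete-time algebraic Riccati equation;
# returns the optimal state-feedback gain K for the control law u_k = -K x_k.
function dlqr_gain(A, B, Q, R; iters = 10_000, tol = 1e-12)
    P = Matrix(Q)
    K = (R + B' * P * B) \ (B' * P * A)
    for _ in 1:iters
        Pn = A' * P * A - A' * P * B * ((R + B' * P * B) \ (B' * P * A)) + Q
        K = (R + B' * Pn * B) \ (B' * Pn * A)
        norm(Pn - P) < tol && break
        P = Pn
    end
    return K
end

A  = [1.0 0.1; 0.0 1.0]            # hypothetical double-integrator dynamics
B  = reshape([0.0, 0.1], 2, 1)
Qe = Matrix(1.0I, 2, 2)            # hypothetical "Expert" state-reward weight
Re = Matrix(1.0I, 1, 1)

K_expert  = dlqr_gain(A, B, Qe, Re)
K_learner = dlqr_gain(A, B, 2Qe, 2Re)   # scaled weights yield the same gain

@assert norm(K_expert - K_learner) < 1e-8   # same control matrix, different Q
```

Scaling `(Q, R)` by any positive constant scales the Riccati solution `P` by the same constant but leaves the gain `K` unchanged, so a Learner can recover the Expert's control matrix exactly while its reward weights differ from the Expert's.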
The Expert's control matrix is as follows:

The results obtained from my experiments are shown below.
| Convergence of the proposed algorithm |
|---|
| ![]() |
| Output result |
|---|
| ![]() |
I will provide a Dockerfile soon. Dependencies:
- Julia v1.10.3
- LinearAlgebra
- Plots
- Kronecker

