Here is my implementation of the model described in the paper "Optimal tracking control for non-zero-sum games of linear discrete-time systems via off-policy reinforcement learning".
The corresponding feedback Nash equilibrium
Using the Off-Policy algorithm, I found the following control matrices
The probing noise does not bias the result: the off-policy algorithm learns the Nash equilibrium solution without deviation.
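As a rough illustration of why probing noise need not bias an off-policy estimate, here is a minimal sketch on a much simpler problem than the one in the paper: off-policy evaluation of the Q-function of a fixed policy for a single-player, deterministic linear-quadratic regulator. The matrices `A`, `B`, `Qc`, `R`, `K` and all function names are hypothetical and are not taken from the paper or from `SolutionOffpolicyTracking.py`. The data are generated with a noisy behavior action, while the Bellman equation is written for the noise-free target policy, so the least-squares estimate should match the model-based Q-matrix for any noise level that excites the system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical system and weights (assumptions, not values from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc = np.eye(2)                # state weight
R = np.array([[1.0]])         # input weight
K = np.array([[0.1, 0.2]])    # fixed, stabilizing target policy u = -K x


def quad_features(z):
    """Monomials z_i * z_j for i <= j, with off-diagonal terms doubled,
    so that quad_features(z) @ h equals z^T H z for symmetric H."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)
    return scale * np.outer(z, z)[i, j]


def h_from_vector(h, dim):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((dim, dim))
    i, j = np.triu_indices(dim)
    H[i, j] = h
    H[j, i] = h
    return H


def model_based_H(n_iter=2000):
    """Q-matrix of the policy u = -K x, computed from the known model."""
    Ac = A - B @ K
    P = np.zeros((2, 2))
    for _ in range(n_iter):  # fixed-point iteration of the Lyapunov equation
        P = Qc + K.T @ R @ K + Ac.T @ P @ Ac
    top = np.hstack([Qc + A.T @ P @ A, A.T @ P @ B])
    bottom = np.hstack([B.T @ P @ A, R + B.T @ P @ B])
    return np.vstack([top, bottom])


def off_policy_H(noise_std, n_samples=200):
    """Least-squares estimate of H from data generated with probing noise."""
    rows, targets = [], []
    for _ in range(n_samples):
        x = rng.normal(size=2)
        u = -K @ x + noise_std * rng.normal(size=1)   # behavior action (noisy)
        x_next = A @ x + B @ u
        u_next = -K @ x_next                          # target action (no noise)
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, u_next])
        # Bellman equation: z^T H z - z_next^T H z_next = stage cost
        rows.append(quad_features(z) - quad_features(z_next))
        targets.append(x @ Qc @ x + u @ R @ u)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return h_from_vector(h, 3)


H_true = model_based_H()
for std in (0.5, 5.0):
    err = np.max(np.abs(off_policy_H(std) - H_true))
    print(f"noise std {std}: max |H_offpolicy - H_model| = {err:.2e}")
```

In this sketch the noise is still needed for excitation: with `noise_std = 0` every sample satisfies `u = -K x`, and the regression can no longer identify all entries of `H`.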
Figure: Convergence of the optimal control matrix (Off-Policy)
With my code, you can:
- Run the model-based solution with `ModelBased.m`
- Run the off-policy algorithm with `SolutionOffpolicyTracking.py`
I will provide a Dockerfile soon.
- MATLAB
- Python 3.11
- numpy
- matplotlib