Dynamic Factor Models (DFM) Implementation (GSoC 2025) #446
Conversation
Looks interesting! Just say when you think it's ready for review.
Thanks for the feedback! I'm still exploring the best approach for implementing Dynamic Factor Models.
verbose: bool, default True
    If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.

Notes
We're going to have to add all the math equations and whatnot here eventually. No rush, but I want to make sure it's on your TODO list. Check the VARMAX docstring for what I have in mind
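As a rough illustration of the kind of Notes section that comment is asking for (a sketch only; the VARMAX docstring is the actual reference, and the equations below are abbreviated):

```rst
Notes
-----
The dynamic factor model relates each observed series to a small set of
latent factors, for example:

.. math::
    y_t &= \Lambda f_t + u_t \\
    f_t &= A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t
```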
pymc_extras/statespace/models/DFM.py
Outdated
# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")
nit: I've been using Stata notation for lagged states, e.g. L{lag}.factor_{i+1}. Not married to it, but consider it for consistency's sake.
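As a sketch of what that naming convention could look like in the loop above (the k_factors and factor_order values here are illustrative, not the PR's code):

```python
# Illustrative sketch of Stata-style lag names, L{lag}.{state_name}
k_factors, factor_order = 2, 3

names = [
    f"L{lag}.factor_{i+1}"
    for i in range(k_factors)
    for lag in range(factor_order)
]
print(names)
# ['L0.factor_1', 'L1.factor_1', 'L2.factor_1',
#  'L0.factor_2', 'L1.factor_2', 'L2.factor_2']
```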
pymc_extras/statespace/models/DFM.py
Outdated
if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")
as above
pymc_extras/statespace/models/DFM.py
Outdated
# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))
coords["error_ar_param"] = list(range(1, self.error_order + 1)) | |
coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1)) |
It's weird to have a global everywhere except here
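For illustration, the pattern being suggested is a single module-level constant reused everywhere that dimension name appears (a minimal sketch; the constant's value and the error_order value are assumptions):

```python
# Module-level dimension-name constant (sketch; value assumed)
ERROR_AR_PARAM_DIM = "error_ar_param"

error_order = 2  # illustrative
coords = {}
if error_order > 0:
    coords[ERROR_AR_PARAM_DIM] = list(range(1, error_order + 1))
print(coords)  # {'error_ar_param': [1, 2]}
```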
pymc_extras/statespace/models/DFM.py
Outdated
self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix
You're going to have to double-check all of these matrix constructions if you re-ordered the states.
Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process.
In the notebook there is a comparison between the custom DFM and the implemented DFM (which has a hardcoded version of make_symbolic_graph that works just in this case).
Still to do: 1) vectorization/block matrices, 2) measurement errors.
Left some comments. I didn't look over the tests because they still seem like WIP, but they seem to be on the right track!
:math:`\{y_t\}_{t=0}^T`, with :math:`y_t = \begin{bmatrix} y_{1,t} & y_{2,t} & \cdots & y_{k_endog,t} \end{bmatrix}^T`,
the DFM assumes that each series is a linear combination of a few latent factors and optional autoregressive errors.

Specifically, denoting the number of dynamic factors as :math:`k_factors`, the order of the latent factor
Use \text and escape the underscore for k_factors. (I guess you wanted k = k_factors?)
y_t & = \Lambda f_t + B x_t + u_t \\
f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t \\
u_t & = C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t
u_t should be measurement error, and the last equation should have eta_t on the LHS.
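For reference, one conventional statement of this system (in the style of statsmodels' DynamicFactor documentation) spells out the error roles explicitly; the distributional assumptions below are illustrative, not taken from the PR:

```latex
\begin{align}
y_t &= \Lambda f_t + B x_t + u_t
    && \text{(observation, with idiosyncratic error } u_t\text{)} \\
f_t &= A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t,
    && \eta_t \sim N(0, I) \\
u_t &= C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t,
    && \varepsilon_t \sim N(0, \Sigma)
\end{align}
```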
Internally, this model is represented in state-space form by stacking all current and lagged latent factors and,
if present, autoregressive observation errors into a single state vector. The full state vector has dimension
:math:`k_factors \cdot factor_order + k_endog \cdot error_order`, where :math:`k_endog` is the number of observed time series.
Show the actual transition equation that is used in block form, using the vectors/matrices that you defined above.
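Something along these lines, presumably (a sketch of the companion/block form with k = k_factors, n = k_endog, p = factor_order, q = error_order; the state ordering here is an assumption and has to match what the code actually uses):

```latex
\alpha_t =
  \begin{bmatrix} f_t \\ \vdots \\ f_{t-p+1} \\ u_t \\ \vdots \\ u_{t-q+1} \end{bmatrix},
\qquad
\alpha_t = T \alpha_{t-1} + R \varepsilon_t,
\qquad
T = \begin{bmatrix} T_f & 0 \\ 0 & T_u \end{bmatrix},

% with companion blocks for the factor VAR and the error VAR:
T_f =
  \begin{bmatrix}
    A_1 & \cdots & A_{p-1} & A_p \\
    I_k &        &         & 0 \\
        & \ddots &         & \vdots \\
    0   &        & I_k     & 0
  \end{bmatrix},
\qquad
T_u =
  \begin{bmatrix}
    C_1 & \cdots & C_{q-1} & C_q \\
    I_n &        &         & 0 \\
        & \ddots &         & \vdots \\
    0   &        & I_n     & 0
  \end{bmatrix}
```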
covariance matrix) is equal to the number of latent factors plus the number of observed series if AR errors are present.

As in other high-dimensional models, identification can be an issue, especially when many observed series load on few
factors. Careful prior specification is typically required for good estimation.
I'd put more emphasis on this note about priors. Maybe use a .. warning:: directive? See here. Also expand the comment, and talk about how the models are only identified up to a sign flip, etc.
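For instance, the docstring could carry something like the following (a sketch; the wording is illustrative):

```rst
.. warning::
    Dynamic factor models are only identified up to a sign flip (and, more
    generally, a rotation) of the factors: negating a factor and its
    loadings leaves the likelihood unchanged. Careful prior specification,
    e.g. constraining the sign or scale of some loadings, is typically
    required for good estimation.
```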
error_order=1,
error_var=False,
error_cov_type="diagonal",
filter_type="standard",
Suggested change:
-   filter_type="standard",

Elide the filter_type thing, it's not well supported yet.
if k_endog is None:
    k_endog = len(endog_names)
if endog_names is None:
    endog_names = [f"endog_{i+1}" for i in range(k_endog)]
endog_names = [f"endog_{i+1}" for i in range(k_endog)] | |
endog_names = [f"endog_{i}" for i in range(k_endog)] |
Tiny nitpick: I prefer to lean into the python zero indexing, please reject if you disagree.
# If factor_order is 0, we treat the factor as static (no dynamics),
# but it is still included in the state vector with one state per factor.
# The factor_ar parameter will not exist in this case.
k_factor_states = sum(max(order, 1) for order in self.factor_order)
If factor_order = 1, is there one state or two states contributed?
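To make the question concrete, here is what the line above computes for a few illustrative per-factor orders (values assumed for the example):

```python
# max(order, 1) means orders 0 and 1 both contribute exactly one state;
# only order >= 2 adds lagged states.
factor_order = [0, 1, 2]  # illustrative per-factor AR orders
k_factor_states = sum(max(order, 1) for order in factor_order)
print(k_factor_states)  # 1 + 1 + 2 = 4
```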
coords[FACTOR_DIM] = [f"factor_{i+1}" for i in range(self.k_factors)]

# AR parameter dimensions - add if needed
if self._max_order > 0:
Since we're allowing different AR order per factor, we might have to have one coordinate per factor. Otherwise, every factor AR parameter will have the maximum number of labels, which won't be correct.
That feature might be more trouble than it's worth, thinking about this. Maybe I'm missing something?
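Concretely, the per-factor coordinates described in the first comment might look like this (a sketch; the dimension names are hypothetical):

```python
# One AR-parameter coordinate per factor, sized by that factor's own order
# (hypothetical naming; illustrative orders)
factor_order = [2, 1, 3]
coords = {}
for i, order in enumerate(factor_order):
    if order > 0:
        coords[f"factor_{i+1}_ar_param"] = list(range(1, order + 1))
# {'factor_1_ar_param': [1, 2], 'factor_2_ar_param': [1],
#  'factor_3_ar_param': [1, 2, 3]}
```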
"factor_loadings", shape=(self.k_endog, self.k_factors), dtype=floatX | ||
) | ||
|
||
self.ssm["design", :, :] = 0.0 |
All statespace matrices are initialized to zero matrices when you call the super constructor in the __init__ method. No need for this.
p = ar_coeffs.shape[0]
top_row = pt.reshape(ar_coeffs, (1, p))
below = pt.eye(p - 1, p, k=0)
return pt.concatenate([top_row, below], axis=0)
You can get a matrix like this with a one-liner using .set:

    p = ar_coeffs.shape[0]
    return pt.eye(p, k=-1)[0].set(ar_coeffs)
The syntax is a bit weird, but .set returns the original tensor (the eye in this case) wrapped in a SetSubtensor operator, putting the provided values in the requested places. So unlike in numpy, you don't get back just the rows you sliced into -- you get back the whole thing, with the assignment.
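A quick standalone check of that behaviour (the AR coefficients here are illustrative):

```python
import numpy as np
import pytensor.tensor as pt

# Illustrative AR coefficients; in the model these are symbolic parameters.
ar_coeffs = pt.as_tensor_variable(np.array([0.5, 0.3, 0.1]))
p = ar_coeffs.shape[0]

# pt.eye(p, k=-1) has ones on the subdiagonal; [0].set(...) returns the
# whole matrix with its first row replaced by the AR coefficients,
# i.e. the companion matrix.
companion = pt.eye(p, k=-1)[0].set(ar_coeffs)
print(companion.eval())
# [[0.5 0.3 0.1]
#  [1.  0.  0. ]
#  [0.  1.  0. ]]
```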
Dynamic Factor Models (DFM) Implementation
This PR provides a first draft implementation of Dynamic Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview
- Adds DFM.py with initial functionality

Current Status
This implementation is a work in progress and I welcome any feedback.

Next Steps