
Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446

Draft · wants to merge 8 commits into main from DFM_draft_implementation
Conversation

@andreacate (Contributor) commented Mar 31, 2025

Dynamical Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

  • Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

  • Vectorize the construction of the transition and selection matrices (possibly by reordering state variables).
  • Add support for measurement error.

@zaxtax (Contributor) commented Apr 1, 2025

Looks interesting! Just say when you think it's ready for review

@fonnesbeck (Member) commented:

cc @jessegrabowski


@andreacate (Contributor, Author) commented:

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

verbose: bool, default True
If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.

Notes
Review comment (Member):

We're going to have to add all the math equations and whatnot here eventually. No rush, but I want to make sure it's on your TODO list. Check the VARMAX docstring for what I have in mind

# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")

nit: I've been using stata notation for lagged states, e.g. L{lag}.factor_{i+1}

Not married to it, but consider it for consistency's sake.
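To make the two naming schemes concrete, here is a small hypothetical example (illustrative values only, not code from the PR) contrasting the current style with the Stata-style lag notation suggested above:

```python
# Hypothetical example: the current naming scheme vs. Stata-style lag
# notation (L{lag}.factor_{i+1}) for 2 factors with 2 lags each.
k_factors, factor_order = 2, 2

current = [
    f"factor_{i+1}_lag{lag}" for i in range(k_factors) for lag in range(factor_order)
]
stata = [
    f"L{lag}.factor_{i+1}" for i in range(k_factors) for lag in range(factor_order)
]

print(current)  # ['factor_1_lag0', 'factor_1_lag1', 'factor_2_lag0', 'factor_2_lag1']
print(stata)    # ['L0.factor_1', 'L1.factor_1', 'L0.factor_2', 'L1.factor_2']
```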

if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")

as above


# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))

Suggested change:
-    coords["error_ar_param"] = list(range(1, self.error_order + 1))
+    coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))

It's weird to have a global everywhere except here


self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix

You're going to have to double-check all of these matrix constructions if you re-ordered the states.

@andreacate force-pushed the DFM_draft_implementation branch 2 times, most recently from 21560db to a459a1a, on July 25, 2025 at 10:44
@jessegrabowski (Member) commented:

Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process

@andreacate force-pushed the DFM_draft_implementation branch from 1c04f65 to bc3fcf2 on July 25, 2025 at 13:51
@jessegrabowski (Member) left a comment:

Left some comments. I didn't look over the tests because they still seem like WIP, but seem to be on the right track!

:math:`\{y_t\}_{t=0}^T`, with :math:`y_t = \begin{bmatrix} y_{1,t} & y_{2,t} & \cdots & y_{k_endog,t} \end{bmatrix}^T`,
the DFM assumes that each series is a linear combination of a few latent factors and optional autoregressive errors.

Specifically, denoting the number of dynamic factors as :math:`k_factors`, the order of the latent factor

Use \text and escape the underscore for k_factors (I guess you wanted k = k_factors?)
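For illustration, the escaped form the reviewer suggests might look like this in the docstring (a sketch; the exact wording is up to the author):

```latex
Specifically, denoting the number of dynamic factors as :math:`k = \text{k\_factors}`,
```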

Comment on lines +82 to +91
y_t & = \Lambda f_t + B x_t + u_t \\
f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t \\
u_t & = C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t

u_t should be measurement error, and the last equation should have eta_t on the LHS

Comment on lines +95 to +104
Internally, this model is represented in state-space form by stacking all current and lagged latent factors and,
if present, autoregressive observation errors into a single state vector. The full state vector has dimension
:math:`k_factors \cdot factor_order + k_endog \cdot error_order`, where :math:`k_endog` is the number of observed time series.

Show the actual transition equation that is used in block form, using the vectors/matrices that you defined above.
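As a sketch of the block form being requested (shown here for a single factor with :math:`p` lags and no AR errors; symbols follow the equations quoted earlier in the docstring):

```latex
\begin{bmatrix} f_t \\ f_{t-1} \\ \vdots \\ f_{t-p+1} \end{bmatrix}
=
\begin{bmatrix}
A_1 & A_2 & \cdots & A_p \\
I   & 0   & \cdots & 0   \\
\vdots &  & \ddots & \vdots \\
0   & \cdots & I   & 0
\end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p} \end{bmatrix}
+
\begin{bmatrix} \eta_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
```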

covariance matrix) is equal to the number of latent factors plus the number of observed series if AR errors are present.

As in other high-dimensional models, identification can be an issue, especially when many observed series load on few
factors. Careful prior specification is typically required for good estimation.

I'd put more emphasis on this note about priors. Maybe use a `.. warning::` directive? See here. Also expand the comment, and talk about how the models are only identified up to a sign flip, etc.

error_order=1,
error_var=False,
error_cov_type="diagonal",
filter_type="standard",

Suggested change:
-    filter_type="standard",

Elide the filter_type thing, it's not well supported yet

if k_endog is None:
    k_endog = len(endog_names)
if endog_names is None:
    endog_names = [f"endog_{i+1}" for i in range(k_endog)]

Suggested change:
-    endog_names = [f"endog_{i+1}" for i in range(k_endog)]
+    endog_names = [f"endog_{i}" for i in range(k_endog)]

Tiny nitpick: I prefer to lean into the python zero indexing, please reject if you disagree.

# If factor_order is 0, we treat the factor as static (no dynamics),
# but it is still included in the state vector with one state per factor.
# The factor_ar parameter will not exist in this case.
k_factor_states = sum(max(order, 1) for order in self.factor_order)

If factor_order = 1, is there 1 state or two states contributed?
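To make the question concrete, here is the quoted state-count rule evaluated on some hypothetical per-factor orders; note that under this rule, order 0 and order 1 both contribute exactly one state per factor:

```python
# Hypothetical per-factor AR orders. max(order, 1) means a static factor
# (order 0) still occupies one state slot, and order 1 also gives one slot.
factor_order = [0, 1, 3]
k_factor_states = sum(max(order, 1) for order in factor_order)
print(k_factor_states)  # 1 + 1 + 3 = 5
```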

coords[FACTOR_DIM] = [f"factor_{i+1}" for i in range(self.k_factors)]

# AR parameter dimensions - add if needed
if self._max_order > 0:

Since we're allowing different AR order per factor, we might have to have one coordinate per factor. Otherwise, every factor AR parameter will have the maximum number of labels, which won't be correct.

Follow-up comment (Member):

That feature might be more trouble than it's worth, thinking about this. Maybe I'm missing something?

"factor_loadings", shape=(self.k_endog, self.k_factors), dtype=floatX
)

self.ssm["design", :, :] = 0.0

All statespace matrices are initialized to zero matrices when you call the super constructor in the __init__ method. No need for this

Comment on lines +436 to +439
p = ar_coeffs.shape[0]
top_row = pt.reshape(ar_coeffs, (1, p))
below = pt.eye(p - 1, p, k=0)
return pt.concatenate([top_row, below], axis=0)

You can get a matrix like this with a one-liner using .set:

p = ar_coeffs.shape[0]
return pt.eye(p, k=-1)[0].set(ar_coeffs)

Follow-up comment (Member):

The syntax is a bit weird, but .set returns the original tensor (the eye in this case) wrapped in a SetSubtensor operator, putting the provided values in the requested places. So unlike in numpy, you don't get back just the rows you sliced into -- you get back the whole thing, with the assignment.
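For anyone unfamiliar with the pattern, here is a plain-Python sketch of the companion matrix the one-liner constructs (not the PR's actual code; in pytensor, `.set` returns a new tensor with the assignment applied rather than mutating in place):

```python
# Build a companion matrix: ones on the first subdiagonal, then the AR
# coefficients written into the top row.
ar_coeffs = [0.5, 0.3, 0.1]  # hypothetical AR parameters
p = len(ar_coeffs)

# The analogue of pt.eye(p, k=-1): ones one step below the main diagonal.
companion = [[1.0 if j == i - 1 else 0.0 for j in range(p)] for i in range(p)]
# The analogue of [0].set(ar_coeffs): overwrite the top row.
companion[0] = list(ar_coeffs)

for row in companion:
    print(row)
# [0.5, 0.3, 0.1]
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
```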
