
Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446

Draft · wants to merge 8 commits into main from DFM_draft_implementation
Conversation

@andreacate (Contributor) commented Mar 31, 2025

Dynamical Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

  • Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

  • Vectorize the construction of the transition and selection matrices (possibly by reordering state variables).
  • Add support for measurement error.

@zaxtax (Contributor) commented Apr 1, 2025

Looks interesting! Just say when you think it's ready for review

@fonnesbeck (Member) commented:

cc @jessegrabowski


@andreacate (Contributor, Author) commented:

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

verbose: bool, default True
If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.

Notes
Review comment (Member):

We're going to have to add all the math equations and whatnot here eventually. No rush, but I want to make sure it's on your TODO list. Check the VARMAX docstring for what I have in mind

# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")

nit: I've been using stata notation for lagged states, e.g. L{lag}.factor_{i+1}

Not married to it, but consider it for consistency's sake.
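To make the two naming schemes concrete, here is a small hypothetical example (illustrative values only, not code from the PR) contrasting the current style with the Stata-style lag notation suggested above:

```python
# Hypothetical example: the current naming scheme vs. Stata-style lag
# notation (L{lag}.factor_{i+1}) for 2 factors with 2 lags each.
k_factors, factor_order = 2, 2

current = [
    f"factor_{i+1}_lag{lag}" for i in range(k_factors) for lag in range(factor_order)
]
stata = [
    f"L{lag}.factor_{i+1}" for i in range(k_factors) for lag in range(factor_order)
]

print(current)  # ['factor_1_lag0', 'factor_1_lag1', 'factor_2_lag0', 'factor_2_lag1']
print(stata)    # ['L0.factor_1', 'L1.factor_1', 'L0.factor_2', 'L1.factor_2']
```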

if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")

as above


# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))

Suggested change:
-    coords["error_ar_param"] = list(range(1, self.error_order + 1))
+    coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))

It's weird to have a global everywhere except here


self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix

You're going to have to double-check all of these matrix constructions if you re-ordered the states.

@andreacate force-pushed the DFM_draft_implementation branch 2 times, most recently from 21560db to a459a1a, on July 25, 2025 at 10:44
@jessegrabowski (Member) commented:

Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process

@andreacate force-pushed the DFM_draft_implementation branch from 1c04f65 to bc3fcf2 on July 25, 2025 at 13:51
@jessegrabowski (Member) left a comment:

Left some comments. I didn't look over the tests because they still seem like WIP, but seem to be on the right track!

:math:`\{y_t\}_{t=0}^T`, with :math:`y_t = \begin{bmatrix} y_{1,t} & y_{2,t} & \cdots & y_{k_endog,t} \end{bmatrix}^T`,
the DFM assumes that each series is a linear combination of a few latent factors and optional autoregressive errors.

Specifically, denoting the number of dynamic factors as :math:`k_factors`, the order of the latent factor

Use \text and escape the underscore for k_factors (I guess you wanted k = k_factors?)
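For illustration, the escaped form the reviewer suggests might look like this in the docstring (a sketch; the exact wording is up to the author):

```latex
Specifically, denoting the number of dynamic factors as :math:`k = \text{k\_factors}`,
```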

Comment on lines +82 to +91
y_t & = \Lambda f_t + B x_t + u_t \\
f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t \\
u_t & = C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t

u_t should be measurement error, and the last equation should have eta_t on the LHS

Comment on lines +95 to +104
Internally, this model is represented in state-space form by stacking all current and lagged latent factors and,
if present, autoregressive observation errors into a single state vector. The full state vector has dimension
:math:`k_factors \cdot factor_order + k_endog \cdot error_order`, where :math:`k_endog` is the number of observed time series.

Show the actual transition equation that is used in block form, using the vectors/matrices that you defined above.
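As a sketch of the block form being requested (shown here for a single factor with :math:`p` lags and no AR errors; symbols follow the equations quoted earlier in the docstring):

```latex
\begin{bmatrix} f_t \\ f_{t-1} \\ \vdots \\ f_{t-p+1} \end{bmatrix}
=
\begin{bmatrix}
A_1 & A_2 & \cdots & A_p \\
I   & 0   & \cdots & 0   \\
\vdots &  & \ddots & \vdots \\
0   & \cdots & I   & 0
\end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p} \end{bmatrix}
+
\begin{bmatrix} \eta_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
```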

covariance matrix) is equal to the number of latent factors plus the number of observed series if AR errors are present.

As in other high-dimensional models, identification can be an issue, especially when many observed series load on few
factors. Careful prior specification is typically required for good estimation.

I'd put more emphasis on this note about priors. Maybe use a `.. warning::` directive? See here. Also expand the comment, and talk about how the models are only identified up to a sign flip, etc.

error_order=1,
error_var=False,
error_cov_type="diagonal",
filter_type="standard",

Suggested change:
-    filter_type="standard",

Elide the filter_type thing, it's not well supported yet

if k_endog is None:
    k_endog = len(endog_names)
if endog_names is None:
    endog_names = [f"endog_{i+1}" for i in range(k_endog)]

Suggested change:
-    endog_names = [f"endog_{i+1}" for i in range(k_endog)]
+    endog_names = [f"endog_{i}" for i in range(k_endog)]

Tiny nitpick: I prefer to lean into the python zero indexing, please reject if you disagree.

# If factor_order is 0, we treat the factor as static (no dynamics),
# but it is still included in the state vector with one state per factor.
# The factor_ar parameter will not exist in this case.
k_factor_states = sum(max(order, 1) for order in self.factor_order)

If factor_order = 1, is there 1 state or two states contributed?
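To make the question concrete, here is the quoted state-count rule evaluated on some hypothetical per-factor orders; note that under this rule, order 0 and order 1 both contribute exactly one state per factor:

```python
# Hypothetical per-factor AR orders. max(order, 1) means a static factor
# (order 0) still occupies one state slot, and order 1 also gives one slot.
factor_order = [0, 1, 3]
k_factor_states = sum(max(order, 1) for order in factor_order)
print(k_factor_states)  # 1 + 1 + 3 = 5
```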

coords[FACTOR_DIM] = [f"factor_{i+1}" for i in range(self.k_factors)]

# AR parameter dimensions - add if needed
if self._max_order > 0:

Since we're allowing different AR order per factor, we might have to have one coordinate per factor. Otherwise, every factor AR parameter will have the maximum number of labels, which won't be correct.

Follow-up comment (Member):

That feature might be more trouble than it's worth, thinking about this. Maybe I'm missing something?

"factor_loadings", shape=(self.k_endog, self.k_factors), dtype=floatX
)

self.ssm["design", :, :] = 0.0

All statespace matrices are initialized to zero matrices when you call the super constructor in the __init__ method. No need for this

Comment on lines +436 to +439
p = ar_coeffs.shape[0]
top_row = pt.reshape(ar_coeffs, (1, p))
below = pt.eye(p - 1, p, k=0)
return pt.concatenate([top_row, below], axis=0)

You can get a matrix like this with a one-liner using .set:

p = ar_coeffs.shape[0]
return pt.eye(p, k=-1)[0].set(ar_coeffs)

Follow-up comment (Member):

The syntax is a bit weird, but .set returns the original tensor (the eye in this case) wrapped in a SetSubtensor operator, putting the provided values in the requested places. So unlike in numpy, you don't get back just the rows you sliced into -- you get back the whole thing, with the assignment.
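For anyone unfamiliar with the pattern, here is a plain-Python sketch of the companion matrix the one-liner constructs (not the PR's actual code; in pytensor, `.set` returns a new tensor with the assignment applied rather than mutating in place):

```python
# Build a companion matrix: ones on the first subdiagonal, then the AR
# coefficients written into the top row.
ar_coeffs = [0.5, 0.3, 0.1]  # hypothetical AR parameters
p = len(ar_coeffs)

# The analogue of pt.eye(p, k=-1): ones one step below the main diagonal.
companion = [[1.0 if j == i - 1 else 0.0 for j in range(p)] for i in range(p)]
# The analogue of [0].set(ar_coeffs): overwrite the top row.
companion[0] = list(ar_coeffs)

for row in companion:
    print(row)
# [0.5, 0.3, 0.1]
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
```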
