
Named StateSpaceModel v2 – rebase and updates after #607 #654

Open
opherdonchin wants to merge 3 commits into pymc-devs:main from opherdonchin:named-statespace-v2


@opherdonchin

Summary

This PR reintroduces the Named StateSpaceModel changes after rebasing onto main following the merge of #607.

The previous version was submitted as #611. This update incorporates the changes introduced in #607 and resolves the resulting conflicts while preserving the intended named state space behavior.

No new conceptual changes are introduced beyond what was discussed in #611.

Changes

Tests

Environment:

  • Linux (Fedora) and Windows 11
  • Python 3.14.2 (Linux) and 3.12 (Windows)
  • Commit: b173e6e

Results:

  • tests/statespace/core

    • 182 passed
    • 2 skipped (JAX tests require nutpie)
    • 0 failed
  • tests/statespace/filters

    • 59 passed
    • 1 failed (test_kalman_filter_jax[cholesky])

The single failing test occurs in the JAX Cholesky filter path and appears unrelated to the named model changes. All standard (non-JAX) filter paths and core state space tests pass.

Notes

@jessegrabowski
Member

It looks like you checked in several files related to your local dev environment, could you clean it up?

@opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from cbea84d to 1bf3670, on February 20, 2026 20:09
@opherdonchin
Author

Sorry about that. The current PR includes only the changed files and a test:

statespace.py
data_tools.py
test_namespace.py

@codecov-commenter

codecov-commenter commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 75.75758% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.24%. Comparing base (0bcccd5) to head (1bf3670).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pymc_extras/statespace/core/statespace.py 72.41% 8 Missing ⚠️
Additional details and impacted files


@@             Coverage Diff             @@
##             main     #654       +/-   ##
===========================================
+ Coverage   42.43%   69.24%   +26.81%     
===========================================
  Files          63       73       +10     
  Lines        7207     7738      +531     
===========================================
+ Hits         3058     5358     +2300     
+ Misses       4149     2380     -1769     
Files with missing lines Coverage Δ
pymc_extras/statespace/utils/data_tools.py 69.56% <100.00%> (+53.04%) ⬆️
pymc_extras/statespace/core/statespace.py 78.44% <72.41%> (+63.87%) ⬆️

... and 42 files with indirect coverage changes
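
The percentages in this report can be reproduced from the line counts. The sketch below is only a sanity check: the totals of 33 changed lines overall and 29 changed lines in statespace.py are inferred from the reported figures, not stated by Codecov, and `coverage_pct` is a helper invented for the example.

```python
def coverage_pct(hits: int, total: int) -> float:
    """Return coverage as a percentage of covered lines."""
    return 100.0 * hits / total


# Project coverage from the diff table: hits / lines.
base = coverage_pct(3058, 7207)  # main
head = coverage_pct(5358, 7738)  # this PR

# Patch coverage: 8 missing lines out of an inferred 33 changed lines.
patch = coverage_pct(33 - 8, 33)

# statespace.py alone: 8 missing lines out of an inferred 29 changed lines.
statespace_patch = coverage_pct(29 - 8, 29)

print(f"{base:.2f} {head:.2f} {patch:.5f} {statespace_patch:.2f}")
```

Under those assumptions the four values round to the figures shown in the report.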


Member

@jessegrabowski left a comment


Nice, this is looking really clean. Just a few requests.

Also be sure to install pre-commit (pip install pre-commit; pre-commit install; pre-commit run --all-files) so your PR gets formatted and linted correctly.

)

if name in self._tensor_variable_info:
gname = self.graph_name(name)
Member

As above. gname is too cryptic. Another option would be to change name to base_name and make name the prefixed version, since this will now be the "canonical" name that users see.

Author

Under the new scheme (base_name for the unprefixed name and name for the name in the graph) it makes most sense to just call this variable name.

raise ValueError(
f"{name} is not a model parameter. All placeholder variables should correspond to model "
f"parameters."
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
Member

Suggested change
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
f"{name} is not a model data variable. All placeholder variables should correspond to model "

Author

Corrected

f"{name} is not a model parameter. All placeholder variables should correspond to model "
f"parameters."
f"{name} is not a model data-variable. All placeholder variables should correspond to model "
f"data-variables."
Member

Suggested change
f"data-variables."
f"data variables."

Author

Corrected.

n_obs=self.ssm.k_endog,
obs_coords=obs_coords,
register_data=True,
data_name=self.graph_name("data"),
Member

Let's let users choose the data name in the PyMCStateSpace constructor in this PR as well, instead of hard coding it to "data". The default can still be data, but then this line becomes data_name = self.graph_name(self.data_name)

Author

Done
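
A rough sketch of the request in this thread, for readers following along. It is not the code in the PR: the class `NamedStateSpaceSketch`, the `registered_data_name` method, and the underscore separator are assumptions made for the example. Only the idea of a user-supplied `data_name` that defaults to "data" and is passed through `graph_name` comes from the review comment.

```python
from typing import Optional


class NamedStateSpaceSketch:
    """Hypothetical stand-in; only the naming logic is modelled here."""

    def __init__(self, name: Optional[str] = None, data_name: str = "data"):
        self.name = name
        self.data_name = data_name  # user-chosen, defaults to "data"

    def graph_name(self, base_name: str) -> str:
        # Unnamed models keep the unprefixed base name.
        # The "_" separator is an assumption made for this example.
        return base_name if self.name is None else f"{self.name}_{base_name}"

    def registered_data_name(self) -> str:
        # Mirrors `data_name = self.graph_name(self.data_name)` from the review.
        return self.graph_name(self.data_name)


default = NamedStateSpaceSketch(name="ar")
custom = NamedStateSpaceSketch(name="ar", data_name="observed_gdp")
unnamed = NamedStateSpaceSketch()
print(default.registered_data_name(), custom.registered_data_name())
```

With this shape, two named models in one PyMC graph would register their data under different names, while an unnamed model keeps the plain base name.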

from pymc_extras.statespace.core.statespace import PyMCStateSpace


def test_two_statespace_models_can_coexist_with_names(monkeypatch):
Member

Put this test in the test_model file; it doesn't need a new file.

Author

Done

@opherdonchin
Author

The new commit includes all the suggested changes and the pre-commit formatting.

@jessegrabowski
Member

Really like where this landed, it's looking very clean.

Could you double-check that we don't need to make any changes to Component or StructuralTimeSeries? Those are in the models/structural module, and are a sort-of parallel StatespaceModel that needs to be cleaned up some day.

@opherdonchin force-pushed the named-statespace-v2 branch 2 times, most recently from ec6a4bd to 27a28ea, on February 25, 2026 12:31
@opherdonchin
Author

While working on the fix for structural models, specifically the __init__ for StructuralTimeSeries, I ran into a problem.

Components create placeholders during make_symbolic_graph(), before we know the final StructuralTimeSeries(name=...). If a component is used in multiple StructuralTimeSeries models with different names, it could lead to silent aliasing which would be hard to track and fix.

I see a few options and I need help choosing:

  1. Defer placeholder creation until model build. That is, components only provide a build() function that creates the placeholders, and names are applied when StructuralTimeSeries actually constructs the model. This is probably the cleanest solution, but it might be a significant refactor.

  2. Allow component names independent of model name. On model build, test that the component objects are not being used twice in the same model (this silently allows components to be used twice in different PyMC models, but that should be less dangerous). Extend name functionality separately to components and StructuralTimeSeries objects. This is consistent with the current PyMC practice of making the user responsible for avoiding duplicate names. It is already possible today (if a little weird) to use the same PyTensor object in different PyMC graphs, so that is also consistent with current practice.

  3. Force canonical naming in structural models. Do not allow multiple structural models in the same PyMC graph. This is what we had before for all state space models. We could create an issue for this and fix it later.

Please let me know which direction you'd like to take it.
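
The aliasing concern and the reuse check from option 2 can be illustrated with a toy model. Everything here is hypothetical: `ToyPlaceholder`, `ToyComponent`, `ToyModel`, and the identity-based check are stand-ins written for this example, not classes or behavior from pymc_extras.

```python
class ToyPlaceholder:
    """Minimal stand-in for a symbolic placeholder variable."""

    def __init__(self, name: str):
        self.name = name


class ToyComponent:
    """Creates its placeholder eagerly, before any model name is known."""

    def __init__(self, param: str):
        self.placeholder = ToyPlaceholder(param)


class ToyModel:
    def __init__(self, name: str, components: list):
        # Option 2: refuse a component instance listed twice in one model.
        if len({id(c) for c in components}) != len(components):
            raise ValueError(f"Component reused within model {name!r}")
        self.name = name
        self.components = components


level = ToyComponent("sigma_level")
model_a = ToyModel("a", [level])
model_b = ToyModel("b", [level])

# Silent aliasing: both models hold the very same placeholder object,
# so renaming it for one model would rename it for the other as well.
shared = model_a.components[0].placeholder is model_b.components[0].placeholder
print(shared)

try:
    ToyModel("c", [level, level])
except ValueError as err:
    print(err)
```

The check catches reuse inside a single model but, as noted in option 2, says nothing about the same component appearing in two different models.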

@jessegrabowski
Member

We should be able to use graph_replace to swap out all the placeholder variables with new placeholders that have the prefixed name. So the components themselves don't need to know about this naming convention, the __init__ method in StructuralStateSpace can just handle it.

You can check PyMCStateSpace._insert_random_variables for a pattern on how to collect the dummies and do a replace, but it should look something like replacements = {var: var.type(name=self.prefix_name(var.name)) for var in explicit_graph_inputs(statespace_matrices)}
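
To make the replacement idea concrete without depending on PyTensor, here is a pure-Python analogy. It does not call graph_replace or explicit_graph_inputs; `Var`, `Add`, `inputs`, `replace`, and `prefix_placeholders` are toy names invented for this sketch, and the underscore separator is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Leaf placeholder, identified by name."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary node; stands in for any op in a real graph."""
    left: object
    right: object


def inputs(node):
    """Collect leaf placeholders, loosely analogous to explicit graph inputs."""
    if isinstance(node, Var):
        return [node]
    return inputs(node.left) + inputs(node.right)


def replace(node, replacements):
    """Rebuild the graph with leaves swapped according to `replacements`."""
    if isinstance(node, Var):
        return replacements.get(node, node)
    return Add(replace(node.left, replacements), replace(node.right, replacements))


def prefix_placeholders(node, prefix):
    # One fresh, prefixed placeholder per original placeholder.
    replacements = {var: Var(f"{prefix}_{var.name}") for var in inputs(node)}
    return replace(node, replacements)


# A graph built once by a "component", then reused by two named models.
shared = Add(Var("rho"), Add(Var("sigma"), Var("rho")))
model_a = prefix_placeholders(shared, "a")
model_b = prefix_placeholders(shared, "b")

print([v.name for v in inputs(model_a)])
print([v.name for v in inputs(model_b)])
print([v.name for v in inputs(shared)])
```

The original graph is left untouched, and each named model ends up with its own placeholders, which is the property the suggestion above relies on.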

@AlexAndorra
Contributor

Any help from me needed here @opherdonchin @jessegrabowski ?

…nique graph naming

Create a test to ensure multiple state space models can coexist with distinct names.
StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create
fresh, prefixed placeholder variables in all SSM matrices when a model
name is provided. This replaces the previous metadata-only rename
approach, which left the actual graph placeholders unchanged and caused
silent aliasing when the same Component instance was reused across
multiple named StructuralTimeSeries models.

Key changes:
- Add _prefix_placeholder_variables() method that builds a replacement
  mapping from old to new (prefixed) placeholder variables, applies
  graph_replace across all SSM matrices, and rebuilds SymbolicVariableInfo
  and SymbolicDataInfo with aligned names.
- Add _validate_symbolic_info() diagnostic helper.
- Pass name=name through to PyMCStateSpace.__init__.
- Add 8 focused tests covering the aliasing bug, name alignment,
  graph correctness, and the validation helper.
@opherdonchin force-pushed the named-statespace-v2 branch from 27a28ea to e950db5 on April 3, 2026 22:56
@opherdonchin
Author

@AlexAndorra: thanks for the nudge!

I think what I've done makes sense and follows @jessegrabowski's suggestion. The branch is now rebased onto current main (v0.10.0) with a clean 3-commit history.

What changed in the new commit:

StructuralTimeSeries.__init__ now uses pytensor.graph_replace to create fresh, prefixed placeholder variables in all SSM matrices when a model name is provided. A single _prefix_placeholder_variables() method collects all placeholders from tensor_variable_info and tensor_data_info, builds new variables via old_var.type(name=self.prefixed_name(old_var.name)), applies graph_replace to all 9 SSM matrices, and rebuilds the symbolic info objects with aligned names. Components remain unaware of the naming convention.

I added appropriate tests in TestGraphReplacePlaceholderNamespacing.

Let me know if this is a good fix and whether anything else is needed before this is ready to merge.

Thanks!
Opher


4 participants