Skip to content

Unified per-instrument stream registry + LogContextBinding#936

Draft
SimonHeybrock wants to merge 7 commits into
mainfrom
stream-registry
Draft

Unified per-instrument stream registry + LogContextBinding#936
SimonHeybrock wants to merge 7 commits into
mainfrom
stream-registry

Conversation

@SimonHeybrock
Copy link
Copy Markdown
Member

Replaces the hand-maintained f144_log_streams dict + f144_attribute_registry per-instrument structures with a single Instrument.streams: dict[str, Stream], sourced from coda HDF5 files via a checked-in streams_parsed.py + build_streams(...) composition. Adds LogContextBinding for routing f144 streams into Sciline workflow keys. Full design in docs/developer/plans/unified-stream-registry.md.

What's in this PR

  • Single source of truth. Stream / F144Stream records replace the parallel structures (specs.py dict, Instrument.f144_attribute_registry, StreamLUT[logs]). Topic/source/units/path live on one record; consumers derive views.
  • LogContextBinding. Declares (stream_name, workflow_key, dependent_sources); replaces hand-coded context_keys={'detector_rotation': InstrumentAngle[SampleRun], ...} in factories. Bifrost migrated as the demonstration consumer. LOKI dynamic-transforms and chopper-workflow can adopt the same binding list independently.
  • Codegen. python -m ess.livedata.nexus_helpers <coda.hdf> --generate --output … emits an importable streams_parsed.py (SPDX header + from ess.livedata.config import F144Stream + sorted list[F144Stream]). build_streams(parsed, overrides=…, synthetics=…) composes the final registry; overrides accept either stream_name or nexus_path keys.
  • All instruments rolled out (except dummy): bifrost (52 streams), estia (38), loki (78), nmx (64), odin (156), tbl (68). Each sourced from its coda_<inst>_*.hdf ground-truth file. Overrides preserve the few names referenced elsewhere (Bifrost factory bindings, aux_sources, source_metadata).
  • suggest_internal_name improvements plus generator-side collision detection. Filters generic NeXus containers (entry/instrument/sample/sample_environment/transformations); joins parent+leaf so phase / translation1 carry parent context. No collisions across 776 entries.
  • Drift test parameterised over instruments; regenerates from local coda file when present, skips otherwise (coda files are not in pooch yet).

Why now

Two motivations from the design doc: dynamic-transforms and chopper-workflow both need a stream→Sciline-key routing layer. Without unification each added structure (attribute registry, log_context_bindings shim, dynamic_transforms dict) drifts from the others. This PR collapses them so the follow-ups (LOKI dynamic-transforms migration; chopper-workflow adopting bindings) become focused diffs.

The expansion of registered streams (~25 → ~456 auto-registered timeseries workflows across instruments) is intended: manual setup was legacy and incomplete; the dashboard now surfaces the full readback set the file writer actually publishes.

Caveats

  • Coda files (`coda_*.hdf` in repo root) are local to the devcontainer; the drift test skips when they're absent. Production geometry files in pooch are older than the coda files and will be refreshed separately.
  • Estia coda file uses the same PV (`ESTIA-DtRot:MC-RotZ01:Mtr.RBV`) under two paths; only the readback path is targeted by the override.

Test plan

  • Smoke-test each backend service with one instrument (`fake_logdata` + `timeseries` + `dashboard`) and confirm new timeseries workflows appear and accumulate
  • Verify Bifrost cut workflow still binds rotation aux inputs correctly (regression for the binding migration)
  • Verify LOKI dynamic-transforms still wires `detector_carriage` to the depends_on chain

Adds `Stream` / `F144Stream` config records and an `Instrument.streams:
dict[str, Stream]` field that replaces the parallel `f144_log_streams`
dict + `f144_attribute_registry` field maintained per instrument. The
StreamLUT for f144 is now derived from `instrument.streams`; the
attribute lookup in detector/reduction/timeseries handlers reads
`F144Stream.units` directly. Synthetic `chopper_cascade` handling moves
to a `_SYNTHETIC_LOG_SOURCES` whitelist (no per-stream attrs needed
beyond the empty-units default).

Implements steps 1-5 of `docs/developer/plans/unified-stream-registry.md`.
The codegen helper (`nexus_helpers.generate_f144_log_streams_code`) now
emits `F144Stream` literals so generated output drops in directly.
Adds `LogContextBinding` (stream_name, workflow_key, dependent_sources)
and `Instrument.log_context_bindings: list[LogContextBinding]`. Bindings
are registered via `Instrument.add_log_context_binding(...)` so that
heavy Sciline-key imports (e.g. `InstrumentAngle[SampleRun]` from
`ess.spectroscopy`) stay out of per-instrument specs.py and are deferred
to `setup_factories`. `Instrument.__post_init__` validates that any
construction-time bindings reference declared streams;
`_validate_binding_dependent_sources` is called at the end of
`load_factories` to ensure each binding's `dependent_sources` matches
some registered spec's source_names.

Bifrost migrated as the demonstration consumer: the hand-coded
`context_keys={'detector_rotation': InstrumentAngle[SampleRun], ...}` in
`_make_cut_stream_processor` is now `instrument.get_context_keys(
'unified_detector')`, derived from bindings registered in
`setup_factories`.

LOKI's dynamic-transforms and any future chopper-workflow consumer can
now adopt the same binding list independently, without serialised
rollout.

Implements steps 6-7 of `docs/developer/plans/unified-stream-registry.md`
for the typed Sciline-key consumer path. The dynamic-transforms path
(LOKI's `TransformValueStream` machinery) is unchanged and will be a
focused follow-up.
Implements step 9 of `docs/developer/plans/unified-stream-registry.md`
end-to-end as a proof point. Three pieces:

1. `build_streams(parsed, overrides, synthetics)` composes a final
   `dict[str, Stream]` from a parsed list, hand-edited overrides
   (keyed by either `stream_name` or `nexus_path`, applied via
   `dataclasses.replace`), and additional synthetic streams. Detects
   stream_name collisions in the parsed list and across synthetics.

2. `nexus_helpers.generate_streams_parsed_module` emits a complete
   importable Python module (SPDX header, docstring with source
   filename, `F144Stream` import, sorted list literal). The CLI now
   supports `--output PATH` to write directly. Multi-topic by default;
   `--topic` and `--exclude` still filter.

3. LOKI migrated as the demonstration consumer. `loki/streams_parsed.py`
   is checked-in auto-generated content; `loki/specs.py` imports
   `PARSED_STREAMS` and runs it through `build_streams`. The resulting
   `Instrument.streams` is functionally equivalent to before (the only
   addition is the parsed `nexus_path` field, which carries no behavioural
   change today but is the wire that LOKI's dynamic-transforms follow-up
   will tap into).

A drift test (`tests/config/streams_parsed_test.py`) regenerates the
LOKI module from the pinned geometry file and fails on diff, with a
shell command in the failure message to refresh.

Other instruments (bifrost, estia, dummy) keep hand-coded entries —
their geometry files diverge from current spec naming in ways that need
careful overrides, and the mechanism is now proven so those migrations
can land on their own timelines.
Generates and checks in ``streams_parsed.py`` for bifrost, estia, loki,
nmx, odin, and tbl, sourced from the corresponding ``coda_<inst>_*.hdf``
ground-truth files. Each instrument's ``specs.py`` now composes its
registry via ``build_streams(PARSED_STREAMS, ...)``. Where stream names
are referenced by other code (Bifrost factory bindings and aux_sources;
Estia source_metadata), the parsed entries are renamed via overrides so
those references remain intact:

* Bifrost: ``detector_rotation`` (from ``detector_tank_angle_r0/value``)
  and ``sample_rotation`` (from ``114_sample_stack/rotation_stage/value``)
* Estia: ``detector_rotation`` (from
  ``detector_arm/detector_rotation/value``)

LOKI's ``streams_parsed.py`` is regenerated from the coda file too,
since its 140-entry coda inventory is much fuller than the geometry-file
fallback we used in the prior commit.

Mechanism improvements landed in the same change:

* ``suggest_internal_name`` now filters generic NeXus container groups
  (``entry``/``instrument``/``sample``/``sample_environment``/
  ``transformations``) and joins parent+leaf so colliding leaves like
  ``phase`` or ``translation1`` carry parent context (e.g.
  ``005_PulseShapingChopper_phase``, ``wfm1_translation1``).
* ``generate_streams_parsed_module`` raises with a per-conflict listing
  if any two distinct NXlog parents end up with the same suggested name
  — the user resolves via build_streams overrides.

Drift test (``tests/config/streams_parsed_test.py``) is parameterised
over instruments and regenerates from the local coda file when present
(devcontainer); skipped otherwise. The LOKI-specific check moves to the
shared mechanism. Stream/log mappings in ``streams.py`` for ``odin``,
``tbl``, and ``nmx`` are extended to expose the new f144 streams.

Net effect: ``Instrument.streams`` now reflects what the file writer
actually writes, with stream counts (bifrost 52, estia 38, loki 78,
nmx 64, odin 156, tbl 68) replacing the previous hand-coded subset.
Auto-registered timeseries workflows expand accordingly, making the
dashboard surface the full set of available readbacks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant