Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose NeXus transformation chains in workflow steps #114

Merged
merged 20 commits into from
Oct 8, 2024

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Oct 2, 2024

Part of #96.

This is not completing this task, but does most of the prep work. The idea is that the to_transformation could accept additional arguments (or be replaced by one that does), which could:

  1. Select a specific time interval, including averaging and threshold checking.
  2. Update values of one or more transformations in the chain. This is mainly relevant for live-data reduction.

Neither 1.) nor 2.) is implemented right now. The initial focus will be on 1.). I would mainly like input from the reviewer on whether this new structure is adequate for this. The actual implementation of these features could be performed in follow-up work.

Depends on a release of scipp/scippnexus#243 (that is why CI is failing for now, otherwise this PR is ready for review).

Note that I ran into major (code) overhead/duplication from the current duplication (or near-duplication) of different code for different component types. Previously we had failed to find a way to generalize this. The reason was that monitors required one extra type-parameter (MyMonitor[RunType, MonitorType] vs MyDetector[RunType], MySample[RunType], ...). I believe I have now solved this via the new ComponentType. We can thus use MyComponent[Component, RunType] almost throughout, resulting in a significant deduplication. The "trick" is that Component takes general component names except for the monitors, where there are multiple named ones.

return compute_component_position(loaded)
return loaded
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the heart of the change in this PR: We are no longer computing the position directly here. This is moved into an extra step.

@@ -95,9 +91,9 @@ class TransmissionRun(Generic[ScatteringRunType]):
"""Identifier for an arbitrary monitor"""
Monitor5 = NewType('Monitor5', int)
"""Identifier for an arbitrary monitor"""
Incident = NewType('Incident', int)
IncidentMonitor = NewType('IncidentMonitor', int)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the monitor suffix here since these now show up as MyComponent[IndicidentMonitor] and it may be confusing without.

@SimonHeybrock SimonHeybrock marked this pull request as ready for review October 3, 2024 13:49
@nvaytet nvaytet self-assigned this Oct 4, 2024
) -> NeXusSource[RunType]:
"""
Load a NeXus source group from a file.
def nx_class_for_monitor() -> NeXusClass[MonitorType]:
Copy link
Member

@nvaytet nvaytet Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly annoying that we need to explicitly have nx_class_for_monitor, nx_class_for_detector, nx_class_for_source, nx_class_for_sample... but not sure what can be done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could set it as workflow parameters. Alternative would be to duplicate downstream providers, but as those have functionality it would mean code duplication, i.e., break more easily by running out of sync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine as is for now.

) -> NeXusMonitor[RunType, MonitorType]:
"""
Load monitor from NeXus, but with event data replaced by placeholders.
When loading a detector or monitor, event data is replaced by placeholders.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed something, but is this for loading only detectors and monitors? I thought Component was also for sample and source?

Edit: I now see that the case for NXsample is overloaded by the load_nexus_sample function above. Are we missing a load_nexus_source function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also used for the source. But this paragraph basically just applies to the groups that have event data.

file_path_to_file_spec,
all_pulses,
component_spec_by_name,
unique_component_spec, # after component_spec_by_name, partially overrides
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the comment mean that we are relying on the behaviour of the pipeline; that it applies the providers in the order that they are given. Can we assume that the behaviour will never change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing that in many places now, not just here: Insertion order does matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are explicit tests for this, e.g., here: https://github.com/scipp/sciline/blob/a90b73c50d4cef1fc8adb231dc2208cf99ef2f0a/tests/pipeline_test.py#L916. That is, this is intentional behavior and not likely to change. If it would, quite a few other workflows would also break, I think.

Comment on lines -91 to -100
nexus_detector,
offset=workflow.no_offset,
source_position=source_position,
sample_position=workflow.origin,
gravity=workflow.gravity_vector_neg_y(),
bank_sizes={},
)
assert_identical(
detector.drop_coords(('sample_position', 'source_position', 'gravity')),
nexus_detector['data'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants