Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic NeXus loaders #10

Merged
merged 25 commits into from
Mar 11, 2024
Merged

Generic NeXus loaders #10

merged 25 commits into from
Mar 11, 2024

Conversation

jl-wynen
Copy link
Member

@jl-wynen jl-wynen commented Mar 6, 2024

Fixes #8

They are heavily based on the LOKI loaders in ESSsans and the test files used in that package.

file_path: Union[FilePath, NeXusFile, NeXusGroup],
*,
detector_name: DetectorName,
instrument_name: Optional[InstrumentName] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the odds that there is more than one instrument is close to zero, much more likely is more than one entry, shouldn't we have EntryName instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable

the sample NeXus group.
"""
with _open_nexus_file(file_path) as f:
entry = f['entry']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot assume this naming. Find unique NXentry, as suggested in the issue description.

instrument_name: Optional[InstrumentName] = None,
) -> sc.DataGroup:
with _open_nexus_file(file_path) as f:
entry = f['entry']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito



def extract_detector_data(
detector: RawDetector, detector_name: DetectorName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
detector: RawDetector, detector_name: DetectorName
detector: RawDetector

I don't think we should have DetectorName here. Instead return the event data, if found, otherwise the child data array (dense data). Reason: I don't think we should rely on the detector name being related to the name of the event-data subgroup. In the case of monitors there may be no subgroup at all. Also it is simpler to just look for the child data array(s).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I identify the data? It's just one of many data arrays in the group.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can there only be one data array per detector / monitor?



def extract_monitor_data(
monitor: RawMonitor, monitor_name: MonitorName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
monitor: RawMonitor, monitor_name: MonitorName
monitor: RawMonitor

dito

Comment on lines 318 to 319
def _extract_events_or_histogram(dg: sc.DataGroup, name: str) -> sc.DataArray:
data_names = {f'{name}_events', 'data'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the reliance on naming. Just look for child data arrays, prefer to return the event-data one?

loaded = cast(
sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
)
loaded = snx.compute_positions(loaded)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to support the store_transform arg. The transforms are required in SANS (for the detectors). Probably for now we can just always enable this, we can revisit later if it turns out there are issues with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are stored in the data group. Does the sans code need to change the name of the item?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?

Copy link
Member

@SimonHeybrock SimonHeybrock Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jan-Lukas and I already agreed to just store it as "transform", and to simply raise on (unlikely) naming clash.

Comment on lines 339 to 344
if len(data_arrays) > 1:
raise ValueError(
"Raw data loaded from NeXus contains more than one data array. "
"Cannot uniquely identify the event or histogram data. "
f"Got items {set(dg.keys())}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't do, unfortunately. Facilities have the tendency of storing a histogrammed "preview" of the event data alongside the events. I think we should return the events if found, and the dense data otherwise.

@SimonHeybrock SimonHeybrock mentioned this pull request Mar 7, 2024
loaded = cast(
sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
)
loaded = snx.compute_positions(loaded)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?



def load_detector(
file_path: Union[FilePath, NeXusFile, NeXusGroup],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was your plan for using these in a pipeline? Would we wrap them into other providers? I'm asking because of the Union here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works as long as any type is provided. So, this can be used in a pipeline as is. But I think all production pipelines will need different kinds of files. So we will have to wrap it.

@@ -0,0 +1,306 @@
# SPDX-License-Identifier: BSD-3-Clause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try (maybe as a pair-programming thing?) to test this on the loki workflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tested and it works nicely: see scipp/esssans#114

@jl-wynen
Copy link
Member Author

jl-wynen commented Mar 7, 2024

We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?

Added a check as for transformation. When discussing with Simon, we concluded to just user hard coded names for now because the chance of a conflict is very low. We can always add an argument later if need be.

@@ -234,7 +239,15 @@ def _load_group_with_positions(
loaded = cast(
sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
)
loaded = snx.compute_positions(loaded)

transform_out_name = 'transformation'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we were using transform in ESSsans? transformation is nearly identical to transformations, the current name used for the subgroup holding the transformations, so this would be quite confusing. Can we stick with transform?

@jl-wynen jl-wynen merged commit 2b1d075 into main Mar 11, 2024
3 checks passed
@jl-wynen jl-wynen deleted the nexus-loaders branch March 11, 2024 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NeXus load helpers
3 participants