Generic NeXus loaders #10

jl-wynen · 2024-03-06T12:50:42Z

Fixes #8

They are heavily based on the LOKI loaders in ESSsans and the test files used in that package.

Failing because I can't figure out how to make scippnexus store positions in the data

SimonHeybrock · 2024-03-06T12:55:17Z

src/ess/reduce/nexus.py

+    file_path: Union[FilePath, NeXusFile, NeXusGroup],
+    *,
+    detector_name: DetectorName,
+    instrument_name: Optional[InstrumentName] = None,


I think the odds that there is more than one instrument are close to zero; more than one entry is much more likely. Shouldn't we have EntryName instead?

Sounds reasonable

SimonHeybrock · 2024-03-06T12:56:24Z

src/ess/reduce/nexus.py

+         the sample NeXus group.
+    """
+    with _open_nexus_file(file_path) as f:
+        entry = f['entry']


We cannot assume this naming. Find unique NXentry, as suggested in the issue description.
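A minimal sketch of the lookup suggested here, assuming groups are modelled as plain nested dicts with an `NX_class` key. This is a stand-in for illustration, not the scippnexus API; `unique_child` and the file layout are hypothetical names.

```python
from typing import Any, Mapping, Optional


def unique_child(
    group: Mapping[str, Any], nx_class: str, name: Optional[str] = None
) -> Mapping[str, Any]:
    """Return the only child of ``group`` whose NX_class equals ``nx_class``.

    If ``name`` is given, select that child directly but still check its class.
    """
    if name is not None:
        child = group[name]
        if not isinstance(child, Mapping) or child.get("NX_class") != nx_class:
            raise ValueError(f"Child {name!r} is not of class {nx_class}")
        return child
    # Collect all subgroups of the requested class, whatever they are called.
    matches = {
        key: child
        for key, child in group.items()
        if isinstance(child, Mapping) and child.get("NX_class") == nx_class
    }
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one {nx_class}, got {sorted(matches)}")
    return next(iter(matches.values()))


# Hypothetical file layout: the entry is not called 'entry'.
nexus_file = {
    "raw_data_1": {
        "NX_class": "NXentry",
        "instrument": {"NX_class": "NXinstrument"},
    },
}
entry = unique_child(nexus_file, "NXentry")
```

With more than one NXentry in the file, the caller has to pass the entry name explicitly; otherwise the lookup raises instead of guessing.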

SimonHeybrock · 2024-03-06T12:56:29Z

src/ess/reduce/nexus.py

+    instrument_name: Optional[InstrumentName] = None,
+) -> sc.DataGroup:
+    with _open_nexus_file(file_path) as f:
+        entry = f['entry']


SimonHeybrock · 2024-03-06T13:00:01Z

src/ess/reduce/nexus.py

+
+
+def extract_detector_data(
+    detector: RawDetector, detector_name: DetectorName


Suggested change

detector: RawDetector, detector_name: DetectorName

detector: RawDetector

I don't think we should have DetectorName here. Instead return the event data, if found, otherwise the child data array (dense data). Reason: I don't think we should rely on the detector name being related to the name of the event-data subgroup. In the case of monitors there may be no subgroup at all. Also it is simpler to just look for the child data array(s).

How can I identify the data? It's just one of many data arrays in the group.

Or can there only be one data array per detector / monitor?

SimonHeybrock · 2024-03-06T13:00:19Z

src/ess/reduce/nexus.py

+
+
+def extract_monitor_data(
+    monitor: RawMonitor, monitor_name: MonitorName


Suggested change

monitor: RawMonitor, monitor_name: MonitorName

monitor: RawMonitor

Ditto.

SimonHeybrock · 2024-03-06T13:01:00Z

src/ess/reduce/nexus.py

+def _extract_events_or_histogram(dg: sc.DataGroup, name: str) -> sc.DataArray:
+    data_names = {f'{name}_events', 'data'}


I suggest removing the reliance on naming. Just look for child data arrays and prefer to return the event-data one?

SimonHeybrock · 2024-03-07T05:28:24Z

src/ess/reduce/nexus.py

+        loaded = cast(
+            sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
+        )
+        loaded = snx.compute_positions(loaded)


We have to support the store_transform arg. The transforms are required in SANS (for the detectors). Probably for now we can just always enable this, we can revisit later if it turns out there are issues with it.

They are stored in the data group. Does the sans code need to change the name of the item?

We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?

Jan-Lukas and I already agreed to just store it as "transform", and to simply raise on (unlikely) naming clash.
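The agreement above could look roughly like this. It is a sketch only, assuming the loaded group behaves like a dict; `store_transform` and `TRANSFORM_NAME` are hypothetical names, not taken from the PR.

```python
from typing import Any, Dict

# Hard-coded output name, as agreed in the discussion.
TRANSFORM_NAME = "transform"


def store_transform(group: Dict[str, Any], transform: Any) -> Dict[str, Any]:
    """Return a copy of ``group`` with the combined transform added.

    Raises instead of silently overwriting an item that is already
    stored under the same name in the file.
    """
    if TRANSFORM_NAME in group:
        raise ValueError(
            f"Loaded data already contains an item {TRANSFORM_NAME!r}; "
            "refusing to overwrite it."
        )
    return {**group, TRANSFORM_NAME: transform}


loaded = store_transform({"data": [1, 2, 3]}, transform="combined-matrix")
```

Raising keeps the behaviour explicit; a parameter for the output name can still be added later if a real file ever triggers the clash.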

SimonHeybrock · 2024-03-07T05:29:49Z

src/ess/reduce/nexus.py

+    if len(data_arrays) > 1:
+        raise ValueError(
+            "Raw data loaded from NeXus contains more than one data array. "
+            "Cannot uniquely identify the event or histogram data. "
+            f"Got items {set(dg.keys())}"
+        )


This won't do, unfortunately. Facilities have the tendency of storing a histogrammed "preview" of the event data alongside the events. I think we should return the events if found, and the dense data otherwise.
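A sketch of the proposed selection rule, using a small stand-in class instead of real scipp data arrays. The assumption is that event data can be recognised by a non-None `bins` attribute. `FakeDataArray` and `extract_events_or_histogram` are illustrative names, and raising on several candidates of the same kind is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FakeDataArray:
    """Stand-in for a data array; ``bins`` is None for dense data."""

    label: str
    bins: Optional[object] = None


def extract_events_or_histogram(group: Mapping[str, Any]) -> FakeDataArray:
    """Return the event data if present, otherwise the dense data."""
    arrays = [v for v in group.values() if isinstance(v, FakeDataArray)]
    events = [a for a in arrays if a.bins is not None]
    # Prefer events; a histogram "preview" next to them is ignored.
    candidates = events if events else arrays
    if len(candidates) != 1:
        raise ValueError(
            f"Cannot uniquely identify the data, got items {set(group.keys())}"
        )
    return candidates[0]


detector = {
    "preview": FakeDataArray("histogram"),
    "events": FakeDataArray("events", bins=object()),
    "depends_on": "/entry/instrument/detector/transformations/t1",
}
selected = extract_events_or_histogram(detector)
```

Selecting by type rather than by name also covers monitors, where there may be no event-data subgroup at all.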


nvaytet · 2024-03-07T09:56:32Z

src/ess/reduce/nexus.py

+        loaded = cast(
+            sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
+        )
+        loaded = snx.compute_positions(loaded)


We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?

nvaytet · 2024-03-07T09:58:52Z

src/ess/reduce/nexus.py

+
+
+def load_detector(
+    file_path: Union[FilePath, NeXusFile, NeXusGroup],


What was your plan for using these in a pipeline? Would we wrap them into other providers? I'm asking because of the Union here.

It works as long as any type is provided. So, this can be used in a pipeline as is. But I think all production pipelines will need different kinds of files. So we will have to wrap it.
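One way the wrapping mentioned here might look. The generic loader below is a stub, and all type names (`SampleRunFile`, `SampleRunDetector`) are hypothetical domain types invented for this sketch.

```python
from typing import Any, Dict, NewType

# Hypothetical pipeline-specific domain types.
SampleRunFile = NewType("SampleRunFile", str)
DetectorName = NewType("DetectorName", str)
SampleRunDetector = NewType("SampleRunDetector", dict)


def load_detector(file_path: str, *, detector_name: str) -> Dict[str, Any]:
    """Stub standing in for the generic loader; it only records its inputs."""
    return {"file": file_path, "detector": detector_name}


def load_sample_run_detector(
    file_path: SampleRunFile, detector_name: DetectorName
) -> SampleRunDetector:
    """Provider with concrete input and output types for one kind of run."""
    return SampleRunDetector(load_detector(file_path, detector_name=detector_name))


result = load_sample_run_detector(
    SampleRunFile("sample.nxs"), DetectorName("larmor_detector")
)
```

Each kind of file in a workflow would get its own thin wrapper like this, so the generic loader keeps its flexible signature.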

nvaytet · 2024-03-07T10:52:30Z

tests/nexus_test.py

@@ -0,0 +1,306 @@
+# SPDX-License-Identifier: BSD-3-Clause


Should we try (maybe as a pair-programming thing?) to test this on the loki workflow?

We tested and it works nicely: see scipp/esssans#114

jl-wynen · 2024-03-07T14:13:34Z

We need to know where to find it, so we relied on a name that was defined in the default parameters.

I guess it could either be a parameter passed to this, or maybe it's enough to just have always the same name hard-coded here, but we need to make sure we never have a name clash in a file?

Added a check, as for the transformation. When discussing with Simon, we decided to just use hard-coded names for now because the chance of a conflict is very low. We can always add an argument later if need be.

SimonHeybrock · 2024-03-11T03:31:59Z

src/ess/reduce/nexus.py

@@ -234,7 +239,15 @@ def _load_group_with_positions(
        loaded = cast(
            sc.DataGroup, _unique_child_group(instrument, nx_class, group_name)[()]
        )
-        loaded = snx.compute_positions(loaded)
+
+        transform_out_name = 'transformation'


I think we were using 'transform' in ESSsans? 'transformation' is nearly identical to 'transformations', the current name of the subgroup holding the transformations, so this would be quite confusing. Can we stick with 'transform'?

jl-wynen added 14 commits March 5, 2024 11:09

Depend on scipp, scippnexus

7bf168f

Skeleton for load_detector

28e1f60

Deduce instrument and detector

aa43050

use pyfakefs

d57451a

Test with file path and buffer

2c8c951

Wrap RawDetector in a data group

79804d1

Add load_monitor and support dense data

3c20d3d

Attempt to compute positions

a6237c8

Failing because I can't figure out how to make scippnexus store positions in the data

Detector test data with pixel binning

4bc417a

Do not store position for monitor

c859a37

Add load_source

2f614d5

Add load_sample

43bc991

Add data extraction functions

5fbd916

Document nexus utilities

0f76d4c

SimonHeybrock reviewed Mar 6, 2024


jl-wynen added 4 commits March 6, 2024 14:13

Parametrize entry name instead instrument name

3021dd3

Use temp dir instead of pyfakefs

1358ff1

Test with multiple entries

4391639

Extract data by type not name

1ffdd36

SimonHeybrock reviewed Mar 7, 2024


SimonHeybrock mentioned this pull request Mar 7, 2024

Release ESSreduce #11

Closed

Store combined transformation

4312a8b

nvaytet reviewed Mar 7, 2024


jl-wynen added 4 commits March 7, 2024 12:55

Add logging module

dc0d258

Extract data by type not name

1a388a8

Fix typo

52c2eae

Check for name conflict with position

4446a00

Add NeXus prefix to Names

3a51a10

SimonHeybrock reviewed Mar 11, 2024


Rename transformation -> transform

99a6a48

SimonHeybrock approved these changes Mar 11, 2024


jl-wynen merged commit 2b1d075 into main Mar 11, 2024
3 checks passed

jl-wynen deleted the nexus-loaders branch March 11, 2024 08:41
