
McStas loader and time binning workflow. #5

Closed · wants to merge 25 commits

Conversation

@YooSunYoung (Member) commented Dec 1, 2023

Fixes #6
docs/examples/workflow.ipynb contains an example workflow that loads McStas data and bins it in the time dimension.

TODO:

  • Make a small subset of simulation data for loader testing.
  • Update the workflow example to use generated data or downloaded sample data.
    - [ ] Add another workflow example that uses a measurement data sample. Will be done separately.
  • Add unit tests for loaders and minimal reduction functions.
    - [ ] Add documentation for the loaders.

@YooSunYoung self-assigned this Dec 1, 2023
@YooSunYoung (Member, Author) commented:

Here is the script I used to make a subset of data:

from pathlib import Path

import h5py
import numpy as np

file_path = Path('pulse5_z.h5')
copied_file_path = file_path.with_name(file_path.stem + '_subset.h5')

print(f"Copy subset of {file_path} to {copied_file_path}")

# Remove the target file if it already exists.
if copied_file_path.exists():
    copied_file_path.unlink()

# Copy all fields and a subset of the event data field.
# (If you copy a file and then delete a field, the file size does not shrink,
# so we build a new file instead.)
with h5py.File(file_path, 'r') as file:
    entry = file['entry1']
    dataset = entry['data']
    event_path = 'bank01_events_dat_list_p_x_y_n_id_t'
    events = dataset[event_path]['events'][()]

    print("Original Shape: ", events.shape)
    with h5py.File(copied_file_path, 'w') as copied_file:
        copied_entry = copied_file.create_group('entry1')
        copied_data = copied_entry.create_group('data')

        # Copy all non-data fields.
        for key in entry.keys():
            if key == 'data':
                continue
            entry.copy(key, copied_entry, key)

        # Copy all non-event data fields from the original data group.
        for data_key in dataset.keys():
            if data_key != event_path:
                dataset.copy(data_key, copied_data, data_key)

        # Copy a random ~0.001% subset of the events.
        copied_data.create_group(event_path)
        choices = np.random.random(events.shape[0]) < 0.00001
        subset = events[choices]
        copied_data[event_path].create_dataset('events', data=subset)
        print("Subset Shape: ", subset.shape)

@YooSunYoung marked this pull request as ready for review December 7, 2023 15:26
@nvaytet self-assigned this Dec 8, 2023
docs/examples/workflow.ipynb (outdated review thread)
"This page will show simulation data workflow as an example. <br>\n",
"They are written with ``sciline``, so we will show how to collect ``providers`` and ``parameters`` to build a workflow pipeline and compute the required result. <br>\n",
"\n",
"First, we will set up scipp logging widget in the notebook."
Member (reviewer) commented:
What is our current guideline for the use of logging? I know we have it in other workflows in the ess repo, but with the use of sciline, I personally feel that logging in the way we used to do it in ess is not so useful?

If you have a graph of the pipeline, it's more useful than a list of steps in a log?

Or maybe there was a very good reason you needed to add logging in this notebook?
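
(For reference, the graph-based alternative mentioned here could look roughly like the sketch below; `nmx_workflow` is the sciline pipeline built in the notebook, and `TimeBinned` is a placeholder target type rather than the actual NMX domain type.)

# Display the task graph leading to the requested result instead of a textual log.
# `TimeBinned` is a placeholder target type, not the real NMX symbol.
nmx_workflow.visualize(TimeBinned)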

@YooSunYoung (Member, Author) replied:

It was one of the requests from Justin; he wants to quickly check the range of t_list while computing the result.

Since we want to see the intermediate result, not just the progress, this was the easiest way I could think of...

"\n",
"file_path = small_mcstas_sample() # Replace it with your data file path\n",
"\n",
"nmx_workflow = build_workflow(file_path)\n",
Member (reviewer) commented:

In esssans and essreflectometry, we have kept the explicit use of Pipeline in the notebook, and we didn't hide it behind a build_workflow wrapper.

Should we keep it consistent and not use a wrapper?
Or do you think having to build the pipeline is too much code for users to write?

I personally think that it's good for users to have to build it; it gives them more understanding of what is going on under the hood.

But maybe it was a requirement from the NMX people?
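
(For reference, the "explicit" style referred to here is roughly the following sketch; `providers`, `params`, and `TimeBinned` are placeholders rather than the actual NMX symbols.)

import sciline as sl

# `providers` and `params` stand for the NMX provider list and parameter dict
# assembled earlier in the notebook; `TimeBinned` is a placeholder target type.
pipeline = sl.Pipeline(providers, params=params)
result = pipeline.compute(TimeBinned)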

@YooSunYoung (Member, Author) replied:

But maybe it was a requirement from the NMX people?

No, it was not a requirement.
But since NMX doesn't really do much here, I thought building the pipeline itself was not so important.
But I agree that it should be consistent with the other packages. I will unwrap them again.

src/ess/nmx/detector.py (outdated review thread)
return Events(sc.DataArray(data=weights, coords={'t': t_list, 'id': id_list}))


providers = [
Member (reviewer) commented:

I think in other projects, the providers are now tuples instead of lists.

return NMXProviders([*logging_providers, *loader_providers, *reduction_providers])


def collect_default_parameters() -> NMXParams:
@nvaytet (Member) commented on Dec 8, 2023:

I think I would use the default parameters as in essreflectometry:

params={
    **default_parameters,
    QBins: sc.geomspace(dim='Q', start=0.008, stop=0.075, num=200, unit='1/angstrom'),
    SampleRotation[Sample]: sc.scalar(0.7989, unit='deg'),
    Filename[Sample]: "sample.nxs",
    SampleRotation[Reference]: sc.scalar(0.8389, unit='deg'),
    Filename[Reference]: "reference.nxs",
}

and then explicitly create the pipeline in the notebook?

@YooSunYoung (Member, Author) replied:

But it was a bit annoying to have default_parameters as a dictionary, because it's mutable.
I often accidentally changed the original dictionary, so that is why I made this function instead...
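
(A minimal sketch of the difference, with hypothetical parameter names: a function returns a fresh dict on every call, so callers cannot accidentally mutate a shared module-level `default_parameters`.)

from typing import NewType

TimeBinSteps = NewType('TimeBinSteps', int)              # hypothetical domain types
MaximumProbability = NewType('MaximumProbability', int)


def collect_default_parameters() -> dict:
    # A new dict is created per call, so overriding or deleting entries
    # downstream never changes the defaults seen by later calls.
    return {TimeBinSteps: 50, MaximumProbability: 10000}


params = {**collect_default_parameters(), TimeBinSteps: 100}  # safe to override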

...


class Grouped(sl.Scope[FileType, sc.DataArray], sc.DataArray):
Member (reviewer) commented:

Could you make the name a little less generic? I didn't know what it was immediately. Something like GroupedByDetectorId? (if that is indeed what it is...)

Calculate the distance between two points.
"""
diff = point_b - point_a
return Distance(sc.sqrt(sc.dot(diff, diff)))
Member (reviewer) commented:

Use sc.norm?
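
(i.e., the return could become something like the line below, assuming `point_a` and `point_b` are vector3 Variables as in the snippet above:)

return Distance(sc.norm(point_b - point_a))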

...


def calculate_distance(point_a: Vector3D, point_b: Vector3D) -> Distance:
Member (reviewer) commented:

This function appears to not be used anywhere?

@YooSunYoung (Member, Author) replied:

Yes... they are from the original notebook. I didn't know how much would be included in the loader example. I removed them for now.

RotationAngle = NewType("RotationAngle", sc.Variable)


def rotation_matrix(axis: Vector3D, theta: RotationAngle) -> RotationMatirx:
Member (reviewer) commented:

Also unused?

@YooSunYoung requested a review from nvaytet December 8, 2023 13:23
@SimonHeybrock (Member) commented on Dec 11, 2023:

We should chat about all the providers and how things were split up here. In particular I don't think any of the McStas details should be exposed downstream, and none of the "downstream" types defined, e.g., in reduction.py should depend on the file type.

The domain types and providers in Sciline are intended for encapsulation and hiding implementation details of one part of the pipeline (such as loading data) from other parts (such as data reduction).
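
(For reference, the encapsulation idea reads roughly like the sketch below; the type and provider names are hypothetical, not the actual NMX code.)

from typing import NewType

import sciline as sl
import scipp as sc

FilePath = NewType('FilePath', str)
RawEventData = NewType('RawEventData', sc.DataArray)  # no FileType parameter
TimeBinned = NewType('TimeBinned', sc.DataArray)


def load_mcstas(path: FilePath) -> RawEventData:
    """McStas-specific reading is confined to the loader provider."""
    ...


def bin_in_time(events: RawEventData) -> TimeBinned:
    """Reduction only sees RawEventData and never depends on the file type."""
    ...


pipeline = sl.Pipeline([load_mcstas, bin_in_time], params={FilePath: 'mcstas.h5'})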

@YooSunYoung (Member, Author) replied:

I was not sure exactly what you meant,
so I just tried removing the file-type generic providers here: #7

@YooSunYoung (Member, Author) commented:

Closing this in favor of #9

@YooSunYoung deleted the loader branch December 12, 2023 13:34