McStas loader and time binning workflow. #5
Here is the script I used to make a subset of the data:

```python
from pathlib import Path

import h5py
import numpy as np

file_path = Path('pulse5_z.h5')
copied_file_path = Path('pulse5_z.h5'.replace('.h5', '_subset.h5'))
print(f"Copy subset of {file_path} to {copied_file_path}")
# Remove the target file if it already exists.
copied_file_path.unlink(missing_ok=True)

# Copy all fields and a subset of the event data field.
# (If you copy a file and just remove a field, the file size does not
# change, so we build a fresh file instead.)
with h5py.File(file_path, 'r') as file:
    entry = file['entry1']
    dataset = entry['data']
    event_path = 'bank01_events_dat_list_p_x_y_n_id_t'
    events = dataset[event_path]['events'][()]
    print("Original Shape: ", events.shape)
    with h5py.File(copied_file_path, 'w') as copied_file:
        copied_entry = copied_file.create_group('entry1')
        copied_data = copied_entry.create_group('data')
        # Copy all non-data fields.
        for key in entry.keys():
            if key != 'data':
                entry.copy(key, copied_entry, key)
        # Copy all non-event data fields from the original data group.
        for data_key in dataset.keys():
            if data_key != event_path:
                dataset.copy(data_key, copied_data, data_key)
        # Copy a random ~0.001% subset of the events.
        copied_data.create_group(event_path)
        choices = np.random.random(events.shape[0]) < 0.00001
        subset = events[choices]
        copied_data[event_path].create_dataset('events', data=subset)
        print("Subset Shape: ", subset.shape)
```
docs/examples/workflow.ipynb
> This page will show simulation data workflow as an example.
> They are written with ``sciline``, so we will show how to collect ``providers`` and ``parameters`` to build a workflow pipeline and compute the required result.
>
> First, we will set up scipp logging widget in the notebook.
What is our current guideline for the use of logging? I know we have it in other workflows in the `ess` repo, but with the use of `sciline`, I personally feel that logging the way we used to do it in `ess` is not so useful? If you have a graph of the pipeline, it's more useful than a list of steps in a log? Or maybe there was a very good reason you needed to add logging in this notebook?
It was one of the requests from Justin: he wants to quickly check the range of `t_list` while computing the result. Since we want to see the intermediate result, not just the progress, it was the easiest way I could think of...
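For reference, a minimal sketch of the graph-based alternative, assuming a built `pipeline` object and using `TimeList` and `TimeBinned` as hypothetical stand-ins for the actual domain types in this workflow:

```python
# A graph of the pipeline, instead of a list of steps in a log.
pipeline.visualize(TimeBinned)

# Compute only the intermediate result to check the range of t_list.
t_list = pipeline.compute(TimeList)
print(t_list.min(), t_list.max())
```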
docs/examples/workflow.ipynb
```python
file_path = small_mcstas_sample()  # Replace it with your data file path

nmx_workflow = build_workflow(file_path)
```
In `esssans` and `essreflectometry`, we have kept the explicit use of `Pipeline` in the notebook, and we didn't hide it behind a `build_workflow` wrapper. Should we keep it consistent and not use a wrapper? Or do you think having to build the pipeline is too much code for users to write? I personally think that it's good for users to have to build it; it gives them more understanding of what is going on under the hood. But maybe it was a requirement from the NMX people?
> But maybe it was a requirement from the NMX people?

No, it was not a requirement. But since `nmx` doesn't really do so much here, I thought building the pipeline itself was not so important. But I agree that it should be consistent with the others. I will unwrap them again.
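For illustration, a minimal sketch of what the unwrapped version could look like, assuming provider tuples `loader_providers` and `reduction_providers` and a hypothetical `InputFilepath` parameter type:

```python
import sciline as sl

# Build the pipeline explicitly in the notebook instead of hiding it
# behind a build_workflow(file_path) wrapper.
pipeline = sl.Pipeline(
    (*loader_providers, *reduction_providers),
    params={InputFilepath: file_path},
)
```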
src/ess/nmx/loader.py
```python
    return Events(sc.DataArray(data=weights, coords={'t': t_list, 'id': id_list}))


providers = [
```
I think in other projects, the providers are now tuples instead of lists.
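For example (a sketch; these provider names are placeholders, not the actual ones in `loader.py`):

```python
# A tuple makes the provider collection immutable.
providers = (
    load_mcstas,
    bin_time_of_arrival,
)
```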
src/ess/nmx/workflow.py
```python
    return NMXProviders([*logging_providers, *loader_providers, *reduction_providers])


def collect_default_parameters() -> NMXParams:
```
I think I would use the default parameters as in `essreflectometry`:

```python
params = {
    **default_parameters,
    QBins: sc.geomspace(dim='Q', start=0.008, stop=0.075, num=200, unit='1/angstrom'),
    SampleRotation[Sample]: sc.scalar(0.7989, unit='deg'),
    Filename[Sample]: "sample.nxs",
    SampleRotation[Reference]: sc.scalar(0.8389, unit='deg'),
    Filename[Reference]: "reference.nxs",
}
```

and then explicitly create the pipeline in the notebook?
But it was a bit annoying to have `default_parameters` as a dictionary, because it's mutable. I often accidentally changed the original dictionary, so that is why I made this function instead...
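One hypothetical way to keep a dictionary of defaults without the mutability problem is a read-only view via `types.MappingProxyType` (the `TimeBinSteps` entry below is just a placeholder, not the real default):

```python
from types import MappingProxyType
from typing import NewType

TimeBinSteps = NewType('TimeBinSteps', int)  # placeholder domain type

_defaults = {TimeBinSteps: 50}  # placeholder entry
# Read-only view: item assignment raises a TypeError, but it still
# spreads into a fresh dict: params = {**default_parameters, ...}
default_parameters = MappingProxyType(_defaults)
```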
src/ess/nmx/reduction.py
```python
    ...


class Grouped(sl.Scope[FileType, sc.DataArray], sc.DataArray):
```
Could you make the name a little less generic? I didn't know what it was immediately. Something like `GroupedByDetectorId`? (if that is indeed what it is...)
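That is, the rename would just be (a sketch, assuming the grouping is indeed by detector id):

```python
class GroupedByDetectorId(sl.Scope[FileType, sc.DataArray], sc.DataArray):
    """Event data grouped by detector id."""
```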
src/ess/nmx/reduction.py
```python
    Calculate the distance between two points.
    """
    diff = point_b - point_a
    return Distance(sc.sqrt(sc.dot(diff, diff)))
```
Use `sc.norm`?
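That is, a sketch of the same function using `sc.norm`, which computes the Euclidean norm of a vector directly:

```python
def calculate_distance(point_a: Vector3D, point_b: Vector3D) -> Distance:
    """Calculate the distance between two points."""
    return Distance(sc.norm(point_b - point_a))
```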
src/ess/nmx/reduction.py
```python
    ...


def calculate_distance(point_a: Vector3D, point_b: Vector3D) -> Distance:
```
This function appears to not be used anywhere?
Yes... they are from the original notebook. I didn't know how much would be included in the loader example. I removed them for now.
src/ess/nmx/reduction.py
```python
RotationAngle = NewType("RotationAngle", sc.Variable)


def rotation_matrix(axis: Vector3D, theta: RotationAngle) -> RotationMatirx:
```
Also unused?
Co-authored-by: Neil Vaytet <[email protected]>
We should chat about all the providers and how things were split up here. In particular, I don't think any of the McStas details should be exposed downstream, and none of the "downstream" types defined, e.g., in `reduction.py` should depend on the file type. The domain types and providers in Sciline are intended for encapsulation and hiding implementation details of one part of the pipeline (such as loading data) from other parts (such as data reduction).
I was not sure exactly what you meant, so I just tried removing the file-type generic providers here: #7
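For illustration, a minimal sketch of the encapsulation idea; all names here are hypothetical stand-ins, not the actual types in the package:

```python
from typing import NewType

import scipp as sc

InputFilepath = NewType('InputFilepath', str)
TimeBinSteps = NewType('TimeBinSteps', int)
# Format-agnostic domain types: reduction code only ever sees these.
RawEventData = NewType('RawEventData', sc.DataArray)
TimeBinned = NewType('TimeBinned', sc.DataArray)


def load_mcstas_events(file_path: InputFilepath) -> RawEventData:
    """All McStas-specific HDF5 details stay inside the loader."""
    ...


def bin_in_time(events: RawEventData, steps: TimeBinSteps) -> TimeBinned:
    """Reduction providers never depend on the file type."""
    ...
```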
Closing this in favor of #9
Fixes #6

`docs/examples/workflow.ipynb` has an example workflow that can load McStas data and bin it into the time dimension.

TODO:
- [ ] Add another workflow example that has a measurement data sample. (Will be done separately.)
- [ ] Add documentation of loaders.