Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export dataset according to NXLauetof application definition #101

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

YooSunYoung
Copy link
Member

Related to #60

This PR only contains place holders.
Due to memory issue, we decided to split exporting function into panel-dependent one and panel-independent one.


def _add_lauetof_definition(nx_entry: h5py.Group) -> None:
nx_entry["definition"] = "NXlauetof"
nx_entry["definition"].attrs["NX_class"] = "NX_CHAR"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think fields in NeXus should have an NX_class attribute?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was specified in the application definition but I wasn't sure if it's the right way. Do you think we should remove them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't see it here: https://manual.nexusformat.org/classes/applications/NXlauetof.html. Can you give a reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it was DTYPE not NX_class sorry. I was confused cause they were written next to the field... I removed it.

Comment on lines +285 to +291
def export_panel_dependent_data_as_nxlauetof(
*dgs: sc.DataGroup, output_file: str | pathlib.Path | io.BytesIO
) -> None:
with h5py.File(output_file, "r+") as f:
nx_instrument: h5py.Group = f["entry/instrument"]
for dg in dgs:
_add_lauetof_detector_group(dg, nx_instrument)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. If dgs contains all panels, we still use 3x the memory (leaving aside intermediate steps). Wasn't the point to have a writer that works one panel at a time?

Copy link
Member Author

@YooSunYoung YooSunYoung Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you might also want to write them all at the same time, if they are not too large.
For example, you need all three panels at the same time if you want to see all three of them in an instrument view anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I thought you might also want" is often an indicator that it should be postponed until we know it is really needed. Otherwise you need to refactor three times and write tests. And I am not sure why the instrument view is related to this.

In any case: How do you imagine the interface for writing multiple panels to the same file when they are not in memory at the same time?

Copy link
Member Author

@YooSunYoung YooSunYoung Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I am not sure why the instrument view is related to this.

After using instrument view, there's no point of iterating it again and you can just save them all at once.

In any case: How do you imagine the interface for writing multiple panels to the same file when they are not in memory at the same time?

The idea was to save panel independent data first always and then append panel-dependent data into the same file.

file_name = "test.nxs"
export_panel_independent_data_as_nxlauetof(binned_dg, output_file=file_name)
export_panel_dependent_data_as_nxlauetof(binned_dg, output_file=file_name)
del binned_dg

for i in range(1, 3):
    with temp_parameter(wf, DetectorIndex, i) as temp_wf:
        reduced_data = temp_wf.compute(NMXReducedData)
        export_panel_dependent_data_as_nxlauetof(reduced_data, output_file=file_name)
        del reduced_data

And if you don't have problem having all three of them in the memory, you can just:

dg, dgs = all_data_groups # After collecting computed results
export_panel_independent_data_as_nxlauetof(dg, output_file)
export_panel_dependent_data_as_nxlauetof(dg, *dgs, output_file=output_file)

which can be wrapped into a single interface like:

essnmx/src/ess/nmx/nexus.py

Lines 292 to 301 in dce3496

def export_as_nxlauetof(
dg: sc.DataGroup, *dgs: sc.DataGroup, output_file: str | pathlib.Path | io.BytesIO
) -> None:
"""Export the reduced data into a nxlauetof format.
Exporting step is not expected to be part of sciline pipelines.
"""
export_panel_independent_data_as_nxlauetof(dg, output_file)
export_panel_dependent_data_as_nxlauetof(dg, *dgs, output_file=output_file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I have no objection to make it available for only single panel at once.
I'll update it then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the panel-independent data need the workflow? If so, how do we handle that?

Copy link
Member Author

@YooSunYoung YooSunYoung Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panel-independent data need the workflow

Monitor(control) group might need its own workflow, (we haven't seen any use case of monitor data yet)
but other than that, all other fields only needs a loader.

But also same information is available from the reduced data.
So I was planning to retrieve information from the reduced data.

Do you think we should just retrieve them directly from an input file...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was mainly wondering how we can design the interface to avoid user error. If the panel-independent data is obtain from the wrong workflow, or if someone tries to write panels from incompatible workflows (like different params/settings) to the same file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants