feat: more flexible helper for batch reduction #155

Open — wants to merge 13 commits into base: main

Conversation

jokasimr
Contributor

@jokasimr jokasimr commented May 5, 2025

This PR implements a shortcut through the workflow that is useful in several common scenarios involving multiple samples, angles and potentially multiple runs per angle.

@MridulS MridulS requested a review from Copilot May 6, 2025 07:28

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a more flexible helper for batch reduction by refactoring the dataset computation workflow and its associated tests. Key changes include:

  • Replacing the function orso_datasets_from_measurements with a more versatile from_measurements that supports both list and mapping inputs.
  • Adjusting the test cases to validate the new helper behavior and result types.
  • Moving the RawChopper class from the amor module to the reflectometry types module and updating corresponding imports.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/tools_test.py Updated tests to use from_measurements and verify expected parameters.
src/ess/reflectometry/workflow.py Adjusted imports to use the new location of RawChopper and streamline workflow.
src/ess/reflectometry/types.py Added RawChopper class to reflectometry/types to replace its previous location.
src/ess/reflectometry/tools.py Renamed the dataset helper and refactored scaling logic with additional helper functions.
src/ess/amor/types.py Removed duplicate RawChopper class.
src/ess/amor/load.py Removed the import and reference to RawChopper.

return datasets if names is None else dict(zip(names, datasets, strict=True))


def _workflow_needs_quantity_A_even_if_quantitiy_B_is_set(workflow, A, B):

Copilot AI May 6, 2025


The function name '_workflow_needs_quantity_A_even_if_quantitiy_B_is_set' has a typo in 'quantitiy'. Consider correcting it to '_workflow_needs_quantity_A_even_if_quantity_B_is_set' for clarity.

Suggested change
def _workflow_needs_quantity_A_even_if_quantitiy_B_is_set(workflow, A, B):
def _workflow_needs_quantity_A_even_if_quantity_B_is_set(workflow, A, B):


Member


good bot!

Contributor Author


🍬

@github-project-automation github-project-automation bot moved this to In progress in Development Board May 6, 2025
@jokasimr jokasimr moved this from In progress to Selected in Development Board May 6, 2025
wf[Filename[SampleRun]] = parameters[Filename[SampleRun]]
return wf

if scale_to_overlap:
Member


As discussed during the standup, I don't really get why we need the helper function, as opposed to just having different workflows.

  • You would have a workflow that computes a reflectivity curve.
  • If you want to compute this for multiple runs, you just map over the run numbers or filenames, and then use compute_mapped
  • If you want to scale_to_overlap, you have a workflow that basically does the mapping and adds a provider that either just scales the results or scales and merges. You then compute a different result (e.g. CombinedScaledReflectivityOverQ)

You can then view the graph of everything that is being done (with the helper function you can't see it all, only pieces, and those are not easy to get to if you just installed the package).
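The map-then-combine idea in the bullets above, mimicked with plain Python. This is not the actual sciline `map`/`compute_mapped` API, and names like `CombinedScaledReflectivityOverQ` are hypothetical, taken from the comment itself; the sketch only shows the structure being proposed.

```python
# Stand-in for a workflow that computes one reflectivity curve per file.
def reflectivity_curve(filename):
    return f'curve({filename})'

# Analogue of mapping a workflow over Filename[SampleRun] values.
def map_over(workflow, filenames):
    return [workflow(f) for f in filenames]

# Analogue of a provider that scales and merges the mapped results
# (e.g. the hypothetical CombinedScaledReflectivityOverQ target).
def combined_scaled(curves):
    return 'scaled+merged: ' + ', '.join(curves)

curves = map_over(reflectivity_curve, ['a.nxs', 'b.nxs'])
result = combined_scaled(curves)
```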

In any case, we should avoid having a long discussion here on github, this was just to have a starting point for an in-person discussion.

Contributor Author


It's possible we can implement the same thing using sciline map reduce. Let us see the helper in this PR as a reference implementation and try to re-implement as one big workflow.

Member

@nvaytet nvaytet left a comment


After in-person discussion, it seems difficult to avoid using a helper function to loop over files; a single large pipeline covering all cases does not appear feasible.

We are already using mapping to combine events from multiple files; adding a second level of mapping over different angles that reuses the same mapping parameter Filename[SampleRun] would not work.

Having Filename as a scope with 2 params could maybe work, Filename[SampleRun, Angle], but that would require changing it in essreduce in the types common to all other workflows.
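A two-parameter scope of that kind might look like the following stand-in, written with plain `typing.Generic` rather than sciline's scope helpers; all class names here are illustrative, and the real change would have to happen in essreduce as noted above.

```python
# Hypothetical sketch of a Filename type parameterized by both run type
# and angle. Not the actual essreduce definition.
from typing import Generic, TypeVar

RunType = TypeVar('RunType')
AngleType = TypeVar('AngleType')

class SampleRun: ...
class Angle: ...

class Filename(str, Generic[RunType, AngleType]):
    """Filename scoped by run type and angle (illustrative)."""

# Usage would mirror the Filename[SampleRun, Angle] spelling from the comment:
f = Filename[SampleRun, Angle]('run_0001.nxs')
```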

In addition, the scale_to_overlap step sometimes needs to scale the event weights before the final 1d R(Q) result is computed, so it is not as simple as inserting a provider at the end of the workflow to apply the scaling.

With all this taken into account, the helper function seems like a good approach.
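For readers unfamiliar with the scaling step: a minimal, purely illustrative sketch of what "scale to overlap" means for two curves on a shared Q grid. The real ess implementation operates on event weights, as noted above; this toy version just rescales one binned curve onto another over their shared points.

```python
def overlap_scale(curve_a, curve_b):
    """Factor that scales curve_b onto curve_a over their shared Q points.

    Curves are {Q: reflectivity} dicts; uses a simple mean ratio, purely
    for illustration (a real implementation would weight by uncertainties).
    """
    shared = set(curve_a) & set(curve_b)
    if not shared:
        raise ValueError('curves do not overlap')
    ratios = [curve_a[q] / curve_b[q] for q in sorted(shared)]
    return sum(ratios) / len(ratios)

# b overlaps a at Q = 0.2 and 0.3, where it sits a factor 2 too high.
a = {0.1: 1.0, 0.2: 0.5, 0.3: 0.25}
b = {0.2: 1.0, 0.3: 0.5, 0.4: 0.25}
scale = overlap_scale(a, b)                      # 0.5
b_scaled = {q: scale * r for q, r in b.items()}  # now agrees with a on overlap
```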

@@ -279,18 +284,18 @@ def combine_curves(
)


-def orso_datasets_from_measurements(
+def from_measurements(
Member


Can we think of a better name? I don't have great suggestions, I thought of something like batch_reduction, but it's not super descriptive either...

Contributor Author


Not sure either. I would like to avoid reduction in the name because I feel the term is already overloaded in our context, and strictly speaking you don't have to request a "reduced" data set here.

Member


I asked copilot, and the suggestion was compute_reflectivity_datasets.
I guess it's because of the last thing in the docstring that says

    Returns
    ---------
    list of the computed ORSO datasets, containing one reflectivity curve each

which I guess should be changed if you can request other data than reflectivity curves?

Contributor Author


Yes that's true, I'll change it.
By the way, I asked Nico earlier today and he's thinking about a better name, so we are probably going to have a suggestion from him soon.

Collaborator


We discussed some options and came up with compute_pipeline_targets or get_workflow_targets. They both seem reasonably descriptive to me, but maybe the shorter one is better.

# Check if any of the targets need ReducibleData if
# ReflectivityOverQ already exists.
# If they don't, we can avoid recomputing ReducibleData.
targets = target if hasattr(target, '__len__') else (target,)
Member


Can there ever be a danger that someone passed a string as target and the __len__ check fails? I guess if they passed a string, they will have other problems earlier? (target not found in graph)

Contributor Author


I guess, but like you said, passing a string is illegal anyway and will lead to error at some point.
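To make the pitfall concrete: Python strings do define `__len__`, so a bare string passes the check and is then iterated character by character. A minimal demonstration, together with a common defensive variant (the variant is not from the PR):

```python
target = 'ReflectivityOverQ'

# The check from the diff: a string slips through as "already a collection".
targets = target if hasattr(target, '__len__') else (target,)
first_three = list(targets)[:3]  # one "target" per character: ['R', 'e', 'f']

# A common defensive variant special-cases str explicitly:
targets = (
    (target,)
    if isinstance(target, str) or not hasattr(target, '__len__')
    else target
)
```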

@jokasimr
Contributor Author

jokasimr commented Jun 4, 2025

Having Filename as a scope with 2 params could maybe work, Filename[SampleRun, Angle], but that would require changing it in essreduce in the types common to all other workflows.

Another problem with this approach is that the Angle is something we read from the file, so it seems backwards that the user has to know the angle before loading the file.

@jokasimr
Contributor Author

jokasimr commented Jun 6, 2025

It would be good to merge this so we can get it out and receive some user feedback.
If it is the name that is holding this up, I think we should just go with the current name and change it in the future if we come up with something better.
In that case the old name can be kept as an alias for the new one, so backwards compatibility is not an issue.

@jokasimr jokasimr requested a review from nvaytet June 10, 2025 09:26
@nvaytet
Member

nvaytet commented Jun 10, 2025

Finding a good name is important I think, especially since this looks like one of the main functions that will be used all the time. You can of course keep the alias if you change it, as you say, but why not try to find something better now?
Why not ask @paracini or other users for input?

I would like to avoid reduction in the name because [...] you don't have to request a "reduced" data set here.

ok, but realistically, I would say reduced data will be requested over 90% of the time?

Status: Selected · 4 participants