Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split loading of monitors and detector data #82

Closed
SimonHeybrock opened this issue Feb 12, 2024 · 4 comments
Closed

Split loading of monitors and detector data #82

SimonHeybrock opened this issue Feb 12, 2024 · 4 comments

Comments

@SimonHeybrock
Copy link
Member

SimonHeybrock commented Feb 12, 2024

There are a couple reasons for this:

  1. Increase opportunities for task-graph level parallelism.
  2. If monitors are large, it may be helpful to avoid issues with memory.
  3. Simplify handling of transmission runs, where we only want to load monitors. This would avoid having to worry about whether the transmission runs have detector data (which we would want to exclude from the load), or data-runs where we want to avoid loading the transmission monitor (if there is a separate transmission monitor).

Should also be considered in relation to #83.

@nvaytet
Copy link
Member

nvaytet commented Feb 12, 2024

Note that a first attempt at this was done when implementing the direct beam iterations.
It was however reverted in favour of a load_nexus which loaded everything internally.

If I remember correctly, this was mostly for simplicity and clarity in the final task graph, but we did not rule out implementing this in the future.

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Feb 12, 2024

I have experiment with larger files today (up to about 1e9 in detector and monitor) and my conclusion was that we have to be more deliberate about dependencies in the graphs, such as avoiding keeping alive intermediates by using them in multiple places.

We actually have to also sit down and have a look at final meta data assembly, e.g., as implemented in scipp/essreflectometry#27, as this potentially introduces long term "global" dependencies that result in keeping alive raw data longer than we want. @jl-wynen has already ran into some problem, I think, so we really have to figure out some guidelines/lessons.

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Feb 15, 2024

After some more thought, I believe we should also split loading "the rest" of the file: We need information such as chopper logs to be able to process either monitors or data, and there are probably also other examples such as determining slicing. So my suggestion is something like:

  1. Load everything but monitors and detectors
  2. Some pre-processing
  3. Load
    • monitors
    • detectors

Furthermore, if we need to resort to chunking the processing of the events due to memory limitations, it would be advantageous to load and process the "rest" only once.

Splitting this is probably also beneficial for reusing more code for the live-reduction, where we initially get the non-event data, and a series of updates containing events and log values.

Make sure to see also scipp/beamlime#134.

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Feb 26, 2024

Absorbed in #83 and #99.

@github-project-automation github-project-automation bot moved this from Triage to Done in Development Board Feb 26, 2024
@SimonHeybrock SimonHeybrock removed this from the Essentials milestone Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants