
Commit 60bcb06

Merge pull request #6 from scipp/guidelines-update
Clarify monitor and detector handling guidelines
2 parents: b15bcb4 + 22589ac

1 file changed: docs/user-guide/reduction-workflow-guidelines.md (+24 -11 lines)

```diff
@@ -78,23 +78,36 @@ This is often not apparent from small test data, as the location of performance
 **Note**
 There should be a workflow parameter (flag) to select whether to return event data or not.
 
-### S.2: Split loading and handling of monitors and detectors from loading of auxiliary data and metadata
+### S.2: Load each required NXmonitor separately
 
 **Reason**
-- Allows for more efficient parallelism and reduction in memory use.
-- Avoids loading large data into memory that is not needed for the reduction.
-- Avoids keeping large data alive in memory if output metadata extraction depends on auxiliary input data or input metadata.
+Monitor data can be extremely large when operating in event mode.
+Loading only individual monitors avoids loading unnecessary data and allows for more efficient parallelism and reduction in memory use.
 
-### S.3: Avoid dependencies of output metadata on large data
+### S.3: Load each required NXdetector separately
+
+**Reason**
+Detector data can be extremely large when operating in event mode.
+Loading only individual detectors avoids loading unnecessary data and allows for more efficient parallelism and reduction in memory use.
+
+
+### S.4: Load auxiliary data and metadata separately from monitors and detectors
+
+**Reason**
+Event-mode monitor and detector data can be extremely large.
+Auxiliary data such as sample-environment data or chopper metadata should be accessible without loading the large data.
+Loading auxiliary data and metadata separately avoids keeping large data alive in memory if output metadata extraction depends on auxiliary input data or input metadata.
+
+### S.5: Avoid dependencies of output metadata on large data
 
 **Reason**
 Adding dependencies on large data to the output metadata extraction may lead to large data being kept alive in memory.
 
 **Note**
-Most of this is avoided by following S.2.
+Most of this is avoided by following S.2, S.3, and S.4.
 A bad example would be writing the total raw counts to the output metadata, as this would require keeping the large data alive in memory, unless it is ensured that the task runs early.
 
-### S.4: Preserve floating-point precision of input data and coordinates
+### S.6: Preserve floating-point precision of input data and coordinates
 
 **Reason**
 Single-precision may be sufficient for most data.
```
```diff
@@ -112,13 +125,13 @@ This will allow for changing the precision of the entire workflow by choosing a
 - If time-of-flight is single-precision, wavelength and momentum transfer should be single-precision.
 - If counts are single-precision, reduced intensity should be single-precision.
 
-### S.5: Switches to double-precision shall be deliberate, explicit, and documented
+### S.7: Switches to double-precision shall be deliberate, explicit, and documented
 
 **Reason**
 Some workflows may require switching to double-precision at a certain point in the workflow.
 This should be a deliberate choice, and the reason for the switch should be documented.
 
-### S.6: Propagation of uncertainties in broadcast operations should support "drop" and "upper-bound" strategies, "upper-bound" shall be the default
+### S.8: Propagation of uncertainties in broadcast operations should support "drop" and "upper-bound" strategies, "upper-bound" shall be the default
 
 **Reason**
 Unless explicitly computed, the exact propagation of uncertainties in broadcast operations is not tractable.
```
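
As an aside on S.6 and S.7 (not part of the commit), preserving precision usually comes down to deriving the working dtype from the inputs instead of hard-coding `float64`. A hypothetical NumPy sketch, with an illustrative time-of-flight conversion:

```python
import numpy as np

def tof_to_wavelength(tof, Ltotal):
    """Convert time-of-flight [s] to wavelength [Angstrom] for flight path Ltotal [m].

    The coefficient h/m_n (about 3956.034 Angstrom*m/s) is cast to the dtype
    of the input so that single-precision time-of-flight yields
    single-precision wavelength (S.6). A switch to float64 would have to be
    made explicitly and documented (S.7).
    """
    coeff = np.asarray(3956.034 / Ltotal, dtype=tof.dtype)
    return coeff * tof

tof = np.array([1.0e-3, 2.0e-3], dtype=np.float32)
wavelength = tof_to_wavelength(tof, Ltotal=30.0)
assert wavelength.dtype == np.float32  # input precision is preserved
```
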
```diff
@@ -129,7 +142,7 @@ We should therefore support two strategies, "drop" and "upper-bound", and "upper
 See [Systematic underestimation of uncertainties by widespread neutron-scattering data-reduction software](http://dx.doi.org/10.3233/JNR-220049) for a discussion of the topic.
 TODO Add reference to upper-bound approach.
 
-### S.7: Do not write files or make write requests to services such as SciCat in providers
+### S.9: Do not write files or make write requests to services such as SciCat in providers
 
 **Reason**
 Providers should be side-effect free, and should not write files or make write requests to services.
```
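
A rough NumPy sketch of one way to realize the "upper-bound" strategy of S.8 (the helper below is hypothetical and not the implementation referenced by the TODO): when a term with variances is broadcast to N copies that are later summed, treating the copies as independent would underestimate the error of the shared term, so the variances are inflated by the factor N to cover the fully correlated case.

```python
import numpy as np

def broadcast_upper_bound(values, variances, n):
    """Broadcast a spectrum with variances along a new dimension of length n.

    "drop" would discard the variances; "upper-bound" keeps them but scales
    them by n, so that a later sum over the broadcast dimension (treating the
    copies as independent) cannot underestimate the uncertainty of the shared
    term.
    """
    shape = (n,) + values.shape
    return np.broadcast_to(values, shape), np.broadcast_to(variances * n, shape)

monitor = np.array([100.0, 200.0, 150.0])   # counts per wavelength bin
monitor_var = monitor.copy()                # Poisson statistics: var == counts
values, variances = broadcast_upper_bound(monitor, monitor_var, n=10_000)
```
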
```diff
@@ -138,7 +151,7 @@ Providers should be side-effect free, and should not write files or make write r
 Workflows may run many times, or in parallel, or tasks may be retried after failure, and we want to avoid side-effects in these cases.
 This will, e.g., avoid unintentional overwriting of a user's files.
 
-### S.8: Detector banks shall be loaded with their logical dimensions, if possible
+### S.10: Detector banks shall be loaded with their logical dimensions, if possible
 
 **Reason**
 Using logical dims (instead of a flat list of pixels) allows for simpler indexing and slicing of the data, reductions over a subset of dimensions, and masking of physical components.
```
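
Finally, an illustration of the logical-dims point in S.10 (again not part of the commit; the bank layout of 128 tubes by 512 pixels is made up), using scipp's `fold`:

```python
import numpy as np
import scipp as sc

# A detector bank loaded as a flat list of 128 * 512 pixels (made-up layout).
counts = sc.array(dims=['detector_number'],
                  values=np.random.poisson(5, 128 * 512).astype('float64'),
                  unit='counts')
bank = sc.DataArray(data=counts)

# Restoring the logical 2-D layout makes slicing, reductions over a subset of
# dimensions, and masking of physical components straightforward.
bank = bank.fold(dim='detector_number', sizes={'tube': 128, 'pixel': 512})
per_tube = bank.sum('pixel')  # reduce over pixels only
bank.masks['broken_tube'] = sc.array(dims=['tube'], values=np.arange(128) == 17)
```
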
