`docs/user-guide/reduction-workflow-guidelines.md`

**Note**
There should be a workflow parameter (flag) to select whether to return event data or not.
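A minimal sketch of such a flag, assuming a scipp-based workflow; the parameter name `ReturnEvents` and the provider are illustrative, not an existing API:

```python
from typing import NewType

import scipp as sc

# Hypothetical workflow parameter (not an existing API), selecting whether
# the reduction returns event data or a histogrammed result.
ReturnEvents = NewType('ReturnEvents', bool)

def finalize_result(data: sc.DataArray, return_events: ReturnEvents) -> sc.DataArray:
    # Return the (binned) event data as-is, or histogram it into a much
    # smaller dense array.
    return data if return_events else data.hist()
```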

### S.2: Load each required NXmonitor separately

**Reason**
Monitor data can be extremely large when operating in event mode.
Loading only individual monitors avoids loading unnecessary data and allows for more efficient parallelism and reduced memory use.
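A minimal sketch with `scippnexus`, assuming a file with an `entry/monitor_1` group; file and group names are illustrative:

```python
import scippnexus as snx

# Open the file and read a single monitor; other monitors and the detector
# banks are never loaded into memory.
with snx.File('experiment.nxs') as f:
    monitors = f['entry'][snx.NXmonitor]  # dict of NXmonitor groups, nothing loaded yet
    monitor = monitors['monitor_1'][...]  # load only this monitor's (event) data
```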

### S.3: Load each required NXdetector separately

**Reason**
Detector data can be extremely large when operating in event mode.
Loading only individual detectors avoids loading unnecessary data and allows for more efficient parallelism and reduced memory use.
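A sketch of a per-bank loader in a provider-style pipeline, so each bank becomes an independent task that can be scheduled and released separately; all type names are assumptions for illustration:

```python
from typing import NewType

import scipp as sc
import scippnexus as snx

# Illustrative types, not an existing API.
Filename = NewType('Filename', str)
DetectorBankName = NewType('DetectorBankName', str)
RawDetector = NewType('RawDetector', sc.DataGroup)

def load_detector(filename: Filename, bank: DetectorBankName) -> RawDetector:
    # Read a single NXdetector group; other banks are never touched, so they
    # can be loaded (or skipped) by other tasks in parallel.
    with snx.File(filename) as f:
        return RawDetector(f['entry/instrument'][bank][...])
```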

### S.4: Load auxiliary data and metadata separately from monitors and detectors

**Reason**
Event-mode monitor and detector data can be extremely large.
Auxiliary data, such as sample-environment data or chopper metadata, should be accessible without loading the large data.
Loading auxiliary data and metadata separately avoids keeping large data alive in memory if output metadata extraction depends on auxiliary input data or input metadata.
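A minimal sketch of reading such metadata with `scippnexus` without opening any monitor or detector group; the file layout and paths are assumptions:

```python
import scippnexus as snx

# Read small auxiliary data only; no NXmonitor or NXdetector group is touched,
# so no event data is pulled into memory.
with snx.File('experiment.nxs') as f:
    choppers = {name: chopper[...]
                for name, chopper in f['entry/instrument'][snx.NXdisk_chopper].items()}
    sample_temperature = f['entry/sample/temperature'][...]  # illustrative path
```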

### S.5: Avoid dependencies of output metadata on large data

**Reason**
Adding dependencies on large data to the output metadata extraction may lead to large data being kept alive in memory.

**Note**
Most of this is avoided by following S.2, S.3, and S.4.
A bad example would be writing the total raw counts to the output metadata, as this would require keeping the large data alive in memory unless the task is guaranteed to run early.
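A sketch of doing this correctly, assuming scipp event data; the function name is illustrative. The point is to reduce the large data to a scalar in a small, early task so the scheduler can release it:

```python
import scipp as sc

def total_raw_counts(raw: sc.DataArray) -> sc.Variable:
    # Reduce the large (binned) event data to a single scalar as early as
    # possible, so `raw` need not stay alive until the metadata is written.
    return raw.bins.sum().sum().data  # for dense data, raw.sum().data suffices
```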

### S.6: Preserve floating-point precision of input data and coordinates

**Reason**
Single-precision may be sufficient for most data.

- If time-of-flight is single-precision, wavelength and momentum transfer should be single-precision.
- If counts are single-precision, reduced intensity should be single-precision.
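A minimal sketch with scipp, which preserves the input dtype through arithmetic as long as constants are created with a matching dtype; the conversion factor is an illustrative value:

```python
import scipp as sc

tof = sc.array(dims=['event'], values=[1500.0, 3200.0], unit='us', dtype='float32')
factor = sc.scalar(3.956e-3, unit='angstrom/us', dtype='float32')  # illustrative value
wavelength = factor * tof  # float32 in, float32 out
assert wavelength.dtype == sc.DType.float32
```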

### S.7: Switches to double-precision shall be deliberate, explicit, and documented

**Reason**
Some workflows may require switching to double-precision at a certain point in the workflow.
This should be a deliberate choice, and the reason for the switch should be documented.
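A sketch of what "deliberate, explicit, and documented" can look like in code, assuming scipp data; the motivating fit step is illustrative:

```python
import scipp as sc

def promote_for_fit(data: sc.DataArray) -> sc.DataArray:
    # Deliberate precision switch: the subsequent fit accumulates many small
    # contributions and is numerically unstable in float32. The reason lives
    # here and in the workflow documentation, not hidden in a conversion.
    return data.astype('float64')
```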

### S.8: Propagation of uncertainties in broadcast operations should support "drop" and "upper-bound" strategies, "upper-bound" shall be the default

**Reason**
Unless explicitly computed, the exact propagation of uncertainties in broadcast operations is not tractable.
We should therefore support two strategies, "drop" and "upper-bound", and "upper-bound" shall be the default.
See [Systematic underestimation of uncertainties by widespread neutron-scattering data-reduction software](http://dx.doi.org/10.3233/JNR-220049) for a discussion of the topic.
TODO Add reference to upper-bound approach.
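A sketch of the two strategies for a scipp variable with variances; this is not a final interface, and the upper-bound factor shown is the simple copy-count bound:

```python
import scipp as sc

def broadcast_uncertainties(var: sc.Variable, sizes: dict[str, int],
                            strategy: str = 'upper-bound') -> sc.Variable:
    """Broadcast `var` to the full target `sizes` (assumed to include its own dims)."""
    if strategy == 'drop':
        # Discard variances entirely before broadcasting.
        return sc.broadcast(sc.values(var), sizes=sizes).copy()
    # 'upper-bound' (default): scale variances by the number of copies the
    # broadcast creates, an upper bound on the effect of the correlations
    # the broadcast introduces.
    factor = 1
    for dim, size in sizes.items():
        if dim not in var.dims:
            factor *= size
    out = sc.broadcast(var, sizes=sizes).copy()
    out.variances = out.variances * factor
    return out
```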

### S.9: Do not write files or make write requests to services such as SciCat in providers

**Reason**
Providers should be side-effect free, and should not write files or make write requests to services.
Workflows may run many times, or in parallel, or tasks may be retried after failure, and we want to avoid side effects in these cases.
This will, e.g., avoid unintentional overwriting of a user's files.
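A sketch of the intended split, with illustrative names; the point is that writing happens once, outside the workflow:

```python
import scipp as sc

# Bad: a provider with a side effect. Retries or parallel runs would write
# the file several times and may clobber a user's data.
def bad_provider(data: sc.DataArray) -> sc.DataArray:
    data.save_hdf5('result.h5')  # don't do this inside a provider
    return data

# Good: the provider only returns the result; the caller writes it, exactly
# once, after the workflow has finished.
def good_provider(data: sc.DataArray) -> sc.DataArray:
    return data
```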

### S.10: Detector banks shall be loaded with their logical dimensions, if possible

**Reason**
Using logical dimensions (instead of a flat list of pixels) allows for simpler indexing and slicing of the data, reductions over a subset of dimensions, and masking of physical components.
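A minimal sketch using scipp's `fold` when a file only provides a flat pixel list; the bank geometry (4 panels of 128×128 pixels) is an assumption:

```python
import scipp as sc

# Fold a flat pixel dimension into logical dimensions.
flat = sc.DataArray(sc.zeros(dims=['detector_number'], shape=[4 * 128 * 128], unit='counts'))
logical = flat.fold(dim='detector_number', sizes={'panel': 4, 'y': 128, 'x': 128})

# Labeled operations become trivial: reduce over one dimension, slice a panel.
per_pixel_column = logical.sum('y')
first_panel = logical['panel', 0]
```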