docs/book/how-to/steps-pipelines/dynamic_pipelines.md (+89 lines changed: 89 additions & 0 deletions)
@@ -96,6 +96,90 @@ Use `runtime="inline"` when you need:
- Shared resources with the orchestrator
- Sequential execution

### Map/Reduce over collections

Dynamic pipelines support a high-level map/reduce pattern over sequence-like step outputs. This lets you fan out a step across the items of a collection and then reduce the results without manually writing loops or loading data in the orchestration environment.

```python
from zenml import pipeline, step

@step
def producer() -> list[int]:
    return [1, 2, 3]

@step
def worker(value: int) -> int:
    return value * 2

@step
def reducer(values: list[int]) -> int:
    return sum(values)

@pipeline(dynamic=True, enable_cache=False)
def map_reduce():
    values = producer()
    results = worker.map(values)  # fan out over collection
    reducer(results)  # pass list of artifacts directly
```

Key points:

- `step.map(...)` fans out a step over sequence-like inputs.
- Steps can accept lists of artifacts directly as inputs (useful for reducers).
- You can pass the mapped output directly to a downstream step without loading it in the orchestration environment.

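
For intuition, the data flow of the `map_reduce` pipeline above can be sketched in plain Python. This is only an analogy and not ZenML code: the functions below are undecorated stand-ins for the steps, and the list comprehension stands in for what `worker.map(values)` fans out. In a real run each call would be a separate step run operating on artifacts.

```python
# Plain-Python stand-ins for the steps (no ZenML involved).
def producer() -> list[int]:
    return [1, 2, 3]

def worker(value: int) -> int:
    return value * 2

def reducer(values: list[int]) -> int:
    return sum(values)

values = producer()
# One worker call per item, mirroring the fan-out of `worker.map(values)`.
results = [worker(v) for v in values]
# The reducer receives the whole list of results at once.
total = reducer(results)
print(total)  # 12
```
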
#### Mapping semantics: map vs product

- `step.map(...)`: If multiple sequence-like inputs are provided, all must have the same length `n`. ZenML creates `n` mapped steps where the i-th step receives the i-th element from each input.
- `step.product(...)`: Creates a mapped step for each combination of elements across all input sequences (cartesian product).

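
The difference between the two modes can be illustrated with a small plain-Python sketch. It does not use ZenML: `map_pairs` and `product_pairs` are hypothetical helper names that only show which argument tuples each mode would generate, and the `ValueError` for mismatched lengths is an assumption made for illustration (the text above only states that lengths must match).

```python
from itertools import product

def map_pairs(*sequences):
    """Element-wise pairing, analogous to `step.map(...)`."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        # Assumption: the exception type is illustrative, not ZenML's behavior.
        raise ValueError(f"All inputs must have the same length, got {sorted(lengths)}")
    return list(zip(*sequences))

def product_pairs(*sequences):
    """Every combination of elements, analogous to `step.product(...)`."""
    return list(product(*sequences))

ints = [1, 2]
letters = ["a", "b"]
more_letters = ["a", "b", "c"]

print(map_pairs(ints, letters))                # [(1, 'a'), (2, 'b')]
print(len(product_pairs(ints, more_letters)))  # 6
```

Each tuple corresponds to the inputs of one mapped step invocation.
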
Example (cartesian product):

```python
from zenml import pipeline, step

@step
def int_values() -> list[int]:
    return [1, 2]

@step
def str_values() -> list[str]:
    return ["a", "b", "c"]

@step
def do_something(a: int, b: str) -> int:
    ...

@pipeline(dynamic=True)
def cartesian_example():
    a = int_values()
    b = str_values()
    # Produces 2 * 3 = 6 mapped steps
    do_something.product(a, b)
```

#### Broadcasting inputs with unmapped(...)

If you want to pass a sequence-like artifact as a whole to each mapped invocation (i.e., avoid splitting), wrap it with `unmapped(...)`:

```python
from zenml import pipeline, step, unmapped

@step
def producer(length: int) -> list[int]:
    return [1] * length

@step
def consumer(a: int, b: list[int]) -> None:
    # `b` is the full list for every mapped call
    ...

@pipeline(dynamic=True)
def unmapped_example():
    a = producer(length=3)  # list of 3 ints
    b = producer(length=4)  # list of 4 ints
    consumer.map(a=a, b=unmapped(b))
```
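
To make the broadcasting behavior concrete, here is a minimal plain-Python sketch of how the keyword arguments for each mapped call could be assembled. It is not ZenML's implementation: the `Unmapped` class and the `expand_map` helper are hypothetical stand-ins introduced only for illustration.

```python
class Unmapped:
    """Hypothetical marker standing in for ZenML's `unmapped(...)` wrapper."""

    def __init__(self, value):
        self.value = value

def expand_map(**inputs):
    """Return one kwargs dict per mapped invocation."""
    # Inputs without the marker are split element by element.
    mapped = {k: v for k, v in inputs.items() if not isinstance(v, Unmapped)}
    # Inputs with the marker are passed whole to every invocation.
    broadcast = {k: v.value for k, v in inputs.items() if isinstance(v, Unmapped)}
    keys = list(mapped)
    return [
        {**dict(zip(keys, items)), **broadcast}
        for items in zip(*mapped.values())
    ]

a = [1, 1, 1]     # split: one element per call
b = [1, 1, 1, 1]  # broadcast: the full list for every call
calls = expand_map(a=a, b=Unmapped(b))
print(len(calls))  # 3
print(calls[0])    # {'a': 1, 'b': [1, 1, 1, 1]}
```

In this sketch the number of calls follows the length of the mapped input `a`, while the length of `b` has no effect because `b` is never split.
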

### Parallel Step Execution

Dynamic pipelines support true parallel execution using `step.submit()`. This method returns a `StepRunFuture` that you can use to wait for results or pass to downstream steps:
@@ -205,6 +289,11 @@ def dynamic_pipeline():

When you call `.load()` on an artifact in a dynamic pipeline, it synchronously loads the data. For large artifacts or when you want to maintain parallelism, consider passing the step outputs (future or artifact) directly to downstream steps instead of loading them.

### Mapping Limitations

- Mapping is currently supported only over artifacts produced within the same pipeline run (mapping over raw data or external artifacts is not supported).
- Chunk size for mapped collection loading defaults to 1 and is not yet configurable.

## Best Practices

1. **Use `runtime="isolated"` for parallel steps**: This ensures better resource isolation and prevents interference between concurrent step executions.