Using Dynamic Graphs and Op Priority to run each batch of data from end-to-end before moving to the next #16264
vandenn started this conversation in Show and tell
Replies: 2 comments
-
This is really nice, thanks for sharing this!
-
@vandenn You are a genius, thanks for this :)
-
Repository Link
I recently got some time to create a reference repository of how I use Dynamic Graphs and the `dagster/priority` tag in tandem to improve my job's performance and save memory: https://github.com/vandenn/dagster-prio-dynamic-map
Overview
When you use `DynamicOut`, you can chunk data into batches yielded by the dynamic op, but each subsequent op that `map`s to these batches executes on all of them first before execution moves to the next op. However, this can be a problem if you have memory constraints on the machine you've deployed Dagster to.
By combining `DynamicOut` with the `dagster/priority` tag, you can run the "mini-job" end-to-end for a group of batches of data (the number of batches depends on your machine's power or your specified `max_concurrent` setting) before moving to the next group. This allows you to clean up or dispose of already-processed data as the job goes through each group of batches.
Final Note
Let me know what you think! I'd appreciate any feedback or comments.