Thoughts on customizing auto-materialize policies #15029

sryza · 2023-06-29T14:48:06Z

This discussion includes some thoughts I've had recently as I've explored the design space for allowing auto-materialize policies to be customized. It does not include a concrete proposal for how to move forward.

The central question here: how do we allow customization without requiring users to implement auto-materialize policies from the ground up whenever they want to make a small tweak to behavior?

Background

Recently, some users have requested that auto-materialize policies should work differently in some cases:

This raises the question: what's the best way for users to customize their auto-materialize policies? A couple of options come up:

1. Include parameters on AutoMaterializePolicy that users can set to toggle different behaviors. A risk here is that AutoMaterializePolicy might need to grow to include an obscene number of parameters to handle the long tail of requested user behavior.
2. Allow users to provide their own custom Python code for controlling when assets are auto-materialized. Some risks here are detailed below.
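
To make the parameter-based option concrete, here is a minimal sketch. Everything in it is an assumption for illustration: PolicyOptions, its fields, and should_materialize are hypothetical names, not parameters or functions of Dagster's AutoMaterializePolicy. It only shows how each requested behavior turns into one more toggle and one more branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyOptions:
    """Hypothetical toggles, one per requested behavior (not Dagster's API)."""

    allow_missing_parent_partitions: bool = False
    ignore_active_runs: bool = False
    max_partitions_per_request: int = 1


def should_materialize(
    options: PolicyOptions,
    *,
    upstream_updated: bool,
    has_missing_parent_partitions: bool,
    in_active_run: bool,
    requested_partitions: int,
) -> bool:
    """Every new toggle adds another branch to the shared decision logic."""
    if not upstream_updated:
        return False
    if has_missing_parent_partitions and not options.allow_missing_parent_partitions:
        return False
    if in_active_run and not options.ignore_active_runs:
        return False
    return requested_partitions <= options.max_partitions_per_request
```

With three toggles this is still readable; the concern raised above is what happens when the long tail of requests pushes that count much higher.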

Naive fully-pluggable API

Here's what a naive version of option 2 could look like:

from dagster import auto_materialize_policy, asset

@auto_materialize_policy
def materialize_if_any_upstream_updated(context):
    if any_upstream_updated(context):
        return True  # materialize
    else:
        return False  # don't materialize

@asset(auto_materialize_policy=materialize_if_any_upstream_updated)
def asset1():
    ...

Challenges with the naive fully-pluggable API

The built-in auto-materialize policies have a lot of logic. For example:

They avoid auto-materializing assets that are part of running backfills
They avoid auto-materializing assets that are part of active runs
They avoid auto-materializing assets that have failed to materialize, until conditions change
They avoid auto-materializing asset partitions with missing parent partitions
They avoid “surprise backfills” by respecting a parameter that limits the number of partitions that can be requested at once
They avoid auto-materializing assets whose ancestors have ancestors with new upstream data

In the cases I’ve come across where users have wanted to customize auto-materialize behavior, they’ve wanted to do it in very targeted ways, e.g. “it should work like it normally does, except, for upstream asset X, it’s OK if some of the partitions are missing”.

If users who want to customize their auto-materialize behavior are asked to start from zero, it’s likely that they’ll neglect to handle some of the important cases that we’ve handled in the built-in policies. And if we improve the handling of these cases, or handle new cases that we missed earlier, then they won’t get access to the latest and greatest.
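
The gap between a from-scratch policy and the built-in behavior can be sketched as follows. This is an illustration under stated assumptions, not how Dagster implements its policies: Context, its fields, and with_builtin_guards are hypothetical names. The naive policy only looks at upstream updates, while the guarded version runs some of the skip conditions listed above first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Context:
    """Hypothetical evaluation context; the field names are illustrative."""

    upstream_updated: bool = False
    in_running_backfill: bool = False
    in_active_run: bool = False
    failed_since_last_change: bool = False
    missing_parent_partitions: bool = False


Policy = Callable[[Context], bool]


def naive_policy(context: Context) -> bool:
    # Mirrors the naive example: only upstream updates are considered.
    return context.upstream_updated


def with_builtin_guards(policy: Policy) -> Policy:
    """Wrap a user policy so the shared skip conditions are checked first."""

    def guarded(context: Context) -> bool:
        blocked = (
            context.in_running_backfill
            or context.in_active_run
            or context.failed_since_last_change
            or context.missing_parent_partitions
        )
        return False if blocked else policy(context)

    return guarded


ctx = Context(upstream_updated=True, in_active_run=True)
print(naive_policy(ctx))                       # True: would launch a second run
print(with_builtin_guards(naive_policy)(ctx))  # False: the active-run guard blocks it
```

Because the guards live in one place, improving them would benefit every wrapped policy, which is exactly what a user who starts from zero would miss out on.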

Request for input

If you're a user or prospective user of auto-materialize policies, thoughts on either of these subjects would be helpful:

In what ways would you like to customize your auto-materialize policies?
Do you have ideas on the ideal API for customizing auto-materialize policies?

prratek · 2023-07-24T15:58:50Z

I tend to think of auto-materialize policies as composed of multiple "materialization rules" (corresponding to the bullet points you listed above describing the default logic). Is there room for a middle option where you can specify custom Python code to override a specific rule? My mental model is of custom policies as Python classes that inherit from some base class, where you can override one or more methods to customize how the policy handles missing partitions, active runs, or whatever else.
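
A rough sketch of that mental model, assuming a hypothetical BasePolicy class with one overridable method per rule. None of these names exist in Dagster; the subclass implements the "it's OK if upstream asset X has missing partitions" tweak from the original post by overriding a single rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical evaluation context; the field names are illustrative."""

    upstream_updated: bool = False
    in_active_run: bool = False
    # Keys of upstream assets that currently have missing partitions.
    missing_parents: frozenset = frozenset()


class BasePolicy:
    """Hypothetical base class: each built-in rule is its own method."""

    def skip_for_active_run(self, context: Context) -> bool:
        return context.in_active_run

    def skip_for_missing_parents(self, context: Context) -> bool:
        return bool(context.missing_parents)

    def should_materialize(self, context: Context) -> bool:
        if self.skip_for_active_run(context):
            return False
        if self.skip_for_missing_parents(context):
            return False
        return context.upstream_updated


class TolerateMissingX(BasePolicy):
    """Default behavior, except missing partitions on asset "x" are fine."""

    def skip_for_missing_parents(self, context: Context) -> bool:
        return bool(context.missing_parents - {"x"})
```

The subclass inherits every other rule unchanged, so fixes to the base rules would carry over without the user rewriting anything.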

adam-bloom · 2023-12-29T18:03:22Z

Adding custom rules would be great. I was hoping to add a custom rule to a policy, but found that serialization of it wasn't supported. Our use case involves certain times when asset materializations will fail due to vendor maintenance; I was hoping to add a custom skip rule to handle that logic. I've previously built that logic into schedules successfully, and am trying to determine the correct paradigm for a sensor/auto-materialization-driven refactor.
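
For illustration, the core of such a skip rule could be a plain predicate like the one below. The maintenance window (Sundays, 02:00 to 04:00 UTC) and the function name are assumptions made up for this sketch, and how the predicate would plug into a policy, including the serialization problem mentioned above, is left open.

```python
from datetime import datetime, time, timezone

# Hypothetical window: vendor maintenance on Sundays, 02:00 to 04:00 UTC.
MAINTENANCE_WEEKDAY = 6  # datetime.weekday() counts Monday as 0, Sunday as 6
MAINTENANCE_START = time(2, 0)
MAINTENANCE_END = time(4, 0)


def in_vendor_maintenance(now: datetime) -> bool:
    """Return True when `now` (timezone-aware) falls inside the window."""
    now_utc = now.astimezone(timezone.utc)
    return (
        now_utc.weekday() == MAINTENANCE_WEEKDAY
        and MAINTENANCE_START <= now_utc.time() < MAINTENANCE_END
    )
```

The same predicate could back both a schedule's skip logic and a skip rule, which would keep the two paradigms consistent during a refactor.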

OwenKephart (Maintainer) · 2024-08-09T18:36:11Z

Closing this discussion! Subsumed by: #22811
