Make backfill creation and add_backfill() public #26786

base: master
Conversation
asset_selection: Sequence[AssetKey],
backfill_timestamp: float,
tags: Mapping[str, str],
# TODO(deepyaman): Expose `dagster_instance` instead for the public API.
Should this be done? If so, we would have something like PartitionBackfill.from_asset_partitions() that takes a DagsterInstance and, under the hood, calls PartitionBackfill._from_asset_partitions(), which takes a DynamicPartitionsStore. All of the existing implementations could call the internal method (and not be limited to the slightly narrower DagsterInstance type). The reason for doing this would be that DynamicPartitionsStore is not part of the public API. That said, I wanted to check before doing this refactoring.
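To make the proposal concrete, here is a minimal sketch of that split; the class and method names are taken from the comment above, while the signatures and bodies are illustrative stand-ins rather than the actual Dagster code.

```python
from typing import Any


class PartitionBackfill:  # simplified stand-in for the real record class
    @classmethod
    def from_asset_partitions(
        cls, dagster_instance: "DagsterInstance", **kwargs: Any
    ) -> "PartitionBackfill":
        # Public constructor: accepts only the public DagsterInstance type and
        # delegates to the internal constructor, since a DagsterInstance is a
        # valid DynamicPartitionsStore.
        return cls._from_asset_partitions(
            dynamic_partitions_store=dagster_instance, **kwargs
        )

    @classmethod
    def _from_asset_partitions(
        cls, dynamic_partitions_store: "DynamicPartitionsStore", **kwargs: Any
    ) -> "PartitionBackfill":
        # Internal constructor: keeps the broader DynamicPartitionsStore type so
        # existing internal callers are not narrowed to DagsterInstance.
        ...
```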
Changed the title from "…PartitionBackfill construction to public API" to "Make backfill creation and add_backfill() public"
I am a little skeptical that the exact same object that gets serialized in the database is what we should also expose as our public API; I think that has bitten us before (looking at you, TimeWindow).
Do you personally think the approach in #26731 may be better/safer, or are you thinking about something else?
I guess I was imagining a different add_backfill API with a more constrained set of fields (the public API) that, under the hood, we have the ability to map onto the data object, so that we can add fields to the data object without it being a breaking change to the public API.
I also don't think all of the fields currently on PartitionBackfill (serialized_asset_backfill_data?) should actually be considered part of our public API.
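As a purely hypothetical illustration of that idea: the helper name and parameter list below are invented for this sketch and are not an actual or proposed Dagster signature.

```python
from typing import Mapping, Optional, Sequence

from dagster import AssetKey, DagsterInstance


def add_asset_backfill(  # hypothetical helper name, for illustration only
    instance: DagsterInstance,
    asset_selection: Sequence[AssetKey],
    partition_names: Optional[Sequence[str]] = None,
    tags: Optional[Mapping[str, str]] = None,
) -> str:
    # Accept only a small, stable set of public fields and map them onto the
    # internal PartitionBackfill record, so that fields added to the record
    # (e.g. serialized_asset_backfill_data) never leak into the public surface.
    # Returns the backfill id (illustrative).
    ...
```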
A bit of a tangent from this PR, but I haven't heard of this issue before; can you explain what has gone wrong before?
This PR has some context on that: #22299
To play the devil's advocate, I feel like the only publicly-documented way to construct … Although, yes, nothing stops people from constructing it using the "undocumented" approach—that it's simply a …
Summary & Motivation
Background
@smackesey has interacted with several users about the following scenario:
Problem Statement
We do not have a good solution to the above scenario.
So if X is a backfill, basically your only option is to use a job that then uses the private GQL API.
Objective
The purpose of this enhancement is to improve user experience. It will let users manually trigger or conditionally launch X without resorting to workarounds.
Concretely, this means exposing public Python APIs for creating and launching asset backfills.
Alternatives
#26731 added a DagsterInstance.launch_backfill() method; the approach in this PR is the direct result of feedback from @OwenKephart and @jamiedemaria there.

How I Tested These Changes
BK
TODO
I wanted to get feedback/validation that I'm on the right track before proceeding much further, but I'm guessing this API will need to be documented publicly.
Changelog
Exposed Python API for creating PartitionBackfill objects and adding them to a DagsterInstance.
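For illustration, a hedged usage sketch of what this entry describes, assuming the constructor arguments visible in the diff above; the remaining required arguments are omitted, and the exact public import path and signatures may differ from what ships.

```python
import time

from dagster import AssetKey, DagsterInstance
from dagster._core.execution.backfill import PartitionBackfill

instance = DagsterInstance.get()

backfill = PartitionBackfill.from_asset_partitions(
    asset_selection=[AssetKey("my_asset")],
    backfill_timestamp=time.time(),
    tags={"triggered_by": "manual-script"},
    # ...remaining required arguments (backfill id, asset graph, partition
    # selection, dynamic partitions store) omitted from this sketch...
)
instance.add_backfill(backfill)
```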