Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make backfill creation and add_backfill() public #26786

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

deepyaman
Copy link
Contributor

@deepyaman deepyaman commented Jan 2, 2025

Summary & Motivation

Background

@smackesey has interacted with several users on the following scenario:

  • There is some computation (run or backfill) X that should only be launched under certain conditions
  • The conditions under which X should be launched are non-trivial to compute
  • It is desired to manually trigger conditional launch of X-- that is, on clicking a button, the conditions under which X should be launched are computed and X is launched if those conditions are satisfied.

Problem Statement

We do not have a good solution to the above scenario.

  • Launch job Y that conditionally launches X
    • If X is a job, execute_job will work.
    • If X is a backfill, then there is no good public API to launch it. User must use the private GQL API.
  • Manually trigger sensor that conditionally launches X
    • The way you manually trigger a sensor is by pressing the "Test Sensor" button, which is non-ideal and kinda buried.
    • You then have to pass each generated RunRequest through to the launchpad and manually launch.
      • This works OK if X is a job.
      • This is unworkable if X is a backfill, because you have to manually launch each RunRequest

So if X is a backfill, basically your only option is to use a job that then uses the private GQL API.

Objective

The purpose of this enhancement is to improve user experience. This will allow users to not have to resort to workarounds to perform a manual trigger a conditional launch of X.

The manifestation of this is to expose public Python APIs for creating and launching asset backfills.

Alternatives

#26731 added a DagsterInstance.launch_backfill() method; the approach in this PR is the direct result of feedback from @OwenKephart and @jamiedemaria there.

How I Tested These Changes

BK

TODO

I wanted to get feedback/validation that I'm on the right track before proceeding much further, but I'm guessing this API will need to be documented publicly.

Changelog

Exposed Python API for creating PartitionBackfill objects and adding them to a DagsterInstance.

@deepyaman deepyaman force-pushed the deepyaman/feat/public-partition-backfill-constructor branch from 01b6468 to cc0f94a Compare January 2, 2025 18:03
asset_selection: Sequence[AssetKey],
backfill_timestamp: float,
tags: Mapping[str, str],
# TODO(deepyaman): Expose `dagster_instance` instead for the public API.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be done? If that's the case, would have something like PartitionBackfill.from_asset_partitions() that takes a DagsterInstance and, under the hood, calls PartitionBackfill._from_asset_partition() that takes a DynamicPartitionsStore. All of the existing implementations could call the internal method (and not be limited to the slightly more narrow DagsterInstance type).

The reason for doing this would be that DynamicPartitionsStore is not part of the public API. That said, I wanted to check before doing this refactoring.

@deepyaman deepyaman marked this pull request as ready for review January 2, 2025 18:38
@deepyaman deepyaman changed the title Add PartitionBackfill construction to public API Make backfill creation and add_backfill() public Jan 2, 2025
Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little skeptical that the exact same object that goes in serialized in the database is what we should also expose as our public API, I think that has bitten us before (looking at you, TimeWindow).

@deepyaman
Copy link
Contributor Author

I am a little skeptical that the exact same object that goes in serialized in the database is what we should also expose as our public API, I think that has bitten us before (looking at you, TimeWindow).

Do you persinally think the approach in #26731 may be better/safer, or are you thinking about something else?

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was imagining a different add_backfill API with a more constrained set of fields (the public API) that under the hood we have the ability to map onto the data object, so that we can add fields to the data object without it being a breaking change to the public API.

I also don't think all the fields currently on PartitionBackfill (serialized_asset_backfill_data?) should actually be considered part of our public API

@jamiedemaria
Copy link
Contributor

I think that has bitten us before (looking at you, TimeWindow)

A bit of a tangent from this PR, I haven't heard of this issue before, can you explain what has gone wrong before?

@gibsondan
Copy link
Member

This PR has some context on that: #22299

@deepyaman
Copy link
Contributor Author

I also don't think all the fields currently on PartitionBackfill (serialized_asset_backfill_data?) should actually be considered part of our public API

To play the devil's advocate, I feel like the only publicly-documented way to construct PartitionBackfill would be using the from_asset_partitions() class method, so in that sense, perhaps it's constrained in a sense?

Although, yes, nothing stops people from constructing it using the "undocumented" approach—that it's simply a NamedTuple sublclass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants