Make backfill creation and add_backfill() public #26786

base: master
Conversation
asset_selection: Sequence[AssetKey],
backfill_timestamp: float,
tags: Mapping[str, str],
# TODO(deepyaman): Expose `dagster_instance` instead for the public API.
Should this be done? If so, we would have something like PartitionBackfill.from_asset_partitions() that takes a DagsterInstance and, under the hood, calls PartitionBackfill._from_asset_partitions(), which takes a DynamicPartitionsStore. All of the existing implementations could call the internal method (and not be limited to the slightly narrower DagsterInstance type). The reason for doing this would be that DynamicPartitionsStore is not part of the public API. That said, I wanted to check before doing this refactoring.
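To make the proposal concrete, here is a minimal sketch of that split; the class and method names are taken from the comment above, while the signatures and bodies are illustrative stand-ins rather than the actual Dagster code.

```python
from typing import Any


class PartitionBackfill:  # simplified stand-in for the real record class
    @classmethod
    def from_asset_partitions(
        cls, dagster_instance: "DagsterInstance", **kwargs: Any
    ) -> "PartitionBackfill":
        # Public constructor: accepts only the public DagsterInstance type and
        # delegates to the internal constructor, since a DagsterInstance is a
        # valid DynamicPartitionsStore.
        return cls._from_asset_partitions(
            dynamic_partitions_store=dagster_instance, **kwargs
        )

    @classmethod
    def _from_asset_partitions(
        cls, dynamic_partitions_store: "DynamicPartitionsStore", **kwargs: Any
    ) -> "PartitionBackfill":
        # Internal constructor: keeps the broader DynamicPartitionsStore type so
        # existing internal callers are not narrowed to DagsterInstance.
        ...
```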
Changed the title from "…PartitionBackfill construction to public API" to "Make backfill creation and add_backfill() public"
I am a little skeptical that the exact same object that gets serialized in the database is what we should also expose as our public API; I think that has bitten us before (looking at you, TimeWindow).
Do you personally think the approach in #26731 may be better/safer, or are you thinking about something else?
I guess I was imagining a different add_backfill API with a more constrained set of fields (the public API) that, under the hood, we have the ability to map onto the data object, so that we can add fields to the data object without it being a breaking change to the public API.
I also don't think all of the fields currently on PartitionBackfill (serialized_asset_backfill_data?) should actually be considered part of our public API.
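As a purely hypothetical illustration of that idea: the helper name and parameter list below are invented for this sketch and are not an actual or proposed Dagster signature.

```python
from typing import Mapping, Optional, Sequence

from dagster import AssetKey, DagsterInstance


def add_asset_backfill(  # hypothetical helper name, for illustration only
    instance: DagsterInstance,
    asset_selection: Sequence[AssetKey],
    partition_names: Optional[Sequence[str]] = None,
    tags: Optional[Mapping[str, str]] = None,
) -> str:
    # Accept only a small, stable set of public fields and map them onto the
    # internal PartitionBackfill record, so that fields added to the record
    # (e.g. serialized_asset_backfill_data) never leak into the public surface.
    # Returns the backfill id (illustrative).
    ...
```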
A bit of a tangent from this PR, but I haven't heard of this issue before; can you explain what has gone wrong before?
This PR has some context on that: #22299
To play the devil's advocate, I feel like the only publicly-documented way to construct … Although, yes, nothing stops people from constructing it using the "undocumented" approach—that it's simply a …
Summary & Motivation
Background
@smackesey has interacted with several users about the following scenario:
Problem Statement
We do not have a good solution to the above scenario.
So if X is a backfill, basically your only option is to use a job that then uses the private GQL API.
Objective
The purpose of this enhancement is to improve user experience. It will let users manually trigger or conditionally launch X without resorting to workarounds.
Concretely, this means exposing public Python APIs for creating and launching asset backfills.
Alternatives
#26731 added a DagsterInstance.launch_backfill() method; the approach in this PR is the direct result of feedback from @OwenKephart and @jamiedemaria there.

How I Tested These Changes
BK
TODO
I wanted to get feedback/validation that I'm on the right track before proceeding much further, but I'm guessing this API will need to be documented publicly.
Changelog
Exposed Python API for creating PartitionBackfill objects and adding them to a DagsterInstance.
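For illustration, a hedged usage sketch of what this entry describes, assuming the constructor arguments visible in the diff above; the remaining required arguments are omitted, and the exact public import path and signatures may differ from what ships.

```python
import time

from dagster import AssetKey, DagsterInstance
from dagster._core.execution.backfill import PartitionBackfill

instance = DagsterInstance.get()

backfill = PartitionBackfill.from_asset_partitions(
    asset_selection=[AssetKey("my_asset")],
    backfill_timestamp=time.time(),
    tags={"triggered_by": "manual-script"},
    # ...remaining required arguments (backfill id, asset graph, partition
    # selection, dynamic partitions store) omitted from this sketch...
)
instance.add_backfill(backfill)
```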