Dagster schedule ticks that fail and retry should indicate in the tick timeline that they retried #23549

gibsondan · 2024-08-09T14:33:19Z

What's the use case?

If a schedule tick fails due to a code server being temporarily unavailable, the tick will retry until the code server is available again. If that retry succeeds, the schedule timeline gives no indication of the original failure.

Furthermore, Dagster+ alerts will fire if this happens, but link to a timeline with no indication that there was a failure.

The feature request here is to indicate in the scheduler timeline that the tick originally failed.

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

salazarm · 2024-08-12T19:13:31Z

Tracking internally: https://linear.app/dagster-labs/issue/FE-508/dagster-schedule-ticks-that-fail-and-retry-should-indicate-in-the-tick

bengotow · 2024-08-12T19:14:52Z

Hey @gibsondan, what is the expected value on the InstigationTick that indicates that it has failed and then succeeded? I'm guessing it's something like error is not null AND status = success?

Do the tick timestamp and endTimestamp get overwritten during the retry? I guess I'm not sure that our tick model has all the information required to effectively represent both occurrences of the tick execution. Maybe the backend could create a new tick rather than mutating the failed one?

gibsondan · 2024-08-12T19:19:23Z

@bengotow some backend change is going to be needed here first, yeah. Here's the relevant Python code: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_scheduler/scheduler.py#L588-L596. When a retry happens, it would need to leave a better trace that this tick previously failed.

gibsondan · 2024-08-12T19:20:23Z

The problem with making a new tick is that with schedules we assume that there's at most one timestamp for a given schedule 'tick' and use that for checkpointing purposes.
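One way to picture this constraint is the sketch below. It is a hypothetical illustration, not Dagster's actual scheduler code: the function name `times_to_evaluate` and the idea of passing the checkpoint as a plain `datetime` are assumptions made for the example. It only shows why a single timestamp per scheduled tick makes a convenient checkpoint: everything at or before it is treated as already handled.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def times_to_evaluate(
    scheduled_times: List[datetime], checkpoint: Optional[datetime]
) -> List[datetime]:
    """Return the scheduled times that still need a tick.

    `checkpoint` stands in for the timestamp of the most recent tick
    (hypothetical; the real scheduler's bookkeeping may differ).
    """
    if checkpoint is None:
        # No tick recorded yet, so every scheduled time is still pending.
        return list(scheduled_times)
    # Anything at or before the checkpoint counts as already handled.
    return [t for t in scheduled_times if t > checkpoint]


start = datetime(2024, 8, 12, 0, 0, tzinfo=timezone.utc)
hourly = [start + timedelta(hours=i) for i in range(4)]

# The tick for the second scheduled time is the latest one recorded.
checkpoint = hourly[1]
remaining = times_to_evaluate(hourly, checkpoint)
```

Under this reading, a second tick for the same scheduled time needs some other way to record when it actually ran, so that the value used as the checkpoint stays tied to the scheduled time.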

hellendag · 2024-10-14T19:25:55Z

Tracking internally as https://linear.app/dagster-labs/issue/FE-508.

gibsondan · 2025-01-03T23:09:15Z

@bengotow et al, I finally made some progress on this in #26823: there is a new field available on the frontend that we can use to distinguish between two ticks that are retries of the same 'scheduled execution time'. We just need to be sure to use that field when that is the information that we want to display.

After that change goes in and we make some additional changes to schedule retry behavior on the backend (in progress in #26824), we'll have two different ticks in the timeline to work with, with different actual timestamps but the same scheduled timestamp.
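As a rough sketch of how a consumer of the timeline might use this, the example below groups ticks by their scheduled timestamp and flags groups where an earlier attempt failed and a later one succeeded. The `Tick` class, its field names (`scheduled_timestamp`, `timestamp`, `status`), and the status strings are all hypothetical stand-ins; the actual field added in #26823 may be named and shaped differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tick:
    # Hypothetical model: field names are assumptions, not Dagster's API.
    scheduled_timestamp: float  # the scheduled execution time shared by retries
    timestamp: float            # when this particular attempt actually ran
    status: str                 # e.g. "FAILURE" or "SUCCESS" (assumed values)


def group_by_scheduled_time(ticks: List[Tick]) -> Dict[float, List[Tick]]:
    """Group attempts by scheduled time, each group ordered by actual time."""
    groups: Dict[float, List[Tick]] = defaultdict(list)
    for tick in ticks:
        groups[tick.scheduled_timestamp].append(tick)
    return {
        scheduled: sorted(attempts, key=lambda t: t.timestamp)
        for scheduled, attempts in groups.items()
    }


def failed_then_succeeded(attempts: List[Tick]) -> bool:
    """True if the last attempt succeeded after at least one earlier failure."""
    if len(attempts) < 2:
        return False
    earlier, last = attempts[:-1], attempts[-1]
    return last.status == "SUCCESS" and any(a.status == "FAILURE" for a in earlier)


ticks = [
    Tick(scheduled_timestamp=100.0, timestamp=100.5, status="FAILURE"),
    Tick(scheduled_timestamp=100.0, timestamp=130.2, status="SUCCESS"),
    Tick(scheduled_timestamp=200.0, timestamp=200.4, status="SUCCESS"),
]
groups = group_by_scheduled_time(ticks)
retried = {scheduled: failed_then_succeeded(a) for scheduled, a in groups.items()}
```

Grouping on the scheduled timestamp rather than the actual one is the point of the design: the two attempts stay visibly distinct in the timeline while still being recognizable as the same scheduled execution.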

gibsondan added the type: feature-request label Aug 9, 2024

garethbrickman added the area: UI/UX Related to User Interface and User Experience label Aug 9, 2024

garethbrickman added this to Dagster UI/UX Aug 9, 2024

github-project-automation bot moved this to Untriaged in Dagster UI/UX Aug 9, 2024

salazarm moved this from Untriaged to Needs details in Dagster UI/UX Aug 12, 2024

salazarm moved this from Needs details to Tracking internally in Dagster UI/UX Aug 12, 2024
