Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bubble up exceptions happening when handling events #215

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Le-Merle
Copy link
Contributor

Description of changes

Using Task.WhenAll() will wait until every task is in a completed state before returning. This is problematic when a single subscription crashes, as we would normally expect the exception to bubble up to the IHost and crash it, thus enabling the entire app to be restarted, but it's not happening as long as at least one subscription hasn't crashed out.

This replaces Task.WhenAll() with a system that immediately bubbles up the first exception encountered.

Breaking changes

In theory, a pod that was running this background service would survive a crash in a subscription as long as any other subscription didn't crash. With this change, the pod is expected to crash entirely, enabling k8s to detect the broken state and restart it.

Additional checks

  • Added new tests that cover the code changes

@Le-Merle Le-Merle requested a review from a team as a code owner February 20, 2025 20:13
subscriptionTasks.Remove(task);

// This will rethrow the exception if there ever is one, otherwise we'll continue until all tasks are done
await task.ConfigureAwait(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic would generate possible unobserved task exception. I wonder if you should have a global CancellationTokenSource for all tasks, and cancel it at the first exception. Then, you would be able to use Task.WhenAll here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing so would complexify the handling with AggregateExceptions, and potentially include OperationCanceledExceptions that we don't really care about. It would be possible to filter them out, but this introduces even more complexity for an arguably limited gain.

By throwing the first exception we can find, even though it could make some exceptions unobserved, the code to handle this is simpler. The end result will be similar, AKA the (most likely only) exception will bubble up and crash the pod, forcing a restart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gcaya and I think this is a better strategy to cancel all tasks and use Task.WhenAll. This is not for the AggregateException, but to cleanup resources correctly.
When a task fails, there is no reason to let some tasks alive (in the case you have multiple subscription).

.Select(puller => Task.Run(() => puller.StartReceivingEventsAsync(stoppingToken), stoppingToken))
.ToList();

while (subscriptionTasks.Any())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily have to change to this but just as a fyi, here's something Antho sent
https://stackoverflow.com/questions/57313252/how-can-i-await-an-array-of-tasks-and-stop-waiting-on-first-exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very interesting way to optimize handling tasks and failing fast!

However, in our case, once the tasks are started they should be "eternal", so performance gains in handling multiple tasks are less important. Overall, here I feel like this would be a more complex logic than what is suggested, for limited gains.

I'm still going to keep that in mind for future use cases, though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better way to do it in .NET 9 would be to use Task.WhenEach, but we are stuck in .NET 8 :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think perf is important here. In best case scenario, the tasks never complete.

Comment on lines +106 to +108
this.AcknowledgeEvents(cancellationToken),
this.ReleaseEvents(cancellationToken),
this.RejectEvents(cancellationToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants