-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bubble up exceptions happening when handling events #215
base: main
Are you sure you want to change the base?
Conversation
src/Workleap.DomainEventPropagation.Subscription.PullDelivery/EventPullerService.cs
Outdated
Show resolved
Hide resolved
src/Workleap.DomainEventPropagation.Subscription.PullDelivery/EventPullerService.cs
Outdated
Show resolved
Hide resolved
subscriptionTasks.Remove(task); | ||
|
||
// This will rethrow the exception if there ever is one, otherwise we'll continue until all tasks are done | ||
await task.ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic would generate possible unobserved task exception. I wonder if you should have a global CancellationTokenSource for all tasks, and cancel it at the first exception. Then, you would be able to use Task.WhenAll
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing so would complexify the handling with AggregateExceptions, and potentially include OperationCanceledExceptions that we don't really care about. It would be possible to filter them out, but this introduces even more complexity for an arguably limited gain.
By throwing the first exception we can find, even though it could make some exceptions unobserved, the code to handle this is simpler. The end result will be similar, AKA the (most likely only) exception will bubble up and crash the pod, forcing a restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gcaya and I think this is a better strategy to cancel all tasks and use Task.WhenAll
. This is not for the AggregateException
, but to cleanup resources correctly.
When a task fails, there is no reason to let some tasks alive (in the case you have multiple subscription).
.Select(puller => Task.Run(() => puller.StartReceivingEventsAsync(stoppingToken), stoppingToken)) | ||
.ToList(); | ||
|
||
while (subscriptionTasks.Any()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't necessarily have to change to this but just as a fyi, here's something Antho sent
https://stackoverflow.com/questions/57313252/how-can-i-await-an-array-of-tasks-and-stop-waiting-on-first-exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very interesting way to optimize handling tasks and failing fast!
However, in our case, once the tasks are started they should be "eternal", so performance gains in handling multiple tasks are less important. Overall, here I feel like this would be a more complex logic than what is suggested, for limited gains.
I'm still going to keep that in mind for future use cases, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way to do it in .NET 9 would be to use Task.WhenEach
, but we are stuck in .NET 8 :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think perf is important here. In best case scenario, the tasks never complete.
src/Workleap.DomainEventPropagation.Subscription.PullDelivery/EventPullerService.cs
Show resolved
Hide resolved
this.AcknowledgeEvents(cancellationToken), | ||
this.ReleaseEvents(cancellationToken), | ||
this.RejectEvents(cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: formatting ;)
Description of changes
Using Task.WhenAll() will wait until every task is in a completed state before returning. This is problematic when a single subscription crashes, as we would normally expect the exception to bubble up to the IHost and crash it, thus enabling the entire app to be restarted, but it's not happening as long as at least one subscription hasn't crashed out.
This replaces Task.WhenAll() with a system that immediately bubbles up the first exception encountered.
Breaking changes
In theory, a pod that was running this background service would survive a crash in a subscription as long as any other subscription didn't crash. With this change, the pod is expected to crash entirely, enabling k8s to detect the broken state and restart it.
Additional checks