📖 Add a design for a priority queue #3013
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alvaroaleman
designs/priorityqueue.md
Outdated
// Adding an item that is already there may update its wait
// period to the lowest of existing and new wait period or
// its priority to the highest of existing and new priority.
AddWithOpts(o AddOpts, items ...T)
The main alternative to this would be to have AddWithPriority, AddAfterWithPriority, AddRatelimitedWithPriority, etc. I found this easier, but I don't have a strong opinion either way.
Definitely a lot easier to extend without breaking changes
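To make the merge rule in the quoted comment concrete, here is a minimal sketch of how a single options-based add could fold a repeated item into the existing entry. Everything except the AddWithOpts signature is an assumption for illustration: the After field name, the toyQueue and entry types, and the fake clock are hypothetical and not taken from the design or the POC.

```go
package main

import (
	"fmt"
	"time"
)

// AddOpts carries the per-add options. Field names are assumptions.
type AddOpts struct {
	After    time.Duration
	Priority int
}

// entry is a hypothetical record of what the queue remembers per item.
type entry struct {
	readyAt  time.Time
	priority int
}

// fakeClock is a hypothetical controllable clock for the demonstration.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// toyQueue is a hypothetical, non-thread-safe stand-in for the real queue.
type toyQueue[T comparable] struct {
	items map[T]*entry
	now   func() time.Time
}

func newToyQueue[T comparable](now func() time.Time) *toyQueue[T] {
	return &toyQueue[T]{items: map[T]*entry{}, now: now}
}

// AddWithOpts adds items; for an item that is already present it keeps
// the earliest ready time and the highest priority seen so far.
func (q *toyQueue[T]) AddWithOpts(o AddOpts, items ...T) {
	readyAt := q.now().Add(o.After)
	for _, item := range items {
		e, ok := q.items[item]
		if !ok {
			q.items[item] = &entry{readyAt: readyAt, priority: o.Priority}
			continue
		}
		if readyAt.Before(e.readyAt) {
			e.readyAt = readyAt
		}
		if o.Priority > e.priority {
			e.priority = o.Priority
		}
	}
}

func main() {
	clock := &fakeClock{t: time.Unix(0, 0)}
	q := newToyQueue[string](clock.Now)
	q.AddWithOpts(AddOpts{After: 10 * time.Second}, "a")
	q.AddWithOpts(AddOpts{After: time.Second, Priority: 5}, "a")
	q.AddWithOpts(AddOpts{After: time.Minute, Priority: 1}, "a")
	e := q.items["a"]
	fmt.Println(e.readyAt.Sub(clock.Now()), e.priority) // prints "1s 5"
}
```

With one options struct, a new knob becomes a new field rather than a new method on the interface, which is the extensibility point made in the reply above.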
Force-pushed from ea46cac to f988aa0.
reasons to prioritize some events") will always require implementation of a custom
handler or eventsource in order to inject the appropriate priority.

## Implementation stages
Will the default controller be modified to make use of this new queue (if used), or will it rely on using a custom controller implementation?
Yes, it will be updated to make use of it. The only thing it needs, though, is to re-use the priority. Once we make it the default, I would also like to add a Priority parameter to the reconcile.Result, but really only once it's enabled by default.
Maybe also to the Request?
> Maybe also to the Request?

Ah, interesting thought. Is the idea that you want to be able to set the priority in a handler?
The way this currently works is that the workqueue doesn't have any understanding of the request object (and we should keep it that way IMHO). We could probably provide a thin wrapper for it that uses the priority from the request if the AddWithOpts call doesn't have one yet, and inject that wrapper in the builder if it's typed to reconcile.Request?
I was thinking that the priority could also be useful information for the Reconcile func. But maybe it's a bad idea, not sure 😀
Ah. Because then it can use that as an input when returning a priority?
That's one use case. Another one would be to reconcile Requests of different priorities differently :)
Maybe it makes sense in some cases to pass down the priority (if some other components are involved in reconciliation).
Maybe a controller would act differently if it could infer from the priority whether this is just a periodic resync (or something similar) or an actual change. Or, more generally, it could give higher-priority Requests precedence (if it has to schedule some "tasks" in other systems).
But I'm really not sure if this is a good idea or opening Pandora's box :)
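The thin wrapper floated earlier in this thread could look roughly like the sketch below: the queue itself stays unaware of the request type, and a wrapper fills in the priority from the request only when the add call did not set one. All types here are hypothetical stand-ins. In particular, the request type with a Priority field does not exist in reconcile.Request today, and treating a zero Priority as "not set" is an assumption made only to keep the example short.

```go
package main

import "fmt"

// AddOpts is reduced to the one field relevant here.
type AddOpts struct{ Priority int }

// request is a hypothetical stand-in for a reconcile.Request that
// carries a priority. No such field exists in the real type.
type request struct {
	Name     string
	Priority int
}

// adder is the minimal queue surface the wrapper needs.
type adder[T any] interface {
	AddWithOpts(o AddOpts, items ...T)
}

// requestPriority wraps a queue and injects the request's priority when
// the caller did not provide one (assumption: zero means "not set").
type requestPriority struct{ inner adder[request] }

func (w requestPriority) AddWithOpts(o AddOpts, items ...request) {
	for _, item := range items {
		opts := o
		if opts.Priority == 0 {
			opts.Priority = item.Priority
		}
		w.inner.AddWithOpts(opts, item)
	}
}

// recorder is a fake queue that remembers the priority each item was
// added with, so the effect of the wrapper is observable.
type recorder struct{ got map[string]int }

func (r *recorder) AddWithOpts(o AddOpts, items ...request) {
	for _, item := range items {
		r.got[item.Name] = o.Priority
	}
}

func main() {
	rec := &recorder{got: map[string]int{}}
	q := requestPriority{inner: rec}
	q.AddWithOpts(AddOpts{}, request{Name: "a", Priority: 7})
	q.AddWithOpts(AddOpts{Priority: 3}, request{Name: "b", Priority: 7})
	fmt.Println(rec.got["a"], rec.got["b"]) // prints "7 3"
}
```

An explicit priority on the add call wins over the one on the request, which keeps handlers in control while still letting a request carry a default.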
// object is already in the workqueue, the priority will be updated
// to the highest of existing and new priority.
Priority int
}
I’m wondering if we need to consider starvation scenarios where some item is never retrieved because there’s always something of higher priority in the queue. It might be fair to say this is “user error”, but is there a way to detect / avoid / recover?
Interesting point. I don't think avoid or recover, but maybe detect. I'll think about it
I thought a bit about this. This can only happen if the controller is unable to drain its queue, which would be a problem by itself regardless of the priority queue. I would generally expect alerts on this.
The implementation of the queue I have in the other PR happens to store timestamps for when an object was added, so in theory we could use that to emit metrics or logs if an object is in the queue for too long. In practice, however, defining what "too long" is seems difficult because it varies. I tend to think that the queue depth metric is overall good enough to deal with this problem. WDYT?
Queue depth is definitely good enough in general to detect that the queue cannot be drained. Isn't there also a metric for the longest time something is in the queue?
Independent of that we could also consider increasing priority after each resync, so it eventually reaches or maybe even exceeds (not sure) the priority of regular events
> Independent of that we could also consider increasing priority after each resync

I wouldn't want something like that in the queue (or probably even by default), because that seems a bit too magic
Fine for me!
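For reference, the detection idea from this thread (using the stored add timestamps) can be sketched as below. This is only an illustration under stated assumptions: the queued type, the helper names, and the threshold are hypothetical, and the thread's conclusion was that the existing queue depth metric is likely sufficient.

```go
package main

import (
	"fmt"
	"time"
)

// queued is a hypothetical view of an item plus the time it was added.
type queued struct {
	key     string
	addedAt time.Time
}

// at is a small helper that builds a timestamp from Unix seconds.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// staleItems returns the keys of items that have waited longer than
// threshold. What counts as "too long" is left to the caller.
func staleItems(items []queued, now time.Time, threshold time.Duration) []string {
	var stale []string
	for _, it := range items {
		if now.Sub(it.addedAt) > threshold {
			stale = append(stale, it.key)
		}
	}
	return stale
}

// longestWait returns the longest time any item has been waiting,
// or zero for an empty queue.
func longestWait(items []queued, now time.Time) time.Duration {
	var longest time.Duration
	for _, it := range items {
		if w := now.Sub(it.addedAt); w > longest {
			longest = w
		}
	}
	return longest
}

func main() {
	items := []queued{
		{key: "old", addedAt: at(0)},
		{key: "new", addedAt: at(90)},
	}
	now := at(100)
	fmt.Println(staleItems(items, now, 30*time.Second)) // prints "[old]"
	fmt.Println(longestWait(items, now))                // prints "1m40s"
}
```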
Overall lgtm. Looking forward to using this 😀
designs/priorityqueue.md
Outdated
In order to fix the issue described in point one of the motivation section,
we have to be able to differentiate events stemming from the initial list
during startup and from resyncs from other events. In both these cases, the
informer emits an artificial create. The suggestion is to use a heuristic that
I think resyncs are updates (without resourceVersion changes)
Maybe we could make this better mid-term by flagging these events explicitly in client-go
> I think resyncs are updates (without resourceVersion changes)

You are right, fascinating, TILd. Will update.

> Maybe we could make this better mid-term by flagging these events explicitly in client-go

That would make sense, but I think it might be difficult due to how client-go is factored (but I would love to be wrong on this).
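Following the correction above (resyncs arrive as updates whose resourceVersion did not change), a handler-side check could be sketched like this. The object type, the function names, and the concrete priority values are assumptions for illustration only; the design text quoted in this PR does not fix them here.

```go
package main

import "fmt"

// object is a hypothetical, minimal stand-in for a Kubernetes object;
// only the field the heuristic needs is modeled.
type object struct {
	Name            string
	ResourceVersion string
}

// Hypothetical priority values, chosen only for the example.
const (
	defaultPriority = 0
	lowPriority     = -100
)

// isResync reports whether an update event looks like a periodic
// resync, i.e. the resourceVersion is unchanged between old and new.
func isResync(oldObj, newObj object) bool {
	return oldObj.ResourceVersion == newObj.ResourceVersion
}

// priorityForUpdate assigns a low priority to resync-like updates and
// the default priority to updates that reflect an actual change.
func priorityForUpdate(oldObj, newObj object) int {
	if isResync(oldObj, newObj) {
		return lowPriority
	}
	return defaultPriority
}

func main() {
	before := object{Name: "cm", ResourceVersion: "41"}
	resynced := object{Name: "cm", ResourceVersion: "41"}
	changed := object{Name: "cm", ResourceVersion: "42"}
	fmt.Println(priorityForUpdate(before, resynced)) // prints "-100"
	fmt.Println(priorityForUpdate(before, changed))  // prints "0"
}
```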
// GetWithPriority returns an item and its priority. It allows
// a controller to re-use the priority if it enqueues an item
// again.
GetWithPriority() (item T, priority int, shutdown bool)
What's the story for the shutdown return parameter? :)
I just copied it from the existing Get; we use it to exit the controller:
controller-runtime/pkg/internal/controller/controller.go
Lines 263 to 265 in e3347b5
if shutdown {
	// Stop working
	return false
Makes sense. Probably worth extending the godoc of this func to explain it a bit (~ the same comment as on Typed.Get)
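As a rough illustration of what the shutdown return value is for, the sketch below shows a worker loop that stops once the queue reports shutdown. Only the GetWithPriority signature comes from the design; the sliceQueue type and processNext helper are hypothetical, and unlike a real workqueue this toy queue does not block, it simply reports shutdown once it is empty.

```go
package main

import "fmt"

// priorityQueue is the part of the proposed interface used here.
type priorityQueue[T any] interface {
	GetWithPriority() (item T, priority int, shutdown bool)
}

// prioritized pairs an item key with its priority.
type prioritized struct {
	key      string
	priority int
}

// sliceQueue is a hypothetical toy queue. A real queue would block
// until an item is available or the queue is shut down; this one
// reports shutdown as soon as it has been drained.
type sliceQueue struct{ items []prioritized }

func (q *sliceQueue) GetWithPriority() (string, int, bool) {
	if len(q.items) == 0 {
		return "", 0, true
	}
	it := q.items[0]
	q.items = q.items[1:]
	return it.key, it.priority, false
}

// processNext handles one item and returns false once the queue has
// shut down, which tells the worker loop to stop.
func processNext(q priorityQueue[string], handle func(item string, priority int)) bool {
	item, priority, shutdown := q.GetWithPriority()
	if shutdown {
		// Stop working
		return false
	}
	handle(item, priority)
	return true
}

func main() {
	q := &sliceQueue{items: []prioritized{{"a", 1}, {"b", 0}}}
	for processNext(q, func(item string, priority int) {
		fmt.Println(item, priority)
	}) {
	}
}
```

Because the handler receives the priority together with the item, it could pass the same value back when re-enqueueing, which is the re-use case named in the godoc above.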
Force-pushed from f988aa0 to bd6eede.
This change describes the motivation and implementation details for a priority queue in controller-runtime.
Ref #2374
POC of the changes is in #3014