[YUNIKORN-2907] Queue config processing log spew #1009
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #1009    +/-   ##
========================================
+ Coverage   82.48%   82.50%   +0.02%
========================================
  Files          97       97
  Lines       15627    15637      +10
========================================
+ Hits        12890    12902      +12
+ Misses       2457     2456       -1
+ Partials      280      279       -1

View full report in Codecov by Sentry.
@kousei47747 thanks for this patch. There is one small question remaining. Please take a look.
pkg/scheduler/objects/queue.go (outdated)

		return queue, err
	}

	// NewConfiguredShadowQueue creates a new queue from scratch based on the configuration and logs at debug level
Hi @kousei47747, thank you for your contribution!
Could you provide more details about the differences between NewConfiguredQueue and NewConfiguredShadowQueue?
Currently, it appears the only difference is the log level, which may be confusing for those unfamiliar with shadow queues.
It would be helpful to explain when developers should use NewConfiguredQueue versus NewConfiguredShadowQueue.
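For readers hitting the same question, here is a minimal, runnable sketch of the split being discussed, using the standard log/slog package and hypothetical Queue/newQueue stand-ins rather than YuniKorn's actual types: the construction path is identical, and only the log level differs.

```go
package main

import "log/slog"

// Queue and newQueue are hypothetical stand-ins for the scheduler types.
type Queue struct{ Name string }

func newQueue(name string) (*Queue, error) { return &Queue{Name: name}, nil }

// NewConfiguredQueue builds a queue for the live structure, logging at info level.
func NewConfiguredQueue(name string) (*Queue, error) {
	q, err := newQueue(name)
	if err == nil {
		slog.Info("configured queue added", slog.String("queueName", q.Name))
	}
	return q, err
}

// NewConfiguredShadowQueue is identical construction, but logs at debug level
// so the throw-away validation ("shadow") structure does not spam the logs.
func NewConfiguredShadowQueue(name string) (*Queue, error) {
	q, err := newQueue(name)
	if err == nil {
		slog.Debug("shadow queue added", slog.String("queueName", q.Name))
	}
	return q, err
}

func main() {
	NewConfiguredQueue("root.default")       // visible at the default log level
	NewConfiguredShadowQueue("root.default") // hidden unless debug is enabled
}
```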
@blueBlue0102 thanks for reviewing! I will provide a more detailed description in the new patch.
I dropped a large comment into the Jira to provide some guidance and point out another issue.
@wilfred-s thanks for reviewing! I've replied on Jira, please take a look.
I've updated the PR based on the guidance and noticed that …
See comments. I'm not sure about the current approach. But if you keep it, please make sure it's tested properly. PartitionContext does not have the best unit test coverage (at least direct coverage), but new code should definitely be covered.
If we need to silence all logs when creating shadow queues, it seems that some places were missed. For example:
It seems that either adding an additional parameter or creating new functions adds quite a lot of complexity to the codebase, but I don't have a good idea for a third approach at the moment.
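To make that trade-off concrete, here is a small runnable sketch of the flag-threading approach (standard log/slog and simplified stand-ins, not the real NewACL/setUsers/setGroups): every function on the construction path gains a silence parameter, and every caller has to remember to forward it.

```go
package main

import (
	"log/slog"
	"strings"
)

// setUsers guards its log with the threaded-through silence flag.
func setUsers(users []string, silence bool) {
	for _, u := range users {
		if !silence {
			slog.Info("ACL user added", slog.String("user", u))
		}
	}
}

// setGroups needs the same treatment; forgetting to guard one helper
// (as happened with setGroups in this PR) reintroduces the log spew.
func setGroups(groups []string, silence bool) {
	for _, g := range groups {
		if !silence {
			slog.Info("ACL group added", slog.String("group", g))
		}
	}
}

// newACL must forward the flag to every logging callee.
func newACL(aclStr string, silence bool) {
	fields := strings.SplitN(aclStr, " ", 2)
	setUsers(strings.Split(fields[0], ","), silence)
	if len(fields) == 2 {
		setGroups(strings.Split(fields[1], ","), silence)
	}
}

func main() {
	newACL("alice,bob admins", false) // real update: logs normally
	newACL("alice,bob admins", true)  // shadow/validation pass: silent
}
```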
@blueBlue0102 good catch! I've updated the PR to silence …
@@ -191,17 +199,18 @@ func (pc *PartitionContext) updatePartitionDetails(conf configs.PartitionConfig)
 	return ugm.GetUserManager().UpdateConfig(queueConf, conf.Queues[0].Name)
What about this one? This is still called regardless of the value of silence.
Thanks for reviewing! updatePartitionDetails is called only when performing the real update, so I think we can skip silence here.
yunikorn-core/pkg/scheduler/context.go, lines 372 to 374 in 34ca3b2:

	// checks passed perform the real update
	log.Log(log.SchedContext).Info("updating partitions", zap.String("partitionName", partitionName))
	err = part.updatePartitionDetails(p)
Thanks for the patches! It seems there are still some places that were missed. It made me think of another approach.
pkg/common/security/acl.go (outdated)

@@ -116,7 +119,7 @@ func NewACL(aclStr string) (ACL, error) {
 	// trim and check for wildcard
 	acl.setAllAllowed(aclStr)
 	// parse users and groups
-	acl.setUsers(strings.Split(fields[0], common.Separator))
+	acl.setUsers(strings.Split(fields[0], common.Separator), silence)
 	if len(fields) == 2 {
 		acl.setGroups(strings.Split(fields[1], common.Separator))
If we still want to use the silence flag approach, acl.setGroups also needs to be silenced.
Silence flag added to setGroups, thanks!
@@ -79,7 +79,7 @@ type PartitionContext struct {
 	locking.RWMutex
 }

-func newPartitionContext(conf configs.PartitionConfig, rmID string, cc *ClusterContext) (*PartitionContext, error) {
+func newPartitionContext(conf configs.PartitionConfig, rmID string, cc *ClusterContext, silence bool) (*PartitionContext, error) {
 	if conf.Name == "" || rmID == "" {
 		log.Log(log.SchedPartition).Info("partition cannot be created",
If we still want to use the silence flag approach, we should check silence before logging.
This is logged only when there's a misconfiguration, and the function returns immediately with an error, so no further logs will be generated. I think we can keep it for debugging. What do you think?
Keeping the log for misconfiguration makes sense to me.
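The convention agreed here fits in one guard. A hedged sketch with simplified types and the standard log/slog logger (not the actual newPartitionContext): routine progress logs respect silence, but the misconfiguration log stays unconditional because it fires once and the function returns immediately.

```go
package main

import (
	"errors"
	"log/slog"
)

// PartitionConfig is a simplified stand-in for configs.PartitionConfig.
type PartitionConfig struct{ Name string }

func newPartitionContext(conf PartitionConfig, rmID string, silence bool) error {
	if conf.Name == "" || rmID == "" {
		// Error path: always logged. It fires at most once per call and the
		// function returns immediately, so it cannot cause log spew.
		slog.Info("partition cannot be created",
			slog.String("partitionName", conf.Name), slog.String("rmID", rmID))
		return errors.New("partition cannot be created without name or RM ID")
	}
	if !silence {
		// Routine progress log: suppressed during the shadow/validation pass.
		slog.Info("creating partition", slog.String("partitionName", conf.Name))
	}
	return nil
}

func main() {
	_ = newPartitionContext(PartitionConfig{Name: "default"}, "rm-1", true) // quiet
	_ = newPartitionContext(PartitionConfig{}, "rm-1", true)                // error still logged
}
```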
 	// make sure the new info passes all checks
-	_, err = newPartitionContext(p, rmID, nil)
+	_, err = newPartitionContext(p, rmID, nil, true)
 	if err != nil {
It seems we call newPartitionContext here only for validating configs.PartitionConfig.
If my understanding is correct, the main purpose of this JIRA is to avoid any side effects when validating the config before updatePartitionDetails, not only:
- Logging
- Sending out events

but also including:
- Loading the user manager and updating user settings. ugm.GetUserManager().UpdateConfig() would be called in updatePartitionDetails() when validation passes.
- etc.

So rather than adding a lot of flag parameters to each underlying function, extracting all the validation logic from newPartitionContext into another function, or adding that validation logic into updatePartitionDetails, might be a better approach.
Let newPartitionContext do what it needs to do without consideration of "silence". What do you think?
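For comparison, here is a minimal sketch of this extraction idea, with invented checks and simplified types rather than the real PartitionContext logic: validation becomes a pure, side-effect-free function, and the constructor keeps logging unconditionally with no silence flag.

```go
package main

import (
	"fmt"
	"log/slog"
)

// PartitionConfig is a simplified stand-in; the checks below are invented
// for illustration and are not YuniKorn's actual validation rules.
type PartitionConfig struct {
	Name   string
	Queues []string
}

// validatePartitionConfig holds the pure checks: no logging, no events,
// no user-manager reload. Calling it is always side-effect free.
func validatePartitionConfig(conf PartitionConfig) error {
	if conf.Name == "" {
		return fmt.Errorf("partition name must not be empty")
	}
	if len(conf.Queues) == 0 || conf.Queues[0] != "root" {
		return fmt.Errorf("partition %q must define a root queue", conf.Name)
	}
	return nil
}

// newPartitionContext builds the real structure and logs freely; callers
// that only want a dry run call validatePartitionConfig instead.
func newPartitionContext(conf PartitionConfig) error {
	if err := validatePartitionConfig(conf); err != nil {
		return err
	}
	slog.Info("creating partition", slog.String("partitionName", conf.Name))
	// ... build queues, update the user manager, emit events ...
	return nil
}

func main() {
	conf := PartitionConfig{Name: "default", Queues: []string{"root"}}
	if err := validatePartitionConfig(conf); err != nil { // config check only
		slog.Error("config rejected", slog.String("error", err.Error()))
		return
	}
	_ = newPartitionContext(conf) // the real update
}
```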
Thanks for the nice approach! It looks like the trade-off would be duplicate logic/methods when extracting it from stacked callers. I tried a similar approach in the first few commits, but it turned out to be a bit cumbersome. So I think I still prefer adding the silence flag.
Please let me know if I’m missing something!
LGTM
LGTM. Just a confirmation.
-	return ugm.GetUserManager().UpdateConfig(queueConf, conf.Queues[0].Name)
+	if !silence {
+		return ugm.GetUserManager().UpdateConfig(queueConf, conf.Queues[0].Name)
+	}
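As a rough illustration of why this guard also removes the reload side effect, a runnable sketch with stand-in types (the real call goes through ugm.GetUserManager(), not a local struct): during the shadow/validation pass the user-manager update is skipped entirely.

```go
package main

import "log/slog"

// userManager is a simplified stand-in for the ugm package; UpdateConfig
// reloads user/group limits, which is a visible side effect.
type userManager struct{}

func (um *userManager) UpdateConfig(queue string) error {
	slog.Info("user manager config reloaded", slog.String("queue", queue))
	return nil
}

// updatePartitionDetails mirrors the guard in the diff above: validating a
// config (silence == true) no longer reloads user limits as a side effect.
func updatePartitionDetails(queueName string, silence bool) error {
	// ... update the queue structure ...
	if !silence {
		return (&userManager{}).UpdateConfig(queueName)
	}
	return nil
}

func main() {
	_ = updatePartitionDetails("root", true)  // validation pass: no reload
	_ = updatePartitionDetails("root", false) // real update: reload happens
}
```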
@wilfred-s This change also fixes the unnecessary ugm reload issue which you mentioned in Jira.
@kousei47747 I think we don't need another Jira. Does that sound right to you?
@chenyulin0719 yes, thanks :)
+1, lgtm
What is this PR for?
Silence the logs of the shadow queue structure during a configuration update.
What type of PR is it?
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-2907