Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improves start up performance with lb templates #4379

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

trozet
Copy link
Contributor

@trozet trozet commented May 21, 2024

Follow up from #4270

During initial node sync, we were creating all of the lb templates over and over again in OVN. Service handler starts after, and will take care of adding missing LB templates while the node cache is already hot. There is no point in doing it during node info sync on start up.

It is still required to do when nodes are added or removed post-start.

@trozet trozet requested a review from a team as a code owner May 21, 2024 19:14
@trozet
Copy link
Contributor Author

trozet commented May 22, 2024

hit #4381

@tssurya tssurya added feature/services&endpoints All issues related to the Servces/Endpoints API area/scale&performance All issues related to scale & performance labels May 22, 2024
We were not waiting on for informers to fully shutdown for level-driven
controllers. As a consequence, goroutines would still be running
handling events while the next unit test case ran. In this case, a
handler with no workqueue (like node_tracker) would attempt to read
global config, while the next test case was modifying the global config,
leading to a data race.

Fixes: ovn-org#4381

Signed-off-by: Tim Rozet <[email protected]>
@trozet
Copy link
Contributor Author

trozet commented May 22, 2024

Rebasing on #4383 to see if that makes this unit test CI green

Follow up from ovn-org#4270

During initial node sync, we were creating all of the lb templates over
and over again in OVN. Service handler starts after, and will take care
of adding missing LB templates while the node cache is already hot.
There is no point in doing it during node info sync on start up.

It is still required to do when nodes are added or removed post-start.

Signed-off-by: Tim Rozet <[email protected]>
@trozet trozet force-pushed the perf_improvement_lb_templates branch from efcd6c4 to f1f8ef1 Compare May 22, 2024 21:51
// Stop stops the factory informers, and waits for their handlers to stop
func (wf *WatchFactory) Stop() {
klog.Info("Stopping watch factory")
wf.iFactory.Shutdown()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for others: for the shutdowns here, stop channels must be closed prior to calling shutdown, and that is accomplished before Stop() func being called.

c.nodeIPv6Templates.AsTemplateMap(),
}
if err := svcCreateOrUpdateTemplateVar(c.nbClient, nodeIPTemplates); err != nil {
klog.Errorf("Could not sync node IP templates")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include the error message in the log

if err := svcCreateOrUpdateTemplateVar(c.nbClient, nodeIPTemplates); err != nil {
klog.Errorf("Could not sync node IP templates")
return
c.startupDoneLock.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for others - this will not block for long. Another controller only locks to write to a bool startupDone and unlocks immediately after.

klog.Errorf("Could not sync node IP templates")
return
c.startupDoneLock.RLock()
defer c.startupDoneLock.RUnlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to keep hold of the lock while processing svcCreateOrUpdateTemplateVar below?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/scale&performance All issues related to scale & performance feature/services&endpoints All issues related to the Servces/Endpoints API
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

None yet

3 participants