Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor service load balancer to support different strategies #10534

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ifross89
Copy link

What does this PR do?

This change refactors the Weighted Round Robin load balancer to support different load balancing strategies.

Motivation

Currently the only load balancing strategy available within traefik is (weighted) round robin. While this is a good strategy in general, it can be problematic if one or more of the backends is getting overloaded.

There are many load balancing strategies, but the
"power-of-two-random-choices" (p2c) algorithm has some good properties that make it a good general use algorithm.

Specifically some of the benefits include:

  • reducing the total number of requests to slower
  • constant time backend picking (in the general
  • reduced "herd behaviour" compared to e.g. "least connections" load balancing.

The algorithm works by taking two backends at random, and choosing the backend that has the fewest in-flight requests. In order to do this, we have to track the number of in-flight requests each backend is currently processing.

The aim of this change is to demonstrate that this new load balancing strategy can be added with minimal changes, and reusing a lot of the existing load balancing code by factoring out the explicit strategy into an interface.

In order to do this, the wrr package was removed, and the existing LoadBalancer was moved to the parent directory: the loadbalancer package.

There are many strategies that can be used for load balancing, many of which require "extrinsic" information, such as the CPU load on the backends. This change is not meant to open the door for adding such strategies, but attempts to add an effective load balancing strategy with low cost to the codebase.

This change does not integrate the new strategy into the rest of traefik: there would need to be more tests added, updates to documentation, and perhaps some investigation into performance / optimisations. I am willing to do this work, but didn't want to spend too much time on this if the change is not going to be accepted, so would like a design review.

As for how this would be specified, as discussed in the linked issue, I propose adding a strategy under loadBalancer, e.g.:

services:
    Service01:
      loadBalancer:
        strategy: wrr
        servers:
          - url: foobar
            weight: 42
          - url: foobar
            weight: 42
    Service02:
      loadBalancer:
        strategy: p2c
        servers:
          - url: foobar
          - url: foobar

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

There is initial discussion about this change here:

#6985

A private PR with more changes starting to integrate this into the traefik can be found here:

ifross89#2

@ifross89 ifross89 changed the base branch from master to v3.0 March 19, 2024 18:30
@ifross89 ifross89 changed the base branch from v3.0 to master March 19, 2024 18:30
This is a change to add the "power of two random choices" load
balancing strategy to the service's load balancer.

This change does not actually start using the new strategy, and is
backwards incompatable, but intended to start a design-review. To fully
add support, many tests will need to be updated, so this change is meant
to highlight the core changes to support the new strategy.

Currently the only load balancing strategy available within traefik is
(weighted) round robin. While this is a good strategy in general, it can
be problematic if one or more of the backends is getting overloaded.

There are many load balancing strategies, but the
"power-of-two-random-choices" algorithm has some good properties that
make it a good general use algorithm.

Specifically some of the benefits include:
 - reducing the total number of requests to slower backends
 - constant time backend picking (in the general case)
 - reduced "herd behaviour" compared to e.g. "least connections" load
   balancing.

The algorithm works by taking two backends at random, and choosing the
backend that has the fewest in-flight requests. In order to do this, we
have to track the number of in-flight requests each backend is currently
processing.

The aim of this change is to demonstrate that this new load balancing
strategy can be added with minimal changes, and reusing a lot of the
existing load balancing code by factoring out the explicit strategy into
an interface.

In order to do this, the wrr package was removed, and the existing
LoadBalancer was moved to the parent directory: the loadbalancer
package.

There are many strategies that can be used for load balancing, many of
which require "extrinsic" information, such as the CPU load on the
backends. This change is not meant to open the door for adding such
strategies, but attempts to add an effective load balancing strategy
with low cost to the codebase.
@ifross89
Copy link
Author

ifross89 commented Apr 3, 2024

Hi, just wondering if there are any timelines around getting this reviewed? Do let me know if there is anything I can do to help the process. Thanks

= added 3 commits April 3, 2024 12:11
- change 'behavior' spelling from UK -> US
- remove unused variable lint error by returning early
- use strconv.Itoa not fmt.Sprint for perf reasons
@ifross89
Copy link
Author

ifross89 commented Apr 3, 2024

There seem to be some conflicting lint rules in the pipeline:

$ golangci-lint run
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)

pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)

        "math/rand/v2"

$ golangci-lint run --fix
$ golangci-lint run      
pkg/server/service/loadbalancer/strategy_test.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        "math/rand/v2"
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        "testing"
pkg/server/service/loadbalancer/p2c.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        crand "crypto/rand"

$ golangci-lint run --fix
$ golangci-lint run      
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)

pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)

        "math/rand/v2"

It is probably because:

  • the gci tool does not seem to think the math/rand/v2 library is in the stdlib, probably as it is new
  • the `gci tool does not group standard lib packages that have an alias.

So either there can be changes to the linting in the pipeline or the code can be refactored to not have both rand packages imported in the same file. I can take advice from a maintainer about which is more appropriate, but it shouldn't affect the bulk of the review.

@ldez
Copy link
Member

ldez commented Apr 3, 2024

the problem will be fixed when the mergeback of v2.11 will happen.
FYI, it's a bug inside a linter, it's already fixed inside golangci-lint.

The expected import blocks are:

import (
	"math/rand/v2"
	"strconv"
	"testing"
)
import (
	crand "crypto/rand"
	"math/rand/v2"
)

This will fail the gci check due to a known issue.
}

// NewP2C creates a "the power-of-two-random-choices" algorithm for load
// balancing. The idea of this is two take two of the backends at random from

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

}

// strategyPowerOfTwoChoices implements "the power-of-two-random-choices" algorithm for load balancing.
// The idea of this is two take two of the backends at random from the available backends, and select

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

}

func (s *strategyPowerOfTwoChoices) nextServer(map[string]struct{}) *namedHandler {
if len(s.healthy) == 1 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is called 4 times in this func. Is there any optimization that can be done by using a variable or it is preferable to get the latest value? (I'm not familar with how golang compilation/runtime works).

@jeflefou
Copy link

Very interesting PR. This will be a very valuable addition.
I hope maintainers will make the design review soon.

@ifross89
Copy link
Author

Hi, do you have any opinion on this? Happy to close the PR if it isn't a direction you want to go in. Merci

@rtribotte
Copy link
Member

Hello @ifross89,

Sorry for not getting back to you sooner, but I'll try to give you feedback on the changes this week.
Thanks anyway for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants