
A76: Improvements to the Ring Hash LB Policy #412

Open. Wants to merge 12 commits into base: master.
Conversation

atollena (Contributor) commented on Jan 22, 2024:

Based on discussion in grpc/grpc#33356.

`hash_key` name resolver attribute changes.

The xDS resolver will be changed so that when converting EDS responses to
resolver endpoints, it will set the `hash_key` name resolver attribute to the
Member commented:

EDS responses are handled in a load balancing policy, so I'm not sure if "name resolver attribute" is the right terminology here.

atollena (Contributor, Author) replied on Jan 22, 2024:

OK, I interpreted your comment as two things:

  1. It's not the xDS resolver, but the cluster resolver LB policy that does this, so I replaced the term.
  2. "Name resolver attributes" are not really a thing. I replaced this with "endpoint attributes": metadata attached to each endpoint, used for example to store locality IDs and weights. IIUC these are language specific, but all languages have the feature. I don't know if this is clear enough for cross-language interpretation -- please let me know.

Point 1 seems to be in motion with A74, which removes the cluster resolver LB policy in favor of the "xDS resolver".

```proto
uint64 min_ring_size = 1; // Optional, defaults to 1024.
uint64 max_ring_size = 2; // Optional, defaults to 4096, max is 8M.

string request_metadata_key = 3; // Optional, defaults to the empty string.
```
Member commented:

In the context of a Protobuf 3 message, an unset string field is the same as an empty string value, so stating that the field "defaults to the empty string" is redundant with the default Protobuf behavior. In addition, "defaults to the empty string" seems to imply that the empty string value will be used in some way, but earlier it says that "not set" and "empty" are treated the same. I think it would be clearer to say something like "Optional, unused if unset".

atollena (Contributor, Author) replied on Jan 22, 2024:

Yes, replaced with "Optional, unused if unset.". I also removed mentions of "not set" in the implementation section, since there is no way to distinguish not set from empty.


After the addition of this field, the `ring_hash` LB policy config will be:

```proto
message RingHashLoadBalancingConfig {
```
Member commented:

To get Protobuf syntax highlighting, instead of indenting the code, use a fenced code block with the language name `proto`.

atollena (Contributor, Author) replied:

done.

`GRPC_EXPERIMENTAL_XDS_RING_HASH_ENDPOINT_HASH_KEY` environment variable. This
will protect from the case where an xDS control plane is already setting the
`LbEndpoint.Metadata` `envoy.lb` `hash_key` field, in which case deploying this
new behavior would churn all endpoint hash keys. This environment variable will
Member commented:

Deploying a new version of gRPC will clear all ring_hash state anyway, so I don't understand how this environment variable would do anything to prevent churn.

atollena (Contributor, Author) replied:

Deploying with the environment variable unset will keep the current behavior of hashing the IP address instead of the new behavior of using the value from LbEndpoint.Metadata. As a result, the locations of existing endpoints on the ring will be guaranteed to be the same as before, and requests will continue being routed to the same endpoint. It will churn connections, if that's what you mean, but not endpoint locations on the ring. Suddenly changing all locations on the ring may have unintended consequences, removing the locality that ring hash may have been used for in the first place.

I'm thinking of a better phrasing, and the best I could think of was to replace "churn" with "change". Let me know if this is still unclear.
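To make the stability argument concrete, here is a minimal Go sketch (hypothetical types; FNV-1a stands in for the hash function gRPC actually uses, xxHash/XXH64): an endpoint's ring position depends only on the string that gets hashed, so keeping the old input (the address) while the gate is off keeps every location on the ring unchanged, whereas switching to `hash_key` moves it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Endpoint is a hypothetical stand-in for a resolver endpoint: an address
// plus an optional hash_key taken from LbEndpoint.Metadata (envoy.lb).
type Endpoint struct {
	Addr    string
	HashKey string // empty if the control plane did not set it
}

// ringPosition hashes whatever string determines the endpoint's location
// on the ring. FNV-1a is illustrative only; gRPC uses xxHash (XXH64).
func ringPosition(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// placementKey implements the gated behavior: with the env var unset the
// address is always hashed (old behavior); with it set, a non-empty
// hash_key takes precedence (new behavior).
func placementKey(e Endpoint, useHashKey bool) string {
	if useHashKey && e.HashKey != "" {
		return e.HashKey
	}
	return e.Addr
}

func main() {
	e := Endpoint{Addr: "10.0.0.1:443", HashKey: "pod-a"}
	old := ringPosition(placementKey(e, false))
	gatedOff := ringPosition(placementKey(e, false))
	gatedOn := ringPosition(placementKey(e, true))
	fmt.Println(old == gatedOff) // gate off: position unchanged across deploys
	fmt.Println(old == gatedOn)  // gate on: position moves to hash_key's spot
}
```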

markdroth (Member) left a comment:

Thanks for writing this!

Overall, the design looks good, but I think there is one significant issue related to where we compute the request hash, since I don't think it will work to do it in the picker. That will probably need a bit more discussion to resolve.

Please let me know if you have any questions. Thanks!

markdroth (Member) left a comment:

Thanks for making those updates! We'll get back to you on the open question once Eric gets back.

atollena (Contributor, Author) commented:

@markdroth just checking in on this, I've made the updates regarding handling the empty header. I'll start testing an implementation for this internally, but in the meantime would appreciate another review. Thanks!

markdroth (Member) left a comment:

Sorry for the delay! This looks really good overall. Comments are mostly things to improve clarity of the doc.

Please let me know if you have any questions. Thanks!


This proposal extends the following existing gRFC:

* [gRFC A42: xDS Ring Hash LB Policy][A42]
Member commented:

It's probably also worth referencing A61, which significantly simplified the ring_hash picker logic.

atollena (Contributor, Author) replied on Apr 2, 2024:

OK, thanks for pointing this out; I wasn't aware of A61's impact on ring hash. I spent some time understanding how it affects the implementation I started in Go. Go implements A62 (pick-first sticky TRANSIENT_FAILURE) but not A61, so it cannot take advantage of this without further refactoring of the ring hash policy. I'll sync up with the Go team to see what the best path forward is, but I imagine it would be best to start by implementing at least part of A61 (delegating to pick_first).

atollena (Contributor, Author) added:

After a quick discussion with @dfawley, I decided to try to implement this before A61. I think the proposal could be clearer for this case, at least for Go, because there is a difference between immediately attempting to connect (which happens only for subchannels in IDLE state) and queuing a connect for subchannels in TRANSIENT_FAILURE, where a connection attempt is triggered only when the subchannel returns to IDLE after the connection backoff.

I implemented the new behavior of scanning forward for a ready subchannel, either when:

  1. the subchannel picked randomly is idle (immediate connection attempt triggered) or,
  2. the subchannel picked randomly is in transient failure (connection attempt queued)

I think this has the desired behavior: it triggers at most one connection attempt per pick, adds no latency when there is a ready subchannel, and converges to random. But I also considered only scanning forward for a ready subchannel when we triggered an immediate connection attempt, which happens when either:

  1. the picked subchannel is idle
  2. the picked subchannel is in transient failure, and the second is idle
  3. the picked subchannel is in transient failure, the second is also in transient failure, and we found an idle subchannel before a ready one when scanning the ring forward.

This ambiguity will disappear when all implementations have A61 implemented. This is planned for this quarter for Go, IIUC, but I imagine there may be reasons for some implementations to want to implement A76 before A61. My question is whether you think it's worth adding a note about it in this gRFC to lift this ambiguity.
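To make the first variant above concrete, here is a minimal Go sketch (hypothetical types and names, not the actual picker): starting from the ring entry the pick landed on, request at most one connection attempt if that entry is IDLE or in TRANSIENT_FAILURE, then scan forward for a READY subchannel so the RPC is not delayed.

```go
package main

import "fmt"

type ConnState int

const (
	Idle ConnState = iota
	Connecting
	Ready
	TransientFailure
)

// ringEntry is a hypothetical ring entry: a subchannel state plus a name
// standing in for the subchannel itself.
type ringEntry struct {
	state ConnState
	name  string
}

// pick sketches the behavior described above: if the first entry is READY,
// use it; otherwise request a connection on it (immediate for IDLE, queued
// for TRANSIENT_FAILURE) and scan forward for a READY entry, so each pick
// triggers at most one connection attempt.
func pick(ring []ringEntry, first int) (string, bool) {
	e := ring[first]
	if e.state == Ready {
		return e.name, true
	}
	if e.state == Idle || e.state == TransientFailure {
		fmt.Printf("requesting connection on %s\n", e.name)
	}
	for i := 1; i < len(ring); i++ {
		n := ring[(first+i)%len(ring)]
		if n.state == Ready {
			return n.name, true // found a READY entry: no extra latency
		}
	}
	return "", false // no READY entry: queue or fail per the gRFC rules
}

func main() {
	ring := []ringEntry{
		{TransientFailure, "ep1"},
		{Idle, "ep2"},
		{Ready, "ep3"},
	}
	name, ok := pick(ring, 0)
	fmt.Println(name, ok)
}
```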

Member replied:

My inclination is that this logic is already complicated enough to understand, so I'd prefer to avoid muddying the waters by trying to do this without having first implemented the relevant parts of A61. From first glance, what you describe here seems reasonable, but I think I'd have to stare at it for a lot longer to convince myself it didn't have any potential pitfalls.

@dfawley, is it not feasible to implement just the ring_hash part of A61 before implementing this change?

atollena (Contributor, Author) replied:

We discussed this with the Go team, and we're going to implement delegating to pick_first before this change.

markdroth (Member) left a comment:

This looks great to me!

@ejona86 @dfawley Would you two please do a review pass as well? Thanks!


Explicitly setting the request hash key cannot possibly lead to problems with existing deployments, because the new behavior requires setting a load balancing policy configuration field that did not exist before. Therefore, it is not gated
Member commented:

That's not the condition. It's "whether remote I/O can trigger it" and it can. The reason to have an environment variable here is to handle bugs in implementations while they are being implemented. If there's a crasher bug, for example, you don't want to prevent using the new feature because you might trigger old clients to crash.

atollena (Contributor, Author) replied on Jul 22, 2024:

OK, so I guess it also makes sense for this one to be opt-in to start with. I added an environment variable gate called `GRPC_EXPERIMENTAL_RING_HASH_SET_REQUEST_HASH_KEY`, which would let users disable the feature if the source of the service config triggers undesirable behavior. I am not familiar with the process for deciding when to remove this kind of feature-gating environment variable. Please let me know if that seems right.

Member replied:

The normal process is, "when it is adequately tested, enable it by default." C++ core tends to remove the env var at that point. In Java/Go we tend to wait a release with the env var there so it can be disabled if something goes awry. When the env var is removed doesn't matter; we care about when the default changes here.

But the main point is "when tested." In most other gRFC's that's our interop test framework; here, I guess it'll just be unit tests? For Go in practice it might be "when someone tests it with a service" where "someone" is "Antoine and friends".

Again, the main thing we want to avoid is a bug during the initial implementation that causes really bad client behavior such that you can't enable the feature in the future, lest you trigger broken clients.

atollena (Contributor, Author) replied:

OK, that makes sense, and it also applies to the other environment variable then.

I'm happy to remove the environment variables when we have usage (we have internal usage of a fork of the existing ring hash that we'll be able to replace when this is implemented in Java and Go).

```
@@ -545,6 +545,7 @@ for (i = 0; i < ring.size(); ++i) {
     return PICK_QUEUE;
   }
 }
 return PICK_FAIL;
```
atollena (Contributor, Author) commented:

I took the liberty of adding this clarification to A62 -- handling the case of transient failure. Although I presume that the aggregated state reported to the parent may prevent this case from ever happening?

Member replied:

There are definitely times we'll pick from failing policies, like if all alternatives are also failing.

Implementation note: in Java we do `return ring[first_index].picker->Pick(...);`, which seems appropriate for everyone to do. And we can do the same in the new flow; both loops seem to guarantee that if we get to the end, all endpoints are in TRANSIENT_FAILURE.

atollena (Contributor, Author) replied on Jul 25, 2024:

```
// ...
}

requested_connection = picker_has_a_child_connecting;
```
Member commented:

We should describe that `picker_has_a_child_connecting` is computed when the picker is created.

atollena (Contributor, Author) replied:

done.

```
// Determine request hash.
using_random_hash = false;
if (config.request_hash_header.empty()) {
  request_hash = call_attributes.hash;
```
Member commented:

Can't this also be empty? In which case you'd use a random hash, too?

atollena (Contributor, Author) replied:

Yes. I think the behavior is a bit confusing as it is: if request_hash_header is empty, and the call attribute is not set because there is no xDS config selector to set it to a random value, then this would result in a fixed hash that always routes to the same endpoint on the ring. I updated the logic and the text to pick a random hash in this case.
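A Go sketch of the updated hash selection (hypothetical names; FNV-1a stands in for xxHash, and the xDS call-attribute path is omitted for brevity): hash the configured header's value when it is present and non-empty, otherwise fall back to a random hash so picks converge to random instead of pinning every RPC to one endpoint.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// requestHash returns the hash used to pick a ring position, plus whether
// a random hash was used. If the configured header is present and
// non-empty, its value is hashed deterministically; otherwise a random
// hash is used so that picks spread across the ring.
func requestHash(headers map[string]string, requestHashHeader string) (uint64, bool) {
	if requestHashHeader != "" {
		if v, ok := headers[requestHashHeader]; ok && v != "" {
			h := fnv.New64a()
			h.Write([]byte(v))
			return h.Sum64(), false
		}
	}
	return rand.Uint64(), true // usingRandomHash = true
}

func main() {
	h, random := requestHash(map[string]string{"session-id": "abc"}, "session-id")
	fmt.Println(h != 0, random) // header present: deterministic, not random
	_, random = requestHash(map[string]string{}, "session-id")
	fmt.Println(random) // header absent: random hash
}
```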

determining the locations of each endpoint on the ring will be extracted from a
pre-defined endpoint attribute called `hash_key`. If this attribute is set, then
the endpoint is placed on the ring by hashing its value. If this attribute is
not set or empty, then the endpoint's first address is hashed, matching the
Member commented:

"first address" -- So the A61 gRFC says the DNS resolver should do RFC-6724 sorting. But the endpoints aren't necessarily coming from the DNS resolver, right? So do we want to apply the same sorting here anyway, to ensure consistency?

atollena (Contributor, Author) replied:

> But the endpoints aren't necessarily coming from the DNS resolver, right? So do we want to apply the same sorting here anyway, to ensure consistency?

The DNS resolver will create an endpoint per address. The fact that it sorts them seems irrelevant, since we do not take the order of endpoints into consideration when placing them on the ring. I would expect other resolvers that cannot group endpoints to operate like the DNS resolver, and resolvers that can group endpoints to sort addresses within an endpoint in a consistent way (such as according to RFC-6724). To me the wording seems sufficient here: users that care about this consistency should either sort addresses in the resolver, or use the endpoint metadata key.

You do raise an interesting point, which is that if the DNS resolver returns multiple addresses for the same endpoint, there is no way for ring hash to associate them with the same endpoint. So each address, regardless of family, will get its own place on the ring; some addresses may be multiple homes of the same endpoint, etc. That seems like a fundamental limitation of using DNS A/AAAA records for service discovery: we can't do things that rely on knowing the list of endpoints rather than the list of addresses. And I think this concern also applies to other LB policies, such as RR and WRR, which, when used with DNS, could open multiple connections to the same endpoint, potentially resulting in load imbalance.
