---
layout: blog
title: Infinispan Java Hot Rod client pool rework
permalink: /blog/:year/:month/:day/hotrod-client-pool-rework
date: '2024-11-26T00:00:00.000-00:00'
author: wburns
tags: [ "client", "hotrod", "performance" ]
---

Infinispan 15.1 will be shipping a new default Hot Rod client implementation.
This implementation completely overhauls the connection "pool" and adds many
internal code optimizations and allocation reductions.

### An overview of the changes

* Remove `ChannelPool` implementation, replaced with a single pipelined `Channel` per server
* Reduce per-operation allocation rate
* Remove unnecessary allocations during response processing
* New protocol version (HR 4.1) supporting the streaming API in a connection-stateless way
* Internal refactoring to simplify adding additional commands
* Multiple fixes for client bloom filters
* Rework client flags to be more consistent and not tied to thread locals
* Drop client support for HR protocol versions older than 3.0
* New `hotrod-client-legacy` jar for the old client

### How fast is it though?

As you can see this is quite an extensive rework, and I am guessing many of you want to know just
"How fast is it?". Let's test it and find out.

Using the https://github.com/infinispan/infinispan-benchmarks/tree/main/getputremovetest[JMH benchmark]
we found that in each case the new client has better performance (a minimal sketch of what such a benchmark looks like follows the results below).

|===
| Clients | Servers | Concurrency | Performance Difference

| 1 | 1 | single | +11.5%
| 1 | 1 | high | +7%
| 1 | 3 | single | +2.5%
| 3 | 3 | high | +10%
|===

From this table you should expect a performance benefit in the majority of cases, while also reaping the other benefits
described below.

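For those curious, here is a minimal sketch of what a JMH get/put benchmark against a Hot Rod server can look like. It is not the linked getputremovetest benchmark itself; the server address, cache name, key space, and value size are all illustrative.

```java
import java.util.concurrent.ThreadLocalRandom;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

// Minimal JMH-style sketch: put then get against a server on localhost:11222.
@State(Scope.Benchmark)
public class HotRodGetPutBenchmark {
   private RemoteCacheManager cacheManager;
   private RemoteCache<Integer, byte[]> cache;
   private final byte[] value = new byte[512];

   @Setup
   public void setup() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222);
      cacheManager = new RemoteCacheManager(builder.build());
      cache = cacheManager.getCache("benchmark-cache");
   }

   @Benchmark
   public byte[] putThenGet() {
      int key = ThreadLocalRandom.current().nextInt(10_000);
      cache.put(key, value);
      return cache.get(key);
   }

   @TearDown
   public void tearDown() {
      cacheManager.close();
   }
}
```
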
#### What does *pipelined* `Channel` mean though?

In the previous client, we would keep a pool of connections and, for each concurrent operation, we would allocate the
required operation objects and then submit the bytes to the server on a single socket, waiting until the bytes were flushed
to the socket. During that period another thread could not use the same socket, thus it would use another.

The new client, however, uses pipelined requests so that multiple requests can be sent on the same socket without flushing immediately.
Flushing is only performed after the concurrent requests are sent. This means that, if multiple threads all send an
operation, those requests can possibly share a single packet when sent to the server instead of one packet per request.

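To illustrate the idea, here is a simplified sketch using the Netty `Channel` API directly; it is not the actual client code, just the shape of the write/flush pattern.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import java.util.List;

// Simplified sketch of pipelining on a single Channel: each encoded request is
// queued in the outbound buffer with write(), and one flush() pushes the whole
// batch to the socket, so concurrent requests can share network packets instead
// of each request paying for its own flush.
final class PipelinedWriter {
   static void sendBatch(Channel channel, List<ByteBuf> encodedRequests) {
      for (ByteBuf request : encodedRequests) {
         channel.write(request);   // queued in the outbound buffer, no syscall yet
      }
      channel.flush();             // single flush for all queued requests
   }
}
```

The previous pool-based approach roughly corresponds to a `writeAndFlush()` per operation on whichever pooled connection a thread could grab.
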
This pipelined approach has the possibility of a performance loss in one very specific case: a single client instance and a single
server on hardware with a lot of cores. This is due to the use of an event loop in both the client and the server: operations
to a server from the same client are always sent on a dedicated thread, and the server processes responses on a dedicated thread per
client as well. As the number of clients and servers is scaled up, though, this concern dissolves very quickly and, depending on your
hardware, may not be a concern at all, as shown by the numbers above.

### File Descriptors

What about resource usage during the test? As mentioned above, the client now only uses a single connection
per server instead of a pool per server.

Using a single server, after everything has been initialized, we can see we are using 35 file descriptors.

```
perf@perf:~$ lsof -p <pid> -i | wc -l
35
```

While running with the legacy client we see the following (note this is filtering only on the HR port):

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
45
```

So in this case we have 45 files opened when running the test, which is more than the server had when just sitting idle!
This makes sense though, given we have 1 file descriptor for the LISTEN socket accepting connections on the server and 22 each for the
client and the server (as our test is running 22 concurrent threads).

In comparison, when using the new client we only have 3 file descriptors! (note this is filtering only on the HR port)

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
3
```

That is one for the server's LISTEN socket and one on each end of the single connection between the client and the server.

In this run we _also_ saw a 5.8% performance increase with the new client, pretty great!

This should help out all users, as we have had some cases in the past where users were running hundreds of clients
against dozens of servers, resulting in 100K+ connections (for example, 200 clients x 25 servers x 20 pooled
connections per server is 100,000 connections). With this change in place the number of client
connections is capped at the number of clients times the number of servers (200 x 25 = 5,000 in that example), completely eliminating the
effect of concurrency on the number of client connections.

### Client memory usage

The per-operation allocation rates have been reduced as well, so client applications do not need to dedicate
as many resources to GC.

In the above test the legacy client allocation rate was around 660 MB/s, whereas the new client was only 350 MB/s:
almost half the allocation rate!
As you would expect, in that test we saw half the number of GC runs with the new client compared to the legacy one.

The biggest reason for this is our simplified internal operations and other miscellaneous per-operation reductions.
As a simple measure, you can see how many fewer objects are required in the constructor for our operations.

Legacy PutOperation
```java
public PutOperation(Codec codec, ChannelFactory channelFactory,
                    Object key, byte[] keyBytes, byte[] cacheName, AtomicReference<ClientTopology> clientTopology,
                    int flags, Configuration cfg, byte[] value, long lifespan, TimeUnit lifespanTimeUnit,
                    long maxIdle, TimeUnit maxIdleTimeUnit, DataFormat dataFormat, ClientStatistics clientStatistics,
                    TelemetryService telemetryService) {
   super(PUT_REQUEST, PUT_RESPONSE, codec, channelFactory, key, keyBytes, cacheName, clientTopology,
         flags, cfg, value, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit, dataFormat, clientStatistics,
         telemetryService);
}
```

New PutOperation
```java
public PutOperation(InternalRemoteCache<?, ?> cache, byte[] keyBytes, byte[] valueBytes, long lifespan,
                    TimeUnit lifespanTimeUnit, long maxIdle, TimeUnit maxIdleTimeUnit) {
   super(cache, keyBytes, valueBytes, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit);
}
```

### New Hot Rod protocol 4.1 and Streaming commands

Some of you may have been using the https://docs.jboss.org/infinispan/15.1/apidocs/org/infinispan/client/hotrod/StreamingRemoteCache.html[streaming remote API].
Don't worry, this API has not changed; only the underlying operations needed to be updated. For those of you not familiar, this is a
stream-based approach to reading and writing `byte[]` values in the remote cache, allowing the client to only hold a portion of the value in
memory at a given time.

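As a reminder of what that API looks like, here is a minimal usage sketch; the cache name, key, and file paths are made up for illustration, and an already-configured `RemoteCacheManager` is assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.StreamingRemoteCache;

final class StreamingExample {
   // Cache name, key, and file paths below are illustrative.
   static void copyThroughCache(RemoteCacheManager cacheManager) throws IOException {
      RemoteCache<String, byte[]> cache = cacheManager.getCache("large-values");
      StreamingRemoteCache<String> streaming = cache.streaming();

      // Write a large value chunk by chunk instead of materializing the whole byte[].
      try (OutputStream out = streaming.put("big-report")) {
         Files.copy(Path.of("/tmp/big-report.bin"), out);
      }

      // Read it back as a stream, again without holding the full value in memory.
      try (InputStream in = streaming.get("big-report")) {
         Files.copy(in, Path.of("/tmp/big-report-copy.bin"), StandardCopyOption.REPLACE_EXISTING);
      }
   }
}
```
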
The problem was that the underlying operations were implemented in a way that reserved a connection while the read or write operation was performed.
This is problematic with the new single connection per server approach in the client. Instead, Hot Rod protocol 4.1 implements new "stateless" commands
that send/receive chunks of the value bytes as they are read/written, with non-blocking operations underneath. The `OutputStream`/`InputStream` instances
will still block waiting for the underlying socket to complete its operations, but with the change to the protocol this no longer requires reserving the
socket to the server.

Initial performance tests show little to no change in performance, which is well within what we would hope for. Please test it out if you are using it
and let us know!

### Client Hot Rod flags

Many of you may not be aware, but when you applied a `Flag` to an operation on a `RemoteCache` instance, you had to set it for
_every_ operation, and if you shared the `RemoteCache` instance between threads the flags were independent. The embedded `Cache` instance
behaved in a different fashion, saving the `Flag` between operations and sharing it between threads when using the same instance.

The `RemoteCache` behavior, besides being error prone for the reasons above, was also detrimental to performance, as it required additional allocations
per operation. As such, in 15.1.0 the `Flag` instances are now stored in the `RemoteCache` instance and only need to be set once. If the same flags are applied
more than once, the same instance is returned to the user to reduce allocation rates.

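For example, flags now travel with the `RemoteCache` instance returned by `withFlags`. This is a brief sketch; the cache name and flag choice are just illustrative.

```java
import org.infinispan.client.hotrod.Flag;
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;

final class FlagExample {
   static void useFlags(RemoteCacheManager cacheManager) {
      RemoteCache<String, String> cache = cacheManager.getCache("my-cache");

      // The returned instance keeps the flag (no thread locals involved),
      // so it only needs to be set once rather than per operation.
      RemoteCache<String, String> returning = cache.withFlags(Flag.FORCE_RETURN_VALUE);

      // Every operation on `returning` carries FORCE_RETURN_VALUE; per the change
      // described above, asking for the same flags again reuses the same instance.
      String previous = returning.put("key", "value");
      System.out.println("Previous value: " + previous);
   }
}
```
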
Note this change is for both the new client and the legacy client referred to in the next section.

### Legacy Client

The new client, due to how it works, cannot support older Hot Rod protocols, and as such it does not support anything older than
protocol 3.0. The 3.0 protocol was introduced in Infinispan 10.0, which was released over 5 years ago.
The protocol definitions can be found https://infinispan.org/docs/dev/titles/hotrod_protocol/hotrod_protocol.html[here] for reference.

Due to this, and the complete overhaul of the internals, we are providing a _legacy_ module which uses the previous client
and supports protocols back to Hot Rod 2.0. This can be used by just changing the module dependency from `hotrod-client` to `hotrod-client-legacy`.

### Conclusions

We hope you all get a chance to try out the client changes and see what benefits or issues you find with the new client.
If you want to discuss this, please feel free to reach out to us through the channels listed at https://infinispan.org/community/.