From 83ab8651c0a1476a6c090d1b955b1e87e0da0593 Mon Sep 17 00:00:00 2001
From: William Burns
Date: Tue, 26 Nov 2024 07:56:27 -0800
Subject: [PATCH] #370 Client rework Blog (#371)

* #370 Client rework Blog

---------

Co-authored-by: Tristan Tarrant
---
 _posts/2024-11-26-client-rework.adoc | 178 +++++++++++++++++++++++++++
 1 file changed, 178 insertions(+)
 create mode 100644 _posts/2024-11-26-client-rework.adoc

diff --git a/_posts/2024-11-26-client-rework.adoc b/_posts/2024-11-26-client-rework.adoc
new file mode 100644
index 0000000000..b9d6eae021
--- /dev/null
+++ b/_posts/2024-11-26-client-rework.adoc
@@ -0,0 +1,178 @@
---
layout: blog
title: Infinispan Java Hot Rod client pool rework
permalink: /blog/:year/:month/:day/hotrod-client-pool-rework
date: '2024-11-26T00:00:00.000-00:00'
author: wburns
tags: [ "client", "hotrod", "performance" ]
---

Infinispan 15.1 will ship a new default Hot Rod client implementation.
This implementation completely overhauls the connection "pool" and adds many internal
optimizations that reduce allocations and code complexity.

### An overview of the changes

* Remove the `ChannelPool` implementation, replacing it with a single pipelined `Channel` per server
* Reduce the per-operation allocation rate
* Remove unnecessary allocations during response processing
* New protocol version (Hot Rod 4.1) supporting the streaming API in a connection-stateless way
* Internal refactoring to simplify adding additional commands
* Multiple fixes for client bloom filters
* Rework client flags to be more consistent and no longer tied to thread locals
* Drop client support for Hot Rod protocol versions older than 3.0
* New `hotrod-client-legacy` jar for the old client

### How fast is it though?

As you can see this is quite an extensive rework, and I am guessing many of you want to know just
"How fast is it?". Let's test it and find out.

Using the following https://github.com/infinispan/infinispan-benchmarks/tree/main/getputremovetest[JMH benchmark]
we found that the new client performed better in every case we measured.

|===
| Clients | Servers | Concurrency | Performance Difference

| 1 | 1 | single | +11.5%
| 1 | 1 | high | +7%
| 1 | 3 | single | +2.5%
| 3 | 3 | high | +10%

|===

Based on this table you should expect a performance benefit in the majority of cases, while also reaping the other
benefits described below.

#### What does *pipelined* `Channel` mean though?

The previous client kept a pool of connections. For each concurrent operation it allocated the required operation
objects, then wrote the bytes to the server on a single socket and waited until those bytes were flushed.
During that period no other thread could use that socket, so it would take another connection from the pool.

The new client instead pipelines requests, so multiple requests can be sent on the same socket without flushing
immediately. Flushing is only performed after the concurrent requests have been written. This means that when
multiple threads each send an operation, those requests can often be sent to the server in a single packet
instead of one packet per request.

This can cause a loss in performance in one very specific case: a single client instance and a single
server on hardware with a lot of cores. This is due to the use of an event loop in both the client and the server:
operations to a server from the same client are always sent on a dedicated thread, and the server processes responses
on a dedicated thread per client as well.
As the number of clients and servers is scaled up, though, this concern dissolves very quickly and, depending on your
hardware, may not be a concern at all, as the numbers above show.
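To make the pipelining model concrete, here is a minimal sketch of application code whose concurrent operations all
travel over that single connection per server. The server address, cache name and key/value shapes are placeholders,
not part of the benchmark above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class PipelinedPuts {
   public static void main(String[] args) {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222);
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, String> cache = manager.getCache("example");
         // Issue many operations concurrently. The new client writes them all to the
         // single pipelined connection and flushes them together, instead of checking
         // out one pooled connection per in-flight operation as the old client did.
         CompletableFuture<?>[] pending = IntStream.range(0, 100)
               .mapToObj(i -> cache.putAsync("key-" + i, "value-" + i))
               .toArray(CompletableFuture[]::new);
         CompletableFuture.allOf(pending).join();
      }
   }
}
```

The application-facing API here is the same as before; what changes is how the requests are written to the socket.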
### File Descriptors

What about resource usage during the test? As mentioned above, the client now only uses a single connection
per server instead of a pool of connections per server.

Using a single server, after everything has been initialized, we can see we are using 35 file descriptors.

```
perf@perf:~$ lsof -p -i | wc -l
35
```

While running with the legacy client we see the following (note this is filtering only on the Hot Rod port):

```
perf@perf:~$ lsof -p -i | grep 11222 | wc -l
45
```

So in this case we have 45 files open while running the test, which is more than the server uses when just idle!
This makes sense though, given we have 1 file for the server's LISTEN socket and 22 each for the
client and the server (as our test is running 22 concurrent threads).


In comparison, when using the new client we only have 3 file descriptors (again filtering only on the Hot Rod port):

```
perf@perf:~$ lsof -p -i | grep 11222 | wc -l
3
```

That is one for the server's LISTEN socket and one on each side for the single connection between the client and the server.

In this run we _also_ saw a 5.8% performance increase with the new client, pretty great!

This should help out all users. We have had cases in the past where users were running hundreds of clients
against dozens of servers, resulting in close to 100K connections. With this change in place the number of client
connections is capped at the number of clients times the number of servers, completely eliminating the
effect of concurrency on the number of client connections.

### Client memory usage

The per-operation allocation rate has been reduced as well, so client applications do not need to dedicate
as many resources to garbage collection.

In the above test the legacy client allocation rate was around 660 MB/s, whereas the new client was only 350 MB/s:
almost half the allocation rate!
As you would expect, in that test we saw half the number of GC runs with the new client compared to the legacy one.
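If you want to gather allocation numbers like these yourself, JMH's GC profiler reports `gc.alloc.rate` alongside
throughput. Below is a small sketch of wiring it up from code; the benchmark class name is illustrative rather than
the actual class in the getputremovetest project.

```java
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ClientBenchmarkRunner {
   public static void main(String[] args) throws RunnerException {
      Options options = new OptionsBuilder()
            // Illustrative benchmark name; substitute the benchmark you actually run.
            .include("GetPutRemoveBenchmark")
            // Adds gc.alloc.rate (MB/sec) and GC count columns to the results.
            .addProfiler(GCProfiler.class)
            .forks(1)
            .build();
      new Runner(options).run();
   }
}
```

The same profiler can also be enabled from the command line with `-prof gc` when running a benchmark jar.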
The biggest reason for this reduction is our simplified internal operations, along with other miscellaneous
per-operation savings. As a simple illustration, you can see how many fewer objects are required in the
constructors of our operations.


Legacy PutOperation
```java
   public PutOperation(Codec codec, ChannelFactory channelFactory,
                       Object key, byte[] keyBytes, byte[] cacheName, AtomicReference<ClientTopology> clientTopology,
                       int flags, Configuration cfg, byte[] value, long lifespan, TimeUnit lifespanTimeUnit,
                       long maxIdle, TimeUnit maxIdleTimeUnit, DataFormat dataFormat, ClientStatistics clientStatistics,
                       TelemetryService telemetryService) {
      super(PUT_REQUEST, PUT_RESPONSE, codec, channelFactory, key, keyBytes, cacheName, clientTopology,
            flags, cfg, value, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit, dataFormat, clientStatistics,
            telemetryService);
   }
```

New PutOperation
```java
   public PutOperation(InternalRemoteCache<?, ?> cache, byte[] keyBytes, byte[] valueBytes, long lifespan,
                       TimeUnit lifespanTimeUnit, long maxIdle, TimeUnit maxIdleTimeUnit) {
      super(cache, keyBytes, valueBytes, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit);
   }
```

### New Hot Rod protocol 4.1 and Streaming commands

Some of you may have been using the https://docs.jboss.org/infinispan/15.1/apidocs/org/infinispan/client/hotrod/StreamingRemoteCache.html[streaming remote API].
Don't worry, this API has not changed; only the underlying operations needed to be updated. For those not familiar with it, this API provides
a stream-based way to read and write `byte[]` values in the remote cache, so the client only needs to hold a portion of the value in
memory at any given time.

The problem was that the underlying operations were implemented in a way that reserved a connection while the read or write operation was performed.
This is problematic with the client's new single-connection-per-server approach. Instead, Hot Rod protocol 4.1 introduces new "stateless" commands
that send and receive chunks of the value as they are written or read, backed by non-blocking operations underneath. The `OutputStream` and `InputStream`
instances still block while waiting for the underlying socket to complete its operations, but with the protocol change they no longer require
reserving a socket to the server.

Initial performance tests show little to no change in performance, which is well within what we would hope for. Please test it out if you are
using this API and let us know!

### Client Hot Rod flags

Many of you may not be aware, but when you applied a `Flag` to an operation on a `RemoteCache` instance, you had to set it for
_every_ operation, and if you shared the `RemoteCache` instance between threads the flags were independent per thread. The embedded `Cache`
interface behaves differently: it retains the flags between operations, and they are shared between threads using the same instance.

The `RemoteCache` behavior, besides being error prone for the reasons above, was also detrimental to performance because it required additional
allocations per operation. As of 15.1.0 the `Flag` values are stored in the `RemoteCache` instance and only need to be set once. If the same
flags are applied more than once, the same instance is returned to the user to reduce allocation rates, as shown in the sketch below.

Note this change applies to both the new client and the legacy client described in the next section.
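Here is a small sketch of how the flag API reads with this change; the cache, key and value are placeholders. The
identity check at the end reflects the caching of flagged instances described above.

```java
import org.infinispan.client.hotrod.Flag;
import org.infinispan.client.hotrod.RemoteCache;

public class FlagExample {
   static void forceReturnValues(RemoteCache<String, String> cache) {
      // The flags are stored on the returned RemoteCache instance, so they only
      // need to be applied once instead of before every operation.
      RemoteCache<String, String> returning = cache.withFlags(Flag.FORCE_RETURN_VALUE);
      String previous = returning.put("key", "new-value");
      System.out.println("Previous value: " + previous);

      // Asking for the same flags again should hand back the cached instance,
      // avoiding a per-operation allocation.
      RemoteCache<String, String> again = cache.withFlags(Flag.FORCE_RETURN_VALUE);
      assert returning == again;
   }
}
```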
### Legacy Client

The new client, due to how it works, cannot support older Hot Rod protocol versions, so it does not support anything older than
protocol 3.0. The 3.0 protocol was introduced in Infinispan 10.0, which was released over five years ago.
The protocol definitions can be found https://infinispan.org/docs/dev/titles/hotrod_protocol/hotrod_protocol.html[here] for reference.

Because of this, and the complete overhaul of the internals, we are providing a _legacy_ module that uses the previous client,
which supports protocol versions back to Hot Rod 2.0. You can use it by simply changing the module dependency from `hotrod-client` to `hotrod-client-legacy`.


### Conclusions

We hope you all get a chance to try out the client changes and see what benefits or issues you find with the new client.
If you want to discuss this, please feel free to reach out to us through any of the channels listed at https://infinispan.org/community/.