---
layout: blog
title: Infinispan Java Hot Rod client pool rework
permalink: /blog/:year/:month/:day/hotrod-client-pool-rework
date: '2024-11-26:00:00.000-00:00'
author: wburns
tags: [ "client", "hotrod", "performance" ]
---

Infinispan 15.1 will be shipping a new default Hot Rod client implementation.
It completely overhauls the connection "pool" and adds many internal code optimizations
and allocation reductions.

### An overview of the changes

* Remove `ChannelPool` implementation, replaced with single pipelined `Channel`-per-server
* Reduce per-operation allocation rate
* Remove unnecessary allocations during response processing
* New protocol (HR 4.1) supporting the streaming API in a connection-stateless way
* Internal refactoring to simplify adding additional commands
* Multiple fixes for client bloom filters
* Rework client flags to be more consistent and not tied to thread locals
* Drop client support for HR protocol versions older than 3.0
* New `hotrod-client-legacy` jar for the old client

### How fast is it though?

As you can see, this is quite an extensive rework, and I am guessing many of you want to know just
"How fast is it?". Let's test it and find out.

Using the following https://github.com/infinispan/infinispan-benchmarks/tree/main/getputremovetest[JMH benchmark]
we found that in each case the new client has better performance.

|===
| Clients | Servers | Concurrency | Performance Difference

| 1 | 1 | single | +11.5%
| 1 | 1 | high | +7%
| 1 | 3 | single | +2.5%
| 3 | 3 | high | +10%

|===

Based on this table, you can expect a performance benefit in the majority of cases while also reaping the other benefits
described below.

#### What does *pipelined* `Channel` mean though?

In the previous client, we kept a pool of connections and, for each concurrent operation, we allocated the
required operation objects, wrote the bytes to the server on a single socket, and waited until the bytes were flushed
to that socket. During that period another thread could not use the same socket, so it would take another from the pool.

The new client, however, uses pipelined requests, so multiple requests can be sent on the same socket without flushing immediately.
Flushing is only performed after the concurrent requests are sent. This means that if multiple threads all send an
operation, those requests can often be coalesced into a single packet on the way to the server instead of one packet per request.
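
To make this concrete, here is a minimal sketch of what pipelining means from the application's point of view; the server
address, cache name, and keys are hypothetical, and the comments describe the intended behavior rather than exact internals.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class PipelinedPutsSketch {
   public static void main(String[] args) throws Exception {
      // Hypothetical server address and cache name, purely for illustration.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222);
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, String> cache = manager.getCache("example");
         // Fire several operations without waiting on each response. With the new
         // client they share the single pipelined channel to the server and may be
         // flushed together instead of one socket write (and flush) per request.
         List<CompletableFuture<String>> pending = List.of(
               cache.putAsync("k1", "v1"),
               cache.putAsync("k2", "v2"),
               cache.putAsync("k3", "v3"));
         pending.forEach(CompletableFuture::join);
      }
   }
}
```

From the caller's perspective nothing changes; the batching happens entirely inside the client.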

There is a possibility of a loss in performance in one very specific case: a single client instance and a single
server on hardware with a lot of cores. This is due to the use of an event loop in both the client and the server: operations
to a server from the same client are always sent on a dedicated thread, and the server processes responses on a dedicated thread per
client as well. As the number of clients and servers scales up, though, this concern dissolves very quickly and, depending on your
hardware, may not be a concern at all, as the numbers above show.

### File Descriptors

What about resource usage during the test? As mentioned above the client now only uses a single connection
per server instead of a pool per server.

Using a single server, after everything has been initialized, we can see we are using 35 file descriptors.

```
perf@perf:~$ lsof -p <pid> -i | wc -l
35
```

While running with the legacy client we see the following (note this is filtering only on the HR port)

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
45
```

So in this case we have 45 files open when running the test, which is more than the idle server had in total!
This makes sense, though, given we have 1 file descriptor for the server's LISTEN socket and 22 each for the
client and server sides of the connections (as our test is running 22 concurrent threads).


In comparison, when using the new client we only have 3 file descriptors! (note this is also filtering only on the HR port)

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
3
```

That is one for the server's LISTEN socket and one on each side for the single connection between the client and the server.

In this run we _also_ saw a 5.8% performance increase with the new client, which is pretty great!

This should help out all users, as we have had cases in the past where users were running hundreds of clients
against dozens of servers, resulting in close to 100K connections. With this change in place, the number of client
connections is capped at the number of clients times the number of servers, completely eliminating the
effect of concurrency on the number of client connections.

### Client memory usage

The per-operation allocation rate has been reduced as well, so client applications no longer need to dedicate
as many resources to garbage collection.

In the above test the legacy client's allocation rate was around 660 MB/s, whereas the new client's was only 350 MB/s:
almost half the allocation rate!
As you would expect, in this test we saw half the number of GC runs with the new client compared to the legacy one.

The biggest reason for this is our simplified internal operations and other miscellaneous per-operation savings.
As a simple measure, you can see how many fewer objects are required in the constructor for our operations.


Legacy PutOperation
```java
public PutOperation(Codec codec, ChannelFactory channelFactory,
      Object key, byte[] keyBytes, byte[] cacheName, AtomicReference<ClientTopology> clientTopology,
      int flags, Configuration cfg, byte[] value, long lifespan, TimeUnit lifespanTimeUnit,
      long maxIdle, TimeUnit maxIdleTimeUnit, DataFormat dataFormat, ClientStatistics clientStatistics,
      TelemetryService telemetryService) {
   super(PUT_REQUEST, PUT_RESPONSE, codec, channelFactory, key, keyBytes, cacheName, clientTopology,
         flags, cfg, value, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit, dataFormat, clientStatistics,
         telemetryService);
}
```

New PutOperation
```java
public PutOperation(InternalRemoteCache<?, ?> cache, byte[] keyBytes, byte[] valueBytes, long lifespan,
      TimeUnit lifespanTimeUnit, long maxIdle, TimeUnit maxIdleTimeUnit) {
   super(cache, keyBytes, valueBytes, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit);
}
```

### New Hot Rod protocol 4.1 and Streaming commands

Some of you may have been using the https://docs.jboss.org/infinispan/15.1/apidocs/org/infinispan/client/hotrod/StreamingRemoteCache.html[streaming remote API].
Don't worry, this API has not changed. Instead, the underlying operations needed to be updated. For those of you not familiar with it, this is a way
to read and write `byte[]` values to the remote cache in a stream-based fashion, allowing the client to hold only a portion of the value in
memory at any given time.
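
As a reminder of what that API looks like, here is a minimal sketch using `RemoteCache.streaming()`; the server address,
cache name, key, and chunk sizes are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.StreamingRemoteCache;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class StreamingSketch {
   public static void main(String[] args) throws Exception {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, byte[]> cache = manager.getCache("example"); // hypothetical cache name
         StreamingRemoteCache<String> streaming = cache.streaming();

         // Write a large value in chunks; only the current chunk is in client memory.
         try (OutputStream out = streaming.put("big-key")) {
            for (int i = 0; i < 10_000; i++) {
               out.write("a chunk of the value\n".getBytes(StandardCharsets.UTF_8));
            }
         }

         // Read it back the same way, a buffer at a time.
         byte[] buffer = new byte[8192];
         try (InputStream in = streaming.get("big-key")) {
            for (int read = in.read(buffer); read != -1; read = in.read(buffer)) {
               // process `read` bytes from `buffer`
            }
         }
      }
   }
}
```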

The problem is that the underlying operations were implemented in a way that reserved a connection while the read or write operation was performed.
This is problematic with the new single connection per server approach in the client. Instead, Hot Rod protocol 4.1 implements new "stateless" commands
that send/receive chunks of the value bytes as they are written/read, with a non-blocking operation underneath. The `OutputStream|InputStream` instances
will still block waiting for the underlying socket to complete its operations, but with the change to the protocol they no longer require reserving the
socket to the server.

Initial performance tests show little to no change in performance, which is well within what we would hope for. Please test it out if you are using it
and let us know!

### Client Hot Rod flags

Many of you may not be aware, but when you applied a `Flag` to an operation on a `RemoteCache` instance, you had to set it for
_every_ operation, and if you shared the `RemoteCache` instance between threads the flags were independent per thread. The embedded `Cache` interface
behaves differently, retaining the flags between operations and sharing them between threads that use the same instance.

The `RemoteCache` behavior, besides being error prone for the reasons above, was also detrimental to performance, as it required additional allocations
per operation. As of 15.1.0, the `Flag` instances are now stored in the `RemoteCache` instance and only need to be set once. If the same flags are applied
more than once, the same instance is returned to the user to reduce allocation rates.

Note that this change applies to both the new client and the legacy client described in the next section.
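
A minimal sketch of the new usage pattern follows; the server address, cache name, and keys are hypothetical, and the
comments simply restate the behavior described above.

```java
import org.infinispan.client.hotrod.Flag;
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class FlagSketch {
   public static void main(String[] args) throws Exception {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, String> cache = manager.getCache("example"); // hypothetical cache name

         // Set the flag once; the returned instance retains it for every
         // subsequent operation and can be shared between threads.
         RemoteCache<String, String> returning = cache.withFlags(Flag.FORCE_RETURN_VALUE);
         System.out.println(returning.put("key", "v1")); // null on a fresh cache
         System.out.println(returning.put("key", "v2")); // "v1", flag still applied

         // Applying the same flags again should hand back the same instance
         // rather than allocating a new wrapper per call.
         RemoteCache<String, String> again = cache.withFlags(Flag.FORCE_RETURN_VALUE);
         System.out.println(again == returning);
      }
   }
}
```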

### Legacy Client

The new client, due to how it works, cannot support older Hot Rod protocols, and as such it does not support anything older than
protocol 3.0. The 3.0 protocol was introduced in Infinispan 10.0, which was released over 5 years ago.
The protocol definitions can be found https://infinispan.org/docs/dev/titles/hotrod_protocol/hotrod_protocol.html[here] for reference.

Due to this, and the complete overhaul of the internals, we are providing a _legacy_ module that uses the previous client,
which supports protocols back to Hot Rod 2.0. It can be used by simply changing the module dependency from `hotrod-client` to `hotrod-client-legacy`.
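
As a rough sketch, and assuming the legacy jar keeps the same packages along with the existing `ConfigurationBuilder.version()`
and `ProtocolVersion` API, pinning a pre-3.0 protocol version with the legacy client could look like this:

```java
import org.infinispan.client.hotrod.ProtocolVersion;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class LegacyClientSketch {
   public static void main(String[] args) throws Exception {
      // Assumes hotrod-client-legacy is on the classpath instead of hotrod-client.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      // A pre-3.0 protocol version, which only the legacy client can still speak.
      builder.version(ProtocolVersion.PROTOCOL_VERSION_25);
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         // ... use caches as before ...
      }
   }
}
```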


### Conclusions

We hope you all get a chance to try out the new client and let us know what benefits or issues you find.
If you want to discuss this, please feel free to reach out to us through the channels listed at https://infinispan.org/community/.
