---
layout: blog
title: Infinispan Java Hot Rod client pool rework
permalink: /blog/:year/:month/:day/hotrod-client-pool-rework
date: '2024-11-26:00:00.000-00:00'
author: wburns
tags: [ "client", "hotrod", "performance" ]
---

Infinispan 15.1 will be shipping a new default Hot Rod client implementation.
It completely overhauls the connection "pool" and adds many internal code optimizations
and allocation reductions.

### An overview of the changes

* Remove `ChannelPool` implementation, replaced with single pipelined `Channel`-per-server
* Reduce per-operation allocation rate
* Remove unnecessary allocations during response processing
* New protocol (HR 4.1) supporting the streaming API in a connection-stateless way
* Internal refactoring to simplify adding additional commands
* Multiple fixes for client bloom filters
* Rework client flags to be more consistent and not tied to thread locals
* Drop client support for HR protocol versions older than 3.0
* New `hotrod-client-legacy` jar for the old client

### How fast is it though?

As you can see, this is quite an extensive rework, and I am guessing many of you want to know just
"How fast is it?". Let's test it and find out.

Using the following https://github.com/infinispan/infinispan-benchmarks/tree/main/getputremovetest[JMH benchmark]
we found that in each case the new client has better performance.

|===
| Clients | Servers | Concurrency | Performance Difference

| 1 | 1 | single | +11.5%
| 1 | 1 | high | +7%
| 1 | 3 | single | +2.5%
| 3 | 3 | high | +10%

|===

Based on this table, you can expect a performance benefit in the majority of cases while also reaping the other benefits
described below.

#### What does *pipelined* `Channel` mean though?

In the previous client, we kept a pool of connections and, for each concurrent operation, we allocated the
required operation objects, wrote the bytes to the server on a single socket, and waited until the bytes were flushed
to that socket. During that period another thread could not use the same socket, so it would take another from the pool.

The new client, however, uses pipelined requests, so multiple requests can be sent on the same socket without flushing immediately.
Flushing is only performed after the concurrent requests are sent. This means that if multiple threads all send an
operation, those requests can often be coalesced into a single packet on the way to the server instead of one packet per request.
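
To make this concrete, here is a minimal sketch of what pipelining means from the application's point of view; the server
address, cache name, and keys are hypothetical, and the comments describe the intended behavior rather than exact internals.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class PipelinedPutsSketch {
   public static void main(String[] args) throws Exception {
      // Hypothetical server address and cache name, purely for illustration.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222);
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, String> cache = manager.getCache("example");
         // Fire several operations without waiting on each response. With the new
         // client they share the single pipelined channel to the server and may be
         // flushed together instead of one socket write (and flush) per request.
         List<CompletableFuture<String>> pending = List.of(
               cache.putAsync("k1", "v1"),
               cache.putAsync("k2", "v2"),
               cache.putAsync("k3", "v3"));
         pending.forEach(CompletableFuture::join);
      }
   }
}
```

From the caller's perspective nothing changes; the batching happens entirely inside the client.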

There is a possibility of a loss in performance in one very specific case: a single client instance and a single
server on hardware with a lot of cores. This is due to the use of an event loop in both the client and the server: operations
to a server from the same client are always sent on a dedicated thread, and the server processes responses on a dedicated thread per
client as well. As the number of clients and servers scales up, though, this concern dissolves very quickly and, depending on your
hardware, may not be a concern at all, as the numbers above show.

### File Descriptors

What about resource usage during the test? As mentioned above the client now only uses a single connection
per server instead of a pool per server.

Using a single server, after everything has been initialized, we can see we are using 35 file descriptors.

```
perf@perf:~$ lsof -p <pid> -i | wc -l
35
```

While running with the legacy client we see the following (note this is filtering only on the HR port)

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
45
```

So in this case we have 45 files open when running the test, which is more than the idle server had in total!
This makes sense, though, given we have 1 file descriptor for the server's LISTEN socket and 22 each for the
client and server sides of the connections (as our test is running 22 concurrent threads).


In comparison, when using the new client we only have 3 file descriptors! (note this is also filtering only on the HR port)

```
perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
3
```

That is one for the server's LISTEN socket and one on each side for the single connection between the client and the server.

In this run we _also_ saw a 5.8% performance increase with the new client, which is pretty great!

This should help out all users, as we have had cases in the past where users were running hundreds of clients
against dozens of servers, resulting in close to 100K connections. With this change in place, the number of client
connections is capped at the number of clients times the number of servers, completely eliminating the
effect of concurrency on the number of client connections.

### Client memory usage

The per-operation allocation rate has been reduced as well, so client applications no longer need to dedicate
as many resources to garbage collection.

In the above test the legacy client's allocation rate was around 660 MB/s, whereas the new client's was only 350 MB/s:
almost half the allocation rate!
As you would expect, in this test we saw half the number of GC runs with the new client compared to the legacy one.

The biggest reason for this is our simplified internal operations and other miscellaneous per-operation savings.
As a simple measure, you can see how many fewer objects are required in the constructor for our operations.


Legacy PutOperation
```java
public PutOperation(Codec codec, ChannelFactory channelFactory,
      Object key, byte[] keyBytes, byte[] cacheName, AtomicReference<ClientTopology> clientTopology,
      int flags, Configuration cfg, byte[] value, long lifespan, TimeUnit lifespanTimeUnit,
      long maxIdle, TimeUnit maxIdleTimeUnit, DataFormat dataFormat, ClientStatistics clientStatistics,
      TelemetryService telemetryService) {
   super(PUT_REQUEST, PUT_RESPONSE, codec, channelFactory, key, keyBytes, cacheName, clientTopology,
         flags, cfg, value, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit, dataFormat, clientStatistics,
         telemetryService);
}
```

New PutOperation
```java
public PutOperation(InternalRemoteCache<?, ?> cache, byte[] keyBytes, byte[] valueBytes, long lifespan,
      TimeUnit lifespanTimeUnit, long maxIdle, TimeUnit maxIdleTimeUnit) {
   super(cache, keyBytes, valueBytes, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit);
}
```

### New Hot Rod protocol 4.1 and Streaming commands

Some of you may have been using the https://docs.jboss.org/infinispan/15.1/apidocs/org/infinispan/client/hotrod/StreamingRemoteCache.html[streaming remote API].
Don't worry, this API has not changed. Instead, the underlying operations needed to be updated. For those of you not familiar with it, this is a way
to read and write `byte[]` values to the remote cache in a stream-based fashion, allowing the client to hold only a portion of the value in
memory at any given time.
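
As a reminder of what that API looks like, here is a minimal sketch using `RemoteCache.streaming()`; the server address,
cache name, key, and chunk sizes are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.StreamingRemoteCache;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class StreamingSketch {
   public static void main(String[] args) throws Exception {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, byte[]> cache = manager.getCache("example"); // hypothetical cache name
         StreamingRemoteCache<String> streaming = cache.streaming();

         // Write a large value in chunks; only the current chunk is in client memory.
         try (OutputStream out = streaming.put("big-key")) {
            for (int i = 0; i < 10_000; i++) {
               out.write("a chunk of the value\n".getBytes(StandardCharsets.UTF_8));
            }
         }

         // Read it back the same way, a buffer at a time.
         byte[] buffer = new byte[8192];
         try (InputStream in = streaming.get("big-key")) {
            for (int read = in.read(buffer); read != -1; read = in.read(buffer)) {
               // process `read` bytes from `buffer`
            }
         }
      }
   }
}
```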

The problem is that the underlying operations were implemented in a way that reserved a connection while the read or write operation was performed.
This is problematic with the new single connection per server approach in the client. Instead, Hot Rod protocol 4.1 implements new "stateless" commands
that send/receive chunks of the value bytes as they are written/read, with a non-blocking operation underneath. The `OutputStream|InputStream` instances
will still block waiting for the underlying socket to complete its operations, but with the change to the protocol they no longer require reserving the
socket to the server.

Initial performance tests show little to no change in performance, which is well within what we would hope for. Please test it out if you are using it
and let us know!

### Client Hot Rod flags

Many of you may not be aware, but when you applied a `Flag` to an operation on a `RemoteCache` instance, you had to set it for
_every_ operation, and if you shared the `RemoteCache` instance between threads the flags were independent per thread. The embedded `Cache` interface
behaves differently, retaining the flags between operations and sharing them between threads that use the same instance.

The `RemoteCache` behavior, besides being error prone for the reasons above, was also detrimental to performance, as it required additional allocations
per operation. As of 15.1.0, the `Flag` instances are now stored in the `RemoteCache` instance and only need to be set once. If the same flags are applied
more than once, the same instance is returned to the user to reduce allocation rates.

Note that this change applies to both the new client and the legacy client described in the next section.
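
A minimal sketch of the new usage pattern follows; the server address, cache name, and keys are hypothetical, and the
comments simply restate the behavior described above.

```java
import org.infinispan.client.hotrod.Flag;
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class FlagSketch {
   public static void main(String[] args) throws Exception {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         RemoteCache<String, String> cache = manager.getCache("example"); // hypothetical cache name

         // Set the flag once; the returned instance retains it for every
         // subsequent operation and can be shared between threads.
         RemoteCache<String, String> returning = cache.withFlags(Flag.FORCE_RETURN_VALUE);
         System.out.println(returning.put("key", "v1")); // null on a fresh cache
         System.out.println(returning.put("key", "v2")); // "v1", flag still applied

         // Applying the same flags again should hand back the same instance
         // rather than allocating a new wrapper per call.
         RemoteCache<String, String> again = cache.withFlags(Flag.FORCE_RETURN_VALUE);
         System.out.println(again == returning);
      }
   }
}
```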

### Legacy Client

The new client, due to how it works, cannot support older Hot Rod protocols, and as such it does not support anything older than
protocol 3.0. The 3.0 protocol was introduced in Infinispan 10.0, which was released over 5 years ago.
The protocol definitions can be found https://infinispan.org/docs/dev/titles/hotrod_protocol/hotrod_protocol.html[here] for reference.

Due to this, and the complete overhaul of the internals, we are providing a _legacy_ module that uses the previous client,
which supports protocols back to Hot Rod 2.0. It can be used by simply changing the module dependency from `hotrod-client` to `hotrod-client-legacy`.
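
As a rough sketch, and assuming the legacy jar keeps the same packages along with the existing `ConfigurationBuilder.version()`
and `ProtocolVersion` API, pinning a pre-3.0 protocol version with the legacy client could look like this:

```java
import org.infinispan.client.hotrod.ProtocolVersion;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class LegacyClientSketch {
   public static void main(String[] args) throws Exception {
      // Assumes hotrod-client-legacy is on the classpath instead of hotrod-client.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222); // hypothetical server
      // A pre-3.0 protocol version, which only the legacy client can still speak.
      builder.version(ProtocolVersion.PROTOCOL_VERSION_25);
      try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
         // ... use caches as before ...
      }
   }
}
```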


### Conclusions

We hope you all get a chance to try out the new client and let us know what benefits or issues you find.
If you want to discuss this, please feel free to reach out to us through the channels listed at https://infinispan.org/community/.
