Performance improvements #415

ryanemerson · 2025-02-05T10:09:23Z

Closes #413
Closes #414

tristantarrant · 2025-02-05T10:38:37Z

For reference, it would be nice to have some benchmarks comparing the old and new implementations.

pruivo

I don't think you are going to like my suggestion.

I suggest creating a new org.infinispan.protostream.impl.TagWriterImpl.Encoder with the "LazyByteArrayOutputStream". I think all the methods would benefit from it, and it would avoid the instanceof check in the writeUTF8Field method.

For example, the method below ends up invoking ensureCapacity 8 times, while we can do better:

      void writeFixed64(long value) throws IOException {
         out.write((byte) (value & 0xFF));
         out.write((byte) ((value >> 8) & 0xFF));
         out.write((byte) ((value >> 16) & 0xFF));
         out.write((byte) ((value >> 24) & 0xFF));
         out.write((byte) ((int) (value >> 32) & 0xFF));
         out.write((byte) ((int) (value >> 40) & 0xFF));
         out.write((byte) ((int) (value >> 48) & 0xFF));
         out.write((byte) ((int) (value >> 56) & 0xFF));
      }

Even the default impl of the Encoder class can benefit

      void writeFixed64Field(int fieldNumber, long value) throws IOException {
         writeVarint32(WireType.makeTag(fieldNumber, WireType.WIRETYPE_FIXED64));
         writeFixed64(value);
      }

^ For example: ensure 13 bytes of capacity (up to 5 for the tag varint plus 8 for the value), write directly to the buffer, and update the position once at the end.
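A minimal sketch of that suggestion, not the PR's actual Encoder: `GrowableBuffer` and its methods are hypothetical names, and the growth policy is an assumption. Capacity is checked once for the worst case, the bytes are stored straight into the array, and the position is updated once at the end.

```java
import java.util.Arrays;

// Hypothetical stand-in for a byte[]-backed encoder; not the ProtoStream API.
final class GrowableBuffer {
   private byte[] buf = new byte[16];
   private int pos;

   // Grow the backing array so that at least `extra` more bytes fit.
   private void ensureCapacity(int extra) {
      if (buf.length - pos < extra) {
         buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
      }
   }

   // Stores 8 little-endian bytes at index p and returns the next free index.
   private int putFixed64(int p, long value) {
      for (int shift = 0; shift < 64; shift += 8) {
         buf[p++] = (byte) (value >>> shift);
      }
      return p;
   }

   // One capacity check instead of one per byte.
   void writeFixed64(long value) {
      ensureCapacity(8);
      pos = putFixed64(pos, value);
   }

   // Tag varint (at most 5 bytes) plus 8 value bytes: one check for 13 bytes.
   void writeFixed64Field(int fieldNumber, long value) {
      ensureCapacity(13);
      int p = pos;
      int tag = (fieldNumber << 3) | 1; // protobuf wire type 1 = 64-bit
      while ((tag & ~0x7F) != 0) {
         buf[p++] = (byte) ((tag & 0x7F) | 0x80);
         tag >>>= 7;
      }
      buf[p++] = (byte) tag;
      pos = putFixed64(p, value);
   }

   byte[] toByteArray() {
      return Arrays.copyOf(buf, pos);
   }
}
```

Whether this is measurably faster than eight separate writes depends on the stream implementation and the JIT, so it would need the same kind of benchmark requested above.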

core/src/main/java/org/infinispan/protostream/LazyByteArrayOutputStream.java

pruivo · 2025-02-05T11:07:01Z

core/src/main/java/org/infinispan/protostream/impl/ByteArrayOutputStreamEx.java

 */
+@Deprecated(forRemoval = true)


Seeing this in the impl package, I'm tempted to say "remove it".
Are we planning to backport this PR/optimizations for ProtoStream 5.x? We can deprecate it there and remove it here.

> Are we planning to backport this PR/optimizations for ProtoStream 5.x? We can deprecate it there and remove it here.

I'm not sure. My gut feeling would be that it's not worth backporting, given that the main driver is the migration to protostream and the fact we only want new features on main.

+1 to deprecating in 5.x.

core/src/main/java/org/infinispan/protostream/impl/LazyByteArrayOutputStream.java

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

core/src/test/java/org/infinispan/protostream/impl/TagWriterImplTest.java

ryanemerson · 2025-02-05T11:54:03Z

> For reference it would be nice to have some benchmarks between old and new implementations

You're right! Below are the figures I posted on Zulip the other day, which test UTF8 marshalling in isolation:

Benchmark                         (initialArraySize)  (initialPosition)  (stringLength)      (type)  (useMultiByte)   Mode  Cnt         Score         Error  Units
UtfBenchmark.testUtfWrite                         64                  0               8    proto-ex           false  thrpt    5  17293297.695 ±  775157.948  ops/s
UtfBenchmark.testUtfWrite                         64                  0               8  proto-lazy           false  thrpt    5  28974585.735 ±  144449.312  ops/s

proto-ex is the old code, proto-lazy new.

I will re-run the numbers once I have looked into Pedro's suggestions.

wburns · 2025-02-05T22:23:38Z

core/src/main/java/org/infinispan/protostream/LazyByteArrayOutputStream.java

+   int getPosition();
+   void setPosition(int position);
+   void ensureCapacity(int size);
+   byte[] getRawBuffer();


Is there any reason we aren't just doing write(int pos, byte val) instead? This would allow for other implementations that aren't byte[] backed.

The original purpose of the interface was to allow different implementations of byte[] backed streams, so exposing the byte[] directly removes the need for lots of additional #write method calls.

I guess your suggestion would allow the interface to be used by other indexed OutputStream implementations though, e.g. one that is ByteBuf based. Would this be beneficial for the client?

Yes, this is what I am thinking. I think this should benefit both the client and possibly even the server with some more changes I am thinking about.

Interface and implementation updated to no longer expose a byte[]. Let me know if we're missing anything you think we'll need for your future ideas.
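To illustrate the trade-off discussed in this thread, here is a rough sketch of an interface that only offers positional writes. `IndexedSink`, `ArraySink`, `NioSink` and `AsciiWriter` are hypothetical names, not the interface that was merged, and `java.nio.ByteBuffer` stands in for a Netty `ByteBuf` so the example needs only the JDK.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical interface: positional writes only, no byte[] accessor.
interface IndexedSink {
   void ensureCapacity(int capacity);
   void write(int position, byte value);
}

// byte[]-backed implementation.
final class ArraySink implements IndexedSink {
   private byte[] buf = new byte[0];

   public void ensureCapacity(int capacity) {
      if (buf.length < capacity) {
         buf = Arrays.copyOf(buf, Math.max(capacity, buf.length * 2));
      }
   }

   public void write(int position, byte value) {
      buf[position] = value;
   }

   byte[] copy(int length) {
      return Arrays.copyOf(buf, length);
   }
}

// ByteBuffer-backed implementation; callers never see a byte[].
final class NioSink implements IndexedSink {
   private ByteBuffer buf = ByteBuffer.allocate(0);

   public void ensureCapacity(int capacity) {
      if (buf.capacity() < capacity) {
         ByteBuffer bigger = ByteBuffer.allocate(Math.max(capacity, buf.capacity() * 2));
         buf.clear();       // position 0, limit = capacity: copy everything
         bigger.put(buf);
         buf = bigger;
      }
   }

   public void write(int position, byte value) {
      buf.put(position, value); // absolute put, does not move the position
   }

   byte[] copy(int length) {
      byte[] out = new byte[length];
      for (int i = 0; i < length; i++) {
         out[i] = buf.get(i);
      }
      return out;
   }
}

final class AsciiWriter {
   // Writes the chars of an ASCII-only string at `start`; returns the new position.
   static int writeAscii(IndexedSink sink, int start, String s) {
      sink.ensureCapacity(start + s.length());
      for (int i = 0; i < s.length(); i++) {
         sink.write(start + i, (byte) s.charAt(i));
      }
      return start + s.length();
   }
}
```

The writer code is identical for both backends, which is the flexibility being asked for; the cost is one interface call per byte instead of a direct array store.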

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

ryanemerson · 2025-02-10T13:12:04Z

Latest benchmark results just writing strings:

Benchmark                         (initialArraySize)  (initialPosition)  (stringLength)      (type)  (useMultiByte)   Mode  Cnt         Score         Error  Units
UtfBenchmark.testUtfWrite                         32                  0               8  proto-lazy            true  thrpt    5  19378946.697 ± 1008986.982  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8  proto-lazy           false  thrpt    5  35079258.012 ± 1634211.248  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8    proto-ex            true  thrpt    5  11728437.002 ±   42893.047  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8    proto-ex           false  thrpt    5  14566156.958 ±   65063.897  ops/s

wburns · 2025-02-10T19:57:25Z

core/src/main/java/org/infinispan/protostream/LazyByteArrayOutputStream.java

+   }
+
+   default byte[] toByteArray() {
+      return Arrays.copyOf(getRawBuffer(), getPosition());


This also makes me wonder: if we were to support byte[] backed streams with an offset into that byte[], this method would not work. We don't have that currently, but it could be useful down the line. That's another reason I don't think we should expose the byte[] if we can avoid it.
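The offset concern can be shown with a small, self-contained example. `OffsetCopyDemo` and the `offset` parameter are hypothetical; only the `Arrays.copyOf(raw, position)` form comes from the diff above.

```java
import java.util.Arrays;

final class OffsetCopyDemo {
   // Mirrors the default method in the diff: assumes the data starts at index 0.
   static byte[] copyFromZero(byte[] raw, int position) {
      return Arrays.copyOf(raw, position);
   }

   // Offset-aware variant; `offset` is a hypothetical extra piece of state.
   static byte[] copyFromOffset(byte[] raw, int offset, int length) {
      return Arrays.copyOfRange(raw, offset, offset + length);
   }
}
```

With a shared array `{9, 9, 1, 2, 3, 0}` where the stream owns three bytes starting at index 2, `copyFromZero(raw, 3)` returns the two foreign bytes plus one of its own, while `copyFromOffset(raw, 2, 3)` returns the stream's bytes.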

core/src/main/java/org/infinispan/protostream/impl/RawByteArrayOutputStreamImpl.java

pruivo · 2025-02-10T21:34:30Z

core/src/main/java/org/infinispan/protostream/impl/RawByteArrayOutputStreamImpl.java

+   int pos = 0;
+
+   public RawByteArrayOutputStreamImpl() {
+   }


Shouldn't this create a new byte[] with the default size (MIN_SIZE)? We could then skip the null checks in ensureCapacity and getByteBuffer.

During my benchmarks it seemed like initializing the array led to worse performance. I believe this is because we're creating a lot of stream instances and in many cases the initial capacity required exceeds MIN_SIZE.
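A rough sketch of the lazy allocation being described, under assumptions: `LazyBuffer` is a hypothetical class, the `MIN_SIZE` value of 32 is made up for the example, and the doubling policy is not taken from the PR. Because the array stays null until the first write, the first allocation can be sized to the first request instead of allocating MIN_SIZE bytes and then immediately replacing them.

```java
import java.util.Arrays;

// Hypothetical sketch of lazy allocation; the MIN_SIZE value is assumed.
final class LazyBuffer {
   static final int MIN_SIZE = 32;
   private byte[] buf; // stays null until the first write
   private int pos;

   void ensureCapacity(int capacity) {
      if (buf == null) {
         // First allocation is sized to the request, never below MIN_SIZE.
         buf = new byte[Math.max(MIN_SIZE, capacity)];
      } else if (buf.length < capacity) {
         buf = Arrays.copyOf(buf, Math.max(capacity, buf.length * 2));
      }
   }

   void write(byte[] src) {
      ensureCapacity(pos + src.length);
      System.arraycopy(src, 0, buf, pos, src.length);
      pos += src.length;
   }

   int capacity() {
      return buf == null ? 0 : buf.length;
   }

   byte[] toByteArray() {
      return buf == null ? new byte[0] : Arrays.copyOf(buf, pos);
   }
}
```

The price is the null check on every capacity test, which is the overhead the review comment asks about.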

core/src/main/java/org/infinispan/protostream/impl/RawByteArrayOutputStreamImpl.java

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

pruivo · 2025-02-10T21:54:17Z

core/src/main/java/org/infinispan/protostream/impl/UnknownFieldSetImpl.java

@@ -198,7 +199,7 @@ public boolean hasTag(int tag) {

   @Override
   public void writeExternal(ObjectOutput out) throws IOException {
-      ByteArrayOutputStreamEx baos = new ByteArrayOutputStreamEx();
+      RawByteArrayOutputStream baos = new RawByteArrayOutputStreamImpl();
      TagWriter output = TagWriterImpl.newInstance(null, baos);
      writeTo(output);
      output.flush();


Lines 206-210 can be replaced with

      out.writeInt(baos.getPosition());
      out.write(baos.getRawBuffer(), 0, baos.getPosition());

Now that we're not exposing getRawBuffer this is no longer possible. Does it make more sense to wrap the provided ObjectOutput in an Adapter and pass this to TagWriterImpl#newInstance?

getRawBuffer is still a public method.
At the moment, I don't see how the adapter would work as you don't know the size beforehand.

Apologies for the confusion. I replied before pushing my changes. @wburns wanted to remove the getRawBuffer method in favour of write(index, ...) methods so that we can provide a ByteBuf implementation on the client side.

> At the moment, I don't see how the adapter would work as you don't know the size beforehand.

You're right. I hadn't really thought of the length field.

I have updated the code so that we just loop through the stream bytes and write them individually, so that we don't need to create a ByteBuffer implementation. I'm not sure this is much better than what we already had though, so I'm happy to revert.

ok, let me take another look 👍
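The length-field constraint raised in this thread can be sketched as follows. `LengthPrefixDemo` is a hypothetical helper written against `java.io.DataOutput` (which `ObjectOutput` extends); it is not the code in UnknownFieldSetImpl. Since the int length precedes the payload, the payload has to be fully encoded into a temporary buffer before anything is written to the target.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LengthPrefixDemo {
   // Writes a 4-byte length followed by the first `length` bytes of `encoded`.
   static void writeLengthPrefixed(DataOutput out, byte[] encoded, int length) throws IOException {
      out.writeInt(length);
      out.write(encoded, 0, length);
   }

   static byte[] demo() {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(sink)) {
         byte[] scratch = {1, 2, 3, 0, 0}; // 3 valid bytes, 2 unused
         writeLengthPrefixed(out, scratch, 3);
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      return sink.toByteArray();
   }
}
```

An adapter that streams straight into the target would have to emit the length before the encoder has produced any bytes, which is why it does not work here.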

ryanemerson · 2025-02-11T12:07:10Z

I have updated the interface to not expose the raw byte[] as suggested by @wburns so that we can utilise the stream for ByteBuf based solutions as well. Unfortunately it seems like there's a slight perf hit when compared with the values I posted yesterday:


Benchmark                         (initialArraySize)  (initialPosition)  (stringLength)      (type)  (useMultiByte)   Mode  Cnt         Score        Error  Units
UtfBenchmark.testUtfWrite                         32                  0               8  proto-lazy            true  thrpt    5  18189985.935 ± 708413.260  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8  proto-lazy           false  thrpt    5  34802151.363 ± 543860.030  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8    proto-ex            true  thrpt    5  11461152.933 ± 569242.019  ops/s
UtfBenchmark.testUtfWrite                         32                  0               8    proto-ex           false  thrpt    5  14165941.018 ±  54503.091  ops/s

pruivo

I suggested an "improvement" for UnknownFieldSetImpl. The remaining comments are optional.

core/src/main/java/org/infinispan/protostream/ProtobufUtil.java

core/src/main/java/org/infinispan/protostream/impl/RandomAccessOutputStreamImpl.java

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

core/src/main/java/org/infinispan/protostream/impl/UnknownFieldSetImpl.java

pruivo

LGTM! Nice work!

wburns

LGTM!

wburns · 2025-02-11T15:47:03Z

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

+
+      private static int varIntBytes(long value) {
+         int i = 1;
+         while ((value & 0xFFFFFFFFFFFFFF80L) != 0) {


I'm not saying we have to change this now, but I wonder whether 4 ifs (checking less than 128, 16_384, etc.) would be faster than this loop. I can look at it later, after this is integrated. It probably makes no measurable difference once the margin of error is taken into account.

8 ifs, this has to support long too.

Sorry, it is only 4 ifs with an else. But I still think it would be better than a loop the JIT can't unroll.

how so? the max var int size is 10 bytes. How can you reduce that to 4 ifs?

Sorry, I forgot. I also meant we need another method. We have 18 references to this method and only 2 actually pass a long.

Wrote a quick JMH benchmark, and for a single byte they have the same perf. Only once I increased the value to something larger was there much of a difference, and it is still only about a ns at worst (0.22 ns for the ifs and 1.13 ns for the loop for 5 bytes). Note the ifs had the same perf no matter the size, I'm guessing due to branch prediction and multiple branches being evaluated concurrently in the CPU. Either way, as I mentioned, the difference is so minor it probably isn't worth it, especially as the code here works for int and long and is much more concise.

Benchmark                                        Mode  Cnt  Score   Error  Units
VarIntSizeBenchmark.test268435456VarIntSizeIf    avgt   10  0.222 ± 0.004  ns/op
VarIntSizeBenchmark.test268435456VarIntSizeLoop  avgt   10  1.133 ± 0.002  ns/op
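For readers following the loop-versus-ifs discussion, the two shapes can be sketched as below. Only the first three lines of the loop appear in the diff; the loop body, the class name `VarIntSize` and the whole `branches` method are assumptions written for illustration, not the benchmarked code.

```java
final class VarIntSize {
   // Loop form: works for int and long values (a long needs up to 10 bytes).
   static int loop(long value) {
      int i = 1;
      while ((value & 0xFFFFFFFFFFFFFF80L) != 0) {
         i++;
         value >>>= 7;
      }
      return i;
   }

   // Branch form for a value treated as unsigned 32-bit:
   // four ifs, with 5 bytes as the fall-through case.
   static int branches(int value) {
      if ((value & (~0 << 7)) == 0) return 1;
      if ((value & (~0 << 14)) == 0) return 2;
      if ((value & (~0 << 21)) == 0) return 3;
      if ((value & (~0 << 28)) == 0) return 4;
      return 5;
   }
}
```

The benchmark name suggests the measured value was 268435456 (2^28), the smallest value that needs 5 bytes, so the loop runs its maximum number of iterations for an int-sized input.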

- [infinispan#413] Optimize TagWriterImpl#writeString for ASCII only Strings
- [infinispan#414] Replace ByteArrayOutputStreamEx with non-synchronized implementation

ryanemerson requested a review from a team as a code owner February 5, 2025 10:09

pruivo requested changes Feb 5, 2025

wburns reviewed Feb 5, 2025

wburns reviewed Feb 10, 2025

pruivo requested changes Feb 10, 2025

pruivo requested changes Feb 11, 2025

pruivo approved these changes Feb 11, 2025

wburns approved these changes Feb 11, 2025

Performance improvements

aad22eb


ryanemerson force-pushed the UTF8_stream_main branch from 8d395e5 to aad22eb Compare February 11, 2025 15:57

wburns merged commit 3e133fe into infinispan:main Feb 11, 2025
5 checks passed
