-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #415
Conversation
For reference it would be nice to have some benchmarks between old and new implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you are going to like my suggestion.
I suggest to create a new org.infinispan.protostream.impl.TagWriterImpl.Encoder
with the "LazyByteArrayOutputStream". I think all the methods would benefit from it and avoid doing instanceOf
in the writeUTF8Field
method.
For example, the method is invoking ensureCapacity
8 times while we can do better
void writeFixed64(long value) throws IOException {
out.write((byte) (value & 0xFF));
out.write((byte) ((value >> 8) & 0xFF));
out.write((byte) ((value >> 16) & 0xFF));
out.write((byte) ((value >> 24) & 0xFF));
out.write((byte) ((int) (value >> 32) & 0xFF));
out.write((byte) ((int) (value >> 40) & 0xFF));
out.write((byte) ((int) (value >> 48) & 0xFF));
out.write((byte) ((int) (value >> 56) & 0xFF));
}
Even the default impl of the Encoder
class can benefit
void writeFixed64Field(int fieldNumber, long value) throws IOException {
writeVarint32(WireType.makeTag(fieldNumber, WireType.WIRETYPE_FIXED64));
writeFixed64(value);
}
^ for example, ensure 13 bytes capacity, write directly to the buffer, and update the position at the end.
core/src/main/java/org/infinispan/protostream/LazyByteArrayOutputStream.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Deprecated(forRemoval = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seeing this in the impl
package, I'm tempted to say "remove it".
Are we planning to backport this PR/optimizations for ProtoStream 5.x? We can deprecate it there and remove it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to backport this PR/optimizations for ProtoStream 5.x? We can deprecate it there and remove it here.
I'm not sure. My gut feeling would be that it's not worth backporting, given that the main driver is the migration to protostream and the fact we only want new features on main
.
+1 to deprecating in 5.x
.
core/src/main/java/org/infinispan/protostream/impl/LazyByteArrayOutputStream.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/LazyByteArrayOutputStream.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/infinispan/protostream/impl/TagWriterImplTest.java
Show resolved
Hide resolved
You're right! Below are the figures I posted on Zulip the other day which tests UTF8 marshalling in isolation:
I will re-run the numbers once I have looked into Pedro's suggestions. |
int getPosition(); | ||
void setPosition(int position); | ||
void ensureCapacity(int size); | ||
byte[] getRawBuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we aren't just doing write(int pos, byte val)
instead? This would allow for other implementations that aren't byte[] backed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original purpose of the interface was to allow different implementations of byte[]
backed streams, so exposing byte[]
directly removes the need for lots of additional #write
methods calls.
I guess your suggestion would allow the interface to be used by other indexed OutputStream
implementations though, e.g. one that is ByteBuf based. Would this be beneficial for the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I am thinking. I think this should benefit both the client and possibly even the server with some more changes I am thinking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interface and implementation updated to no longer expose a byte[]
. Let me know if we're missing anything you think we'll need for your future ideas.
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
Latest benchmark results just writing strings:
|
} | ||
|
||
default byte[] toByteArray() { | ||
return Arrays.copyOf(getRawBuffer(), getPosition()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads me to also wonder, if we were to support byte[] backed streams, this method would not work if there is an offset in that byte[]. Currently, we don't have that but it could be useful down the line if we did that. Another reason I don't think we should expose the byte[] if we can.
core/src/main/java/org/infinispan/protostream/impl/RawByteArrayOutputStreamImpl.java
Outdated
Show resolved
Hide resolved
int pos = 0; | ||
|
||
public RawByteArrayOutputStreamImpl() { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't create a new a byte[] with default size (MIN_SIZE
)? We could skip the null checks in ensureCapacity
and getByteBuffer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During my benchmarks it seem liked initializing the array was leading to worse performance. I believe this is because we're creating a lot of stream instances and in many cases the initial capacity required exceeded MIN_SIZE
.
core/src/main/java/org/infinispan/protostream/impl/RawByteArrayOutputStreamImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
@@ -198,7 +199,7 @@ public boolean hasTag(int tag) { | |||
|
|||
@Override | |||
public void writeExternal(ObjectOutput out) throws IOException { | |||
ByteArrayOutputStreamEx baos = new ByteArrayOutputStreamEx(); | |||
RawByteArrayOutputStream baos = new RawByteArrayOutputStreamImpl(); | |||
TagWriter output = TagWriterImpl.newInstance(null, baos); | |||
writeTo(output); | |||
output.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 206-210 can be replaced with
out.writeInt(baos.getPosition());
out.write(baos.getRawBuffer(), 0, baos.getPosition());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're not exposing getRawBuffer
this is no longer possible. Does it make more sense to wrap the provided ObjectOutput
in an Adapter and pass this to TagWriterImpl#newInstance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRawBuffer
is still a public method.
At the moment, I don't see how the adapter would work as you don't know the size beforehand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the confusion. I replied before pushing my changes. @wburns wanted to remove the getRawBuffer
method in favour of utilising methods write(index, ...)
methods so that we can provide a ByteBuf implementation on the client side.
At the moment, I don't see how the adapter would work as you don't know the size beforehand.
You're right. I hadn't really thought of the length field.
I have updated the code so that we just loop through the stream bytes and write them individually, so that we don't need to create a ByteBufffer implementation. I'm not sure if this is much better than what we already had though, so happy to revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me take another look 👍
I have updated the interface to not expose the raw
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested an "improvement" for UnknownFieldSetImpl
. The remaining comments are optional.
core/src/main/java/org/infinispan/protostream/ProtobufUtil.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/RandomAccessOutputStreamImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/infinispan/protostream/impl/UnknownFieldSetImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
private static int varIntBytes(long value) { | ||
int i = 1; | ||
while ((value & 0xFFFFFFFFFFFFFF80L) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't say we have to change now, but I wonder if the 4 ifs is faster than this where it checks less than 128, 16_384, etc. I can look at it later after it is integrated. It is probably will make no real measurable difference taking into account margin of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 ifs, this has to support long too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it is only 4 ifs with an else. But I still think it would be better than a loop the JIT can't unroll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how so? the max var int size is 10 bytes. How can you reduce that to 4 ifs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot. I also meant we need another method. We have 18 references to this method and only 2 actually pass a long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote a quick JMH benchmark and for a single byte they have the same perf. Only once I increased it to something larger was there much of a difference and it still only a ns at worst (.22ns for the ifs and 1.13ns for the loop for 5 bytes). Note the ifs were the same perf no matter the size, guessing due to branch prediction and multiple branches evaluated concurrently in the CPU. Either way as I mentioned the difference is so minor it probably isn't worth it. Especially as the code here works for int and long and is much more concise.
Benchmark Mode Cnt Score Error Units
VarIntSizeBenchmark.test268435456VarIntSizeIf avgt 10 0.222 ± 0.004 ns/op
VarIntSizeBenchmark.test268435456VarIntSizeLoop avgt 10 1.133 ± 0.002 ns/op
- [infinispan#413] Optimize TagWriterImpl#writeString for ASCII only Strings - [infinispan#414] Replace ByteArrayOutputStreamEx with non-synchronized implementation
8d395e5
to
aad22eb
Compare
Closes #413
Closes #414