Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very weird problem (possibly related to SSL connections handling in reactive RabbitMQ client) #145

Open
62mkv opened this issue Sep 17, 2020 · 9 comments

Comments

@62mkv
Copy link

62mkv commented Sep 17, 2020

Expected Behavior

Application should be capable of receiving messages reliably under load, even over AMQPS

Actual Behavior

We have a case when reactive RabbitMQ client throws exceptions, loses connection and is basically unusable for prolonged periods of time

Steps to Reproduce

Unfortunately, I can not yet produce a reliably reproducing case that would not depend on our in-house RabbitMQ installation, This is what I can share though:

We have a problem which I only could reproduce against our in-house RabbitMQ server, and even then only under some specific circumstances.

the gist of the problem is that occassionally, when our application (WebFlux/Reactive-RabbitMQ) is publishing and receiving many messages in parallel (think more then 20 parallel consumers, I use k6 --vus 50 to create load), it starts throwing exceptions like this:

com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; cause: javax.net.ssl.SSLException: Tag mismatch!
    at com.rabbitmq.client.impl.AMQConnection.ensureIsOpen(AMQConnection.java:175) ~[amqp-client-5.9.0.jar:5.9.0]
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:

and

javax.net.ssl.SSLException: Buffer closed
        at com.rabbitmq.client.impl.nio.SslEngineHelper.write(SslEngineHelper.java:176) ~[amqp-client-5.9.0.jar:5.9.0]
        at com.rabbitmq.client.impl.nio.SslEngineByteBufferOutputStream.doFlush(SslEngineByteBufferOutputStream.java:59) ~[amqp-client-5.9.0.jar:5.9.0]
        at com.rabbitmq.client.impl.nio.SslEngineByteBufferOutputStream.flush(SslEngineByteBufferOutputStream.java:53) ~[amqp-client-5.9.0.jar:5.9.0]
        at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[na:na]
        at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:244) ~[amqp-client-5.9.0.jar:5.9.0]
        at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

and

reactor.rabbitmq.RabbitFluxException: Error while setting publisher confirms on channel
        at reactor.rabbitmq.Sender.lambda$sendWithTypedPublishConfirms$6(Sender.java:282) ~[reactor-rabbitmq-1.4.4.RELEASE.jar:1.4.4.RELEASE]
        Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:

etc. There were certainly some other ones, can't find in the logs right now. When I run exactly the same scenario with "traditional" stack (Spring WebMVC+Spring AMQP), not a single exception of this sort occurs.

I could collect whichever information is necessary though, if I would 've been suggested appropriate tools or suggestions on how to pin it down

(it was first shared on Gitter, but no reaction followed, so I decided to raise an issue and see if anyone would want to chime in)

Possible Solution

Your Environment

  • Reactive-RabbitMQ: 1.4.4.RELEASE
  • Reactor version(s) used: 3.3.5.RELEASE
  • Other relevant libraries versions (eg. netty, ...): 4.1.49.Final
  • JVM version (java -version): 11.0.2 (same with 8.0.172)
  • OS and version (eg uname -a): Win10 x64
@acogoluegnes
Copy link
Contributor

Please provide the broker and Erlang versions, as well the broker logs when the problems occurs. It seems the connection is closed for some reason, so the broker logs will be helpful to find out more.

Please /cc me on Gitter for any question related to Reactor RabbitMQ.

@62mkv
Copy link
Author

62mkv commented Sep 17, 2020

RabbitMQ 3.8.3, Erlang 22.2.1

logs will come later

@62mkv
Copy link
Author

62mkv commented Sep 17, 2020

more exception from the application:

avax.net.ssl|ERROR|77|rabbitmq-nio|2020-09-17 11:37:26.908 EEST|TransportContext.java:313|Fatal (UNEXPECTED_MESSAGE): Input record too big: max = 16709 len = 43125 (
"throwable" : {
  javax.net.ssl.SSLProtocolException: Input record too big: max = 16709 len = 43125
        at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:591)
        at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:443)
        at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:422)
        at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:634)
        at com.rabbitmq.client.impl.nio.SslEngineFrameBuilder.somethingToRead(SslEngineFrameBuilder.java:49)
        at com.rabbitmq.client.impl.nio.FrameBuilder.readFrame(FrameBuilder.java:71)
        at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:156)
        at java.base/java.lang.Thread.run(Thread.java:834)}

)
javax.net.ssl|ALL|77|rabbitmq-nio|2020-09-17 11:37:26.909 EEST|SSLSessionImpl.java:763|Invalidated session:  Session(1600329088114|TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
2020-09-17 11:37:26.917  WARN 46868 --- [   rabbitmq-nio] com.rabbitmq.client.impl.nio.NioLoop     : Error during reading frames

javax.net.ssl.SSLProtocolException: Input record too big: max = 16709 len = 43125
        at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:591) ~[na:na]
        at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:443) ~[na:na]
        at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:422) ~[na:na]
        at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:634) ~[na:na]
        at com.rabbitmq.client.impl.nio.SslEngineFrameBuilder.somethingToRead(SslEngineFrameBuilder.java:49) ~[amqp-client-5.9.0.jar:5.9.0]
        at com.rabbitmq.client.impl.nio.FrameBuilder.readFrame(FrameBuilder.java:71) ~[amqp-client-5.9.0.jar:5.9.0]
        at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:156) ~[amqp-client-5.9.0.jar:5.9.0]
        at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

javax.net.ssl|WARNING|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:168|outbound has closed, ignore outbound application data
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:507|WRITE: TLS12 alert, length = 2
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLCipher.java:1727|Plaintext before ENCRYPTION (
  0000: 02 0A                                              ..
)
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:525|Raw write (
  0000: 15 03 03 00 1A 00 00 00   00 00 00 01 A3 CE 79 97  ..............y.
  0010: F4 7D 9E 13 A9 89 38 85   31 4E C8 ED E2 E2 97     ......8.1N.....
)

@acogoluegnes
Copy link
Contributor

acogoluegnes commented Sep 17, 2020

Could you please try with the blocking IO mode as well?

@62mkv
Copy link
Author

62mkv commented Sep 17, 2020

you mean, without using .useNio on ConnectionFactory?

@acogoluegnes
Copy link
Contributor

Yes.

@62mkv
Copy link
Author

62mkv commented Sep 17, 2020

hmm, with that commented out, the errors are gone. Uncommented again - here they go. what gives? Should I still extract logs ?

the only things related to my app around the time when problem was first manifested are:

2020-09-17 08:37:32.498 [info] <0.11968.3742> accepting AMQP connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671)
2020-09-17 08:37:32.535 [info] <0.11968.3742> Connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671) has a client-provided name: hello-esb
2020-09-17 08:37:32.569 [info] <0.11968.3742> connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671 - hello-esb): user 'hello-app' authenticated and granted access to vhost 'esb-test'
2020-09-17 08:37:27.348 [warning] <0.17059.5998> closing AMQP connection <0.17059.5998> (10.61.4.9:59276 -> 10.61.41.66:5671 - hello-esb, vhost: 'esb-test', user: 'hello-app'):

@acogoluegnes
Copy link
Contributor

This might be a bug in the Java client NIO connector. Contrary to blocking IO, TLS is not transparent with NIO, it requires crypting/decrypting before leaving/reaching the application code.

It would be great to be able to reproduce independently from Reactor RabbitMQ.

As a workaround you can switch to the blocking IO mode for now. This should not affect your application unless you use many connections (understand 100s or 1000s of connections in your client applications).

@62mkv
Copy link
Author

62mkv commented Sep 17, 2020

that's cool!! every single one example in the documentation uses that, so I basically assumed that to be a pre-condition for reactive behaviour. Thanks for the suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants