Unable to automatically reconnect with Redisson Client for some random nodes #5855

jlaber opened this issue May 9, 2024 · 2 comments

jlaber commented May 9, 2024

Our organization recently hit a production issue in a Spring Boot web application that uses the Redisson client to connect to Redis for HTTP web sessions. During a planned change, our networking group temporarily made the Redis servers unreachable. We expected the applications to reconnect automatically once the servers were reachable again. The majority of our instances reconnected with no issue, but a couple of instances did not reconnect and stayed stuck in an invalid state. We're looking for any known (or unknown) issue or configuration that might explain the problem or prevent it from happening. We have been unable to reproduce the issue in a controlled setup, either locally or in a test environment, so we can't gather more detail on what happened. We hope that a high-level description of the production situation and the logged errors is enough to get some help.

For all the nodes that recovered on their own, the errors during the outage looked like this:

Execution of message listener failed, and no ErrorHandler has been set.
org.springframework.dao.QueryTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]; nested exception is org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:48) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:35) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.redisson.spring.data.connection.RedissonConnection.transform(RedissonConnection.java:204) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.syncFuture(RedissonConnection.java:199) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.sync(RedissonConnection.java:369) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.read(RedissonConnection.java:750) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.hGetAll(RedissonConnection.java:1541) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.core.DefaultHashOperations.lambda$entries$18(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultHashOperations.entries(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultBoundHashOperations.entries(DefaultBoundHashOperations.java:223) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.getSession(RedisIndexedSessionRepository.java:458) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.onMessage(RedisIndexedSessionRepository.java:544) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.processMessage(RedisMessageListenerContainer.java:842) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.lambda$dispatchMessage$8(RedisMessageListenerContainer.java:990) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
Caused by: org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]
at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$9(RedisExecutor.java:433) ~[redisson-3.25.2.jar!/:3.25.2]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
... 1 more

These errors continued until Redis was reachable, then they just dropped off and all connections worked fine again.
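
The `params` field in these messages prints the Redis key as decimal byte values and truncates it after ten entries. A minimal sketch for turning the visible values back into text when reading the logs follows; the class and method names are hypothetical and are not part of Redisson, and only the prefix can be recovered because the log cuts off at the `...`.

```java
import java.nio.charset.StandardCharsets;

public class DecodeParams {

    /** Converts the decimal byte values printed in the log message into text. */
    static String decode(int... values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The ten values shown in the log before the "..." truncation.
        String prefix = decode(112, 97, 114, 116, 121, 45, 114, 101, 109, 45);
        System.out.println(prefix); // prints "party-rem-"
    }
}
```

Every message quoted in this report carries the same ten leading bytes, so all of the failing `HGETALL` calls target keys that share this prefix.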

For the couple of instances that stayed in a bad state, the errors looked like this:

There was a single error, matching the others, right at the time of the connection outage:
org.springframework.dao.QueryTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]; nested exception is org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:48) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:35) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.redisson.spring.data.connection.RedissonConnection.transform(RedissonConnection.java:204) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.syncFuture(RedissonConnection.java:199) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.sync(RedissonConnection.java:369) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.read(RedissonConnection.java:750) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.hGetAll(RedissonConnection.java:1541) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.core.DefaultHashOperations.lambda$entries$18(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultHashOperations.entries(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultBoundHashOperations.entries(DefaultBoundHashOperations.java:223) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.getSession(RedisIndexedSessionRepository.java:458) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.onMessage(RedisIndexedSessionRepository.java:544) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.processMessage(RedisMessageListenerContainer.java:842) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.lambda$dispatchMessage$8(RedisMessageListenerContainer.java:990) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
Caused by: org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false Check connection with Redis node: our-redis-host/our-redis-ip:our-redis-port for TCP packet drops or bandwidth limits. Try to increase nettyThreads and/or timeout settings. Command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]], channel: [id: 0xdb7cd1fa, L:/ip:port - R:our-redis-host/our-redis-ip:our-redis-port]
at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$9(RedisExecutor.java:433) ~[redisson-3.25.2.jar!/:3.25.2]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
... 1 more

Then there was a gap of 5-10 minutes with no errors, after which the errors started again but looked like this:
Execution of message listener failed, and no ErrorHandler has been set.
org.springframework.dao.QueryTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@143cd25b[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]] after 3 retry attempts; nested exception is org.redisson.client.RedisTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@143cd25b[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]] after 3 retry attempts
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:48) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonExceptionConverter.convert(RedissonExceptionConverter.java:35) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.redisson.spring.data.connection.RedissonConnection.transform(RedissonConnection.java:204) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.syncFuture(RedissonConnection.java:199) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.sync(RedissonConnection.java:369) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.read(RedissonConnection.java:750) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.redisson.spring.data.connection.RedissonConnection.hGetAll(RedissonConnection.java:1541) ~[redisson-spring-data-27-3.25.2.jar!/:3.25.2]
at org.springframework.data.redis.core.DefaultHashOperations.lambda$entries$18(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultHashOperations.entries(DefaultHashOperations.java:307) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.core.DefaultBoundHashOperations.entries(DefaultBoundHashOperations.java:223) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.getSession(RedisIndexedSessionRepository.java:458) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.session.data.redis.RedisIndexedSessionRepository.onMessage(RedisIndexedSessionRepository.java:544) ~[spring-session-data-redis-2.7.4.jar!/:2.7.4]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.processMessage(RedisMessageListenerContainer.java:842) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at org.springframework.data.redis.listener.RedisMessageListenerContainer.lambda$dispatchMessage$8(RedisMessageListenerContainer.java:990) ~[spring-data-redis-2.7.18.jar!/:2.7.18]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
Caused by: org.redisson.client.RedisTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@143cd25b[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], command: (HGETALL), params: [[112, 97, 114, 116, 121, 45, 114, 101, 109, 45, ...]] after 3 retry attempts
at org.redisson.command.RedisExecutor$1.run(RedisExecutor.java:268) ~[redisson-3.25.2.jar!/:3.25.2]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.101.Final.jar!/:4.1.101.Final]
... 1 more

In this message, the information about where the client was trying to connect appears to be gone (the NodeSource shows addr=null, redisClient=null, and entry=null), and the client now seems to just be filling up the connection pool.
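
To tell the two failure modes apart when scanning logs from many instances, a small classifier along the following lines may help. This is only a sketch: the class, enum, and method names are hypothetical, the matching relies on the message text quoted in this report for Redisson 3.25.2, and other versions may word these messages differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedissonErrorTriage {

    enum FailureMode { RESPONSE_TIMEOUT, ACQUIRE_FAILED_NO_TARGET, ACQUIRE_FAILED, OTHER }

    // Captures the addr value inside "NodeSource [slot=0, addr=null, ...]".
    private static final Pattern NODE_ADDR =
            Pattern.compile("NodeSource \\[[^\\]]*?addr=([^,\\]]+)");

    /** Classifies one log line by the Redisson error text it contains. */
    static FailureMode classify(String line) {
        if (line.contains("Redis server response timeout")) {
            return FailureMode.RESPONSE_TIMEOUT;
        }
        if (line.contains("Unable to acquire connection")) {
            Matcher m = NODE_ADDR.matcher(line);
            if (m.find() && "null".equals(m.group(1).trim())) {
                // The message no longer names a target node.
                return FailureMode.ACQUIRE_FAILED_NO_TARGET;
            }
            return FailureMode.ACQUIRE_FAILED;
        }
        return FailureMode.OTHER;
    }

    public static void main(String[] args) {
        // Abbreviated copies of the two messages quoted in this report.
        String recovered = "Redis server response timeout (3000 ms) occured after 3 retry attempts";
        String stuck = "Unable to acquire connection! Increase connection pool size. "
                + "Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null]";
        System.out.println(classify(recovered)); // prints "RESPONSE_TIMEOUT"
        System.out.println(classify(stuck));     // prints "ACQUIRE_FAILED_NO_TARGET"
    }
}
```

An instance whose log moves from `RESPONSE_TIMEOUT` to a steady stream of `ACQUIRE_FAILED_NO_TARGET` matches the stuck behavior described above, while the recovered instances only ever show `RESPONSE_TIMEOUT`.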

Any help would be greatly appreciated. Thank you in advance.

Version Info:
JDK: 8
Spring Boot: 2.7.18
Redisson Spring Boot Starter, Redisson Spring Data 27, Redisson: 3.25.2
Redis: Redis Enterprise 6.4.2-94

Config Info:
spring.session.store-type=redis
spring.redis.host=our-redis-host
spring.redis.port=our-redis-port
spring.redis.username=user
spring.redis.password=pw
spring.redis.ssl=true
spring.session.redis.namespace=our-namespace

Beyond that, we're not setting any custom Redisson configuration, so Redisson is running with its defaults.

Our Redis servers are fronted by an F5 LTM, which was fully functional after the network change was finished.

@jlaber jlaber added the question label May 9, 2024
Member

mrniko commented May 10, 2024

Can you share all redisson log entries for failed nodes?

Author

jlaber commented May 10, 2024

Attached some of our logs. Thank you.

redis-issue.txt
