Infinispan Deadlock issue when file-store persistent enabled #12367
Replies: 2 comments · 5 replies
-
I can't say for sure that the SIFS errors are causing the timeouts. The main reason is that I can't tell from the logs whether both operations are trying to access the same keys. If we could make that link, it could point to a bug hidden in SIFS. So I also can't confirm that the issue won't happen again without persistence.
By 15.0, I assume you meant 15.0.0.Final? Since then, we have added a few fixes to SIFS (ISPN-15943, ISPN-15930, ISPN-15894). Are you able to test with a more recent version? The latest is 15.0.3.Final.
Configuration-wise, it looks good to me. However, @ryanemerson knows more here.
-
Hi, @jabolina
-
A couple of questions.
As @jabolina mentioned, it is difficult to say. One possible case is when the write-behind queue gets full, which blocks further operations until it is flushed to disk. The lock is held during all this time. In addition, the external Infinispan sends events to the Keycloak nodes, and Keycloak needs to process them. Once again, the lock is held during this period. Can you correlate a peak in your load with the time of those 2 incidents?
Off-topic: I'm helping the Keycloak team with their Infinispan deployment in multi-site HA scenarios. Can you share the reasons why you are deploying an external Infinispan if cross-site is not enabled? Basically, what problem does the external Infinispan solve that Keycloak itself can't? An external Infinispan is not required for single-cluster Keycloak deployments.
Thanks.
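The write-behind scenario described above can be illustrated with a toy Java sketch. It is not Infinispan code: the class `WriteBehindSketch`, its methods, and the queue capacity are hypothetical stand-ins. The only assumption is that a writer keeps a lock while it waits for space in a bounded queue. A second writer to the same key then times out on the lock, even though nothing is deadlocked in the strict sense.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: names and sizes are hypothetical, not Infinispan internals.
class WriteBehindSketch {
    private final ReentrantLock keyLock = new ReentrantLock();
    private final BlockingQueue<String> queue;

    WriteBehindSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Blocking write: the lock stays held while put() waits for queue space.
    void write(String value) {
        keyLock.lock();
        try {
            queue.put(value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            keyLock.unlock();
        }
    }

    // Write with a timeout; false means the lock or queue space was not obtained in time.
    boolean tryWrite(String value, long timeoutMillis) {
        try {
            if (!keyLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
            try {
                return queue.offer(value, timeoutMillis, TimeUnit.MILLISECONDS);
            } finally {
                keyLock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Simulates the store flushing queued writes to disk.
    void flush() {
        while (queue.poll() != null) {
            // discard: a real store would persist the entry here
        }
    }

    boolean isKeyLocked() {
        return keyLock.isLocked();
    }

    // Returns {secondWriterTimedOut, writeSucceededAfterFlush}.
    static boolean[] demo() {
        WriteBehindSketch store = new WriteBehindSketch(1);
        store.write("a");                        // the queue is now full
        Thread stuck = new Thread(() -> store.write("b"));
        stuck.start();
        try {
            while (!store.isKeyLocked()) {       // wait until "stuck" holds the lock
                Thread.sleep(5);
            }
            boolean timedOut = !store.tryWrite("c", 100);
            store.flush();                       // frees space, so "stuck" can finish
            stuck.join();
            store.flush();
            boolean recovered = store.tryWrite("d", 100);
            return new boolean[] {timedOut, recovered};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] {false, false};
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("second writer timed out: " + r[0]);
        System.out.println("write succeeded after flush: " + r[1]);
    }
}
```

In this model the timeout disappears as soon as the queue drains, which is why correlating the incidents with load peaks or slow disk writes is a useful check.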
-
Hi, @pruivo
Yes, we are monitoring CPU and there are no spikes during the incidents. Also, we have
Why did we choose this architecture? We don't have much experience with Keycloak, and maybe we chose the wrong architecture. We also found the Keycloak documentation quite tricky. For example, it is valid only for the latest version, and when you use an older version it is hard to find the right configuration for it.
Also, it's not clear to me why Keycloak continues to create an internal Infinispan cluster when we have an external Infinispan. Maybe that can also cause some issues. I don't know whether it's by design or a bug.
-
To second what @pruivo mentioned: the SIFS error shouldn't cause any issues with timeouts. In the worst case it would give a different exception, which you didn't seem to receive. The async modification queue could possibly cause timeouts, but I don't see the disk being backed up by 15+ s worth of writes unless something is going wrong with the underlying disk. I don't suspect this, as I would normally expect other issues to crop up as well.
However, I can't stress enough how strongly I advise upgrading to 15.0.3, as @jabolina mentioned. There are quite a few fixes to SIFS that should prevent the exceptions from occurring in the first place.
If you still see timeouts after upgrading, I would suggest taking a thread dump and a heap dump, if possible, while the timeouts are happening. If you cannot do that, you can enable TRACE level for the org.infinispan.persistence.async.AsyncNonBlockingStore category. It will generate many messages, but it will tell you when the queue has filled, and you can see how long a batch takes to update. Each batch has a unique identifier associated with it that is printed in the TRACE messages.
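For the thread dump suggested above, `jstack` or `jcmd` against the server process is the usual route. Where code can run inside the JVM in question, the standard `java.lang.management` API gives similar information. The following is a minimal sketch; `ThreadDumpSketch` is a hypothetical helper, not part of Infinispan. Lock timeouts of the kind seen in this thread are not necessarily JVM-level deadlocks, so the deadlock count may well be zero while the per-thread states and stacks still show what is waiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a plain-text thread dump of the current JVM.
class ThreadDumpSketch {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();

        // findDeadlockedThreads() returns null when no JVM-level deadlock exists.
        long[] deadlocked = mx.isSynchronizerUsageSupported() ? mx.findDeadlockedThreads() : null;
        out.append("JVM-level deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');

        // Include monitor and synchronizer details only where the JVM supports them.
        ThreadInfo[] threads = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : threads) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three dumps a few seconds apart makes it easier to tell a thread that is genuinely stuck from one that is merely busy.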
-
@DmytroKozlovskyi thanks for the description. Well, for Keycloak 23, you have the option to connect a database to the embedded caches (see keycloak/keycloak#24774). It will keep your sessions safe and avoid the need to deploy an external Infinispan. The next release, Keycloak 25, brings support for persistent sessions (details in keycloak/keycloak#28265). I'm unsure at this point whether it is a fully supported feature or a preview.
That is a limitation of Keycloak. The remote store with external Infinispan is a requirement for multi-site deployments (geographically distributed Keycloak clusters). I'm working on improving Keycloak by removing the embedded caches and relying only on the external Infinispan. See keycloak/keycloak#28745.
-
Oh, this is exactly what we are looking for. Looking forward to the Keycloak 25 release. For now, we have decided to upgrade Infinispan to 15.0.3 but keep persistence off and wait for the Keycloak 25 release. Thank you all for your help.
-
Hi,
Looking for help here to understand what is wrong with our Keycloak/Infinispan setup.
We upgraded to Keycloak v23.0.7 a few months ago and decoupled Infinispan with v15.0. Also, we have enabled file-based persistence for caches. All worked well until May 7, when we started to have deadlocks (2 incidents).
In the first case there was an error on Pod infinispan-0:
and more than an hour later deadlocks appeared on other pods:
and on pod infinispan-0 at this time
The second case looks a little different but, again, started with a SIFS error:
And then, 18 hours later, deadlocks again:
Are both of these cases related to persistence being enabled for the caches? We have disabled persistence and so far have seen no issues. But we are worried that we might be wrong and that the issue could come back at any time. If it is because persistence was enabled, then why do we have the problem: is it a bug, or did we get the configuration wrong?
Our full infinispan config: