Backplane AutoRecover not working as expected (or are my expectations wrong? 😅) #158
-
Hello, the POC consists of 3 projects:
FusionCache is configured to use Redis as the 2nd-level cache and as the backplane. I wanted to test the AutoRecover feature, so I started Redis and both FusionCacheApi and FusionCacheWorker.
Am I missing something? Thanks for your help :)
Replies: 7 comments 3 replies
-
Hi @martinobordin , thanks for considering FusionCache. Your expectations seem correct (note: see below for a correction)! I was about to ask you for some more details, like which cache duration you were using, which configuration, which version, etc., but you've been so kind as to prepare a POC. Therefore I'll take a look at that and will let you know 😊
-
Hi @martinobordin , sorry for the delay but it has been a rough week.
I have some updates for you: your question pushed me into a rabbit hole of changes, optimizations and bugfixes and... I'm not out of it yet 😅
Anyway, I wanted to update you on some things, specifically:
🐞 A Bug
There was in fact a bug or, to be more precise, a scenario that was not supported as well as I would have liked: the one where both the distributed cache and the backplane fail at the same time. In that situation the end result was not always the ideal one, and there was some work to be done. Now, thanks to your input, I'm handling it correctly 🎉
🤔 Your Expectations
Although I told you your expectations were correct in my first answer, I have to change that: they are not actually correct. Let me explain.
It is true that the idea behind the AutoRecovery feature is to automatically handle caches going out of sync when there are connection issues, but there's one thing to keep in mind: the 1st (memory) and 2nd (distributed) levels are, still, caches. They are not, and should not be used as, a data source. FusionCache is built around the fact that the cache is just a cache, and the usual way of working with it is, for example, to use the GetOrSet method with a factory that loads data from the database.
The AutoRecovery feature has been designed to cover a scenario in which there are multiple nodes and, for some time, the backplane (used to keep the nodes in sync) goes away: when that happens, outgoing messages cannot be delivered, so AutoRecovery keeps them in a local queue and, as soon as the backplane is up again, sends them (in an optimized way, with de-duplication of multiple messages for the same cache key and other heuristics like that). But again, this means that there must be a so-called "single source of truth", which normally is a database.
In your POC though, for which again I thank you (and which I'm using, slightly modified, as a testing scenario), you set some data in the cache from the API and then read it from the WORKER: the problem is that in this way the data exists only in the cache (memory and/or distributed) and nowhere else, because there's no database (real or fake).
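To make the queue-and-flush idea above concrete, here is a minimal, language-agnostic sketch in Python (FusionCache itself is C#; all names here are hypothetical and this is not the library's actual implementation): while the backplane is down, outgoing notifications are kept locally and de-duplicated per cache key, then flushed on reconnect.

```python
# Sketch of an auto-recovery queue with per-key de-duplication.
# Hypothetical names; illustrative only, not FusionCache's real internals.

class AutoRecoveryQueue:
    def __init__(self, publish):
        self.publish = publish   # function that sends a message on the backplane
        self.pending = {}        # cache key -> latest undelivered message

    def notify(self, key, message, backplane_up):
        if backplane_up:
            self.publish(key, message)
        else:
            # Keep only the most recent message per key: an older
            # notification for the same key is superseded anyway.
            self.pending[key] = message

    def on_backplane_reconnected(self):
        # Flush the queue: one message per key, no matter how many
        # updates happened while the backplane was down.
        for key, message in self.pending.items():
            self.publish(key, message)
        self.pending.clear()

sent = []
q = AutoRecoveryQueue(lambda k, m: sent.append((k, m)))
q.notify("product:42", "set v1", backplane_up=False)
q.notify("product:42", "set v2", backplane_up=False)  # supersedes "set v1"
q.on_backplane_reconnected()
print(sent)  # [('product:42', 'set v2')]
```

Note how the two queued updates for the same key collapse into a single delivered message: that is the de-duplication heuristic mentioned above.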
Now consider that every cache, be it in-memory or distributed, is ephemeral by definition: in your POC, in fact, Redis is memory-only and not persisted. This means that when the cache itself goes away, like in the part of the test where we stop Redis and then restart it, the data is basically gone for good.
So the normal flow in FusionCache (and the one the AutoRecovery feature has been designed for) is to call GetOrSet with a factory that loads the data from the database. This will:
- check the memory cache first;
- then check the distributed cache, saving the result in the memory cache on a hit;
- if no fresh data is found in either level, call the factory, which reads from the database (the single source of truth), and save the result in both cache levels.
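The flow above can be sketched in a few lines. This is a deliberately simplified Python model (FusionCache's real API is C# and far richer; `TwoLevelCache` and `get_or_set` here are hypothetical names), just to show why a database-backed factory makes the caches safely disposable:

```python
# Illustrative sketch of the memory -> distributed -> factory flow.
# Hypothetical names; not the actual FusionCache (C#) API.

class TwoLevelCache:
    def __init__(self):
        self.memory = {}       # L1: per-node memory cache
        self.distributed = {}  # L2: shared distributed cache (e.g. Redis)

    def get_or_set(self, key, factory):
        # 1. Check the memory cache first.
        if key in self.memory:
            return self.memory[key]
        # 2. Then check the distributed cache, populating L1 on a hit.
        if key in self.distributed:
            value = self.distributed[key]
            self.memory[key] = value
            return value
        # 3. On a full miss, call the factory: the database is the
        #    single source of truth, the caches are only copies of it.
        value = factory()
        self.distributed[key] = value
        self.memory[key] = value
        return value

db = {"product:42": "widget"}  # the "database": the single source of truth
cache = TwoLevelCache()
print(cache.get_or_set("product:42", lambda: db["product:42"]))  # loaded via the factory
cache.distributed.clear()  # simulate Redis restarting: L2 data is gone for good
print(cache.get_or_set("product:42", lambda: db["product:42"]))  # still served (L1, or the factory again)
```

The key point: because the factory can always rebuild the value from the database, losing Redis (or the memory cache) is an inconvenience, not data loss. In the POC there is no `db`, so nothing can be rebuilt.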
I'm not sure I've been able to explain myself fully and clearly, so please let me know!
⏩ The Path Forward
On top of fixing the first situation I mentioned earlier, there are other things I'd like to change while I'm on this "backplane sprint". For example, up until now the processing of the auto-recovery queue would happen as soon as a message arrived at a node (like a wake-up call), in a "passive" way: this means, by the way, that if the backplane is up again but no new messages are received for some time, the queued messages would just sit there unprocessed. Your POC is super simple and isolated, so this becomes more evident, because the only messages going around are the ones for your single cache key.
Now, with the new version I'm working on, this is no longer the case and FusionCache is more "active" in this regard (but still, since the distributed cache is the only source of truth in your POC, it does not work as you expected). I'm also investigating other edge cases and areas for improvement, on top of even better performance in some cases.
Again, thanks for trying out FusionCache, for the POC and in general for your time. I will update you as soon as I have some news about this.
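The "passive" vs "active" distinction above can be sketched as follows (again a hypothetical Python model, not FusionCache's real internals): passively, the pending queue is only drained when some incoming message happens to wake the node up; actively, the node retries on its own, for example on a timer.

```python
# Sketch of passive vs active auto-recovery queue processing.
# Hypothetical names; illustrative only.

class Backplane:
    def __init__(self):
        self.is_up = False
        self.delivered = []

    def send(self, msg):
        self.delivered.append(msg)

class Node:
    def __init__(self, backplane):
        self.backplane = backplane
        self.pending = ["evict key-x"]  # queued while the backplane was down

    def _flush(self):
        while self.backplane.is_up and self.pending:
            self.backplane.send(self.pending.pop(0))

    def on_message_received(self, msg):
        # Passive: processing piggybacks on incoming traffic, so a quiet
        # backplane means the queue is never drained.
        self._flush()

    def tick(self):
        # Active: a periodic check drains the queue even with no traffic.
        self._flush()

bp = Backplane()
node = Node(bp)
bp.is_up = True        # the backplane comes back up...
print(node.pending)    # ['evict key-x'] - passively, still queued: no traffic arrived
node.tick()            # the active check drains it regardless of traffic
print(bp.delivered)    # ['evict key-x']
```

In a single-key POC like the one discussed here, no other traffic ever arrives, so the passive variant never gets its wake-up call; an active periodic check fixes exactly that.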
-
Hey @jodydonetti, I totally agree we're talking about a cache and not a datastore. Anyway, after step 3 this would not be enough, since the worker keeps using the not-yet-expired in-memory entry; it won't get fresh data (from the 2nd-level cache or from the datastore) until the entry expires (or the backplane starts to deliver change notifications again). I'm glad you found a way to handle this edge case. So thank you very much again for your wonderful library and the even better support and documentation you deliver!
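The staleness described above can be shown in a tiny sketch (hypothetical names, Python for illustration, not the actual FusionCache API): a still-fresh L1 (memory) entry shadows a newer value in L2 until it expires or an eviction notification arrives.

```python
# Sketch of a fresh L1 entry shadowing a newer L2 value.
import time

memory = {"counter": {"value": 10, "expires_at": time.monotonic() + 60}}
distributed = {"counter": 11}  # another node already wrote a newer value

def get(key):
    entry = memory.get(key)
    # While the memory entry is not expired it wins, even if L2 has
    # newer data: only expiration or a backplane notification evicts it.
    if entry and entry["expires_at"] > time.monotonic():
        return entry["value"]
    return distributed.get(key)

print(get("counter"))   # 10: the stale but not-yet-expired memory entry
del memory["counter"]   # simulate a backplane eviction notification
print(get("counter"))   # 11: now the fresh value from L2 is visible
```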
-
Hey @martinobordin , I just finished all the work behind the backplane sprint I talked about, and I'll release a new version with these optimizations very soon. You can see most of the changes here and here. Hope this helps.
ps: when done I'll create a PR for your POC so you'll be able to test the changes.
-
ps: if you've found my answer useful, can you please mark it as the answer? Thanks!
-
Hi @martinobordin , v0.23.0 has been released with all of the above included 🎉 Please let me know if all is working correctly now, thanks!
-
Hi @jodydonetti , and thank you for the support. After restarting Redis (step 8), I see that the Worker reconnects, but it continues to print the old value (10, in my screenshot below) and, after a while, it doesn't print anything (the cache entry has been evicted). Given your explanation, maybe this is the expected behavior, since the worker should go directly to the database and get the latest version of the data: is that true? At this point my question would be: when will the backplane update other nodes with the newest value, and when not? 🤔 If you want we can have a shared session so I'll show you. Thank you for your time.