Runtime `upsert` and `delete` events race conditions #894

yhabteab · 2025-02-28T10:32:52Z

Apparently the runtime state updates don't seem to work as they should. Specifically, I spent almost 2 days debugging why Icinga DB always keeps some leftovers in the dependency_edge_state and redundancy_group_state table while testing #889. After some time I got clueless where the problem might lie and wanted to check if the actual host_state and service_state tables actually get cleaned up correctly when the respective host or/and service object is removed and surprisingly they don't! Icinga 2 does not even send runtime delete events for these two tables and any leftovers will only get cleaned up after the next config dump triggered by either a Icinga DB or Icinga 2 process restart. However, the thing is that unlike these two tables, Icinga 2 does send runtime delete events for the dependency states but the integration tests from #889 were still able to identify some leftovers in the dependency_edge_state and redundancy_group_state. Fortunately, @julianbrost gave me a hint and suggested to temporarily disable the parallel state updates triggered from here:

icingadb/cmd/icingadb/main.go

Lines 301 to 303 in a77498d

    
           logger.Info("Starting state runtime updates sync") 
        
           return rt.Sync(synctx, v1.StateFactories, runtimeStateUpdateStreams, true)

Tada 🎉, that finally ended my seemingly never-ending nightmare! It seems that when the allowParallel parameter is set to true, the RuntimeUpdates#Sync() method does not honor the exact order of delete and upsert events from Redis, resulting in some rare and difficult to manually trigger race conditions where a subsequent runtime delete event from Icinga 2 might actually surpass the previously sent upsert event. Since fixing this issue would require a major refactoring of the Icinga DB code, I'm not even going to try to fix it now, instead I'm just creating this PR to have a reference in the future so it doesn't get lost.

Add initial tests for host/service state updates

09e947a

cla-bot bot added the cla/signed label Feb 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Runtime `upsert` and `delete` events race conditions #894

Runtime `upsert` and `delete` events race conditions #894

yhabteab commented Feb 28, 2025

	logger.Info("Starting state runtime updates sync")

	return rt.Sync(synctx, v1.StateFactories, runtimeStateUpdateStreams, true)

Runtime upsert and delete events race conditions #894

Are you sure you want to change the base?

Runtime upsert and delete events race conditions #894

Conversation

yhabteab commented Feb 28, 2025

Runtime `upsert` and `delete` events race conditions #894

Runtime `upsert` and `delete` events race conditions #894