Events for long-running pipelines #1648
davidbellem
started this conversation in General
First of all, thank you for Spring Modulith! I think it is an absolutely brilliant approach!
I am trying to understand whether Spring Modulith and application events (using spring-modulith-starter-jdbc) are a good fit for a long-running pipeline, or whether this misuses a mechanism that was designed for something else. Developing with Modulith and application events feels great, but I ran into two issues I would like your perspective on.
Context
We use Spring Modulith events to orchestrate a multi-step async pipeline in which each `@ApplicationModuleListener` processes an event and then publishes downstream events that trigger subsequent steps. Individual steps involve external API calls and take 10-60 seconds each. During batch operations, tens of thousands of events are queued and processed through a thread pool. Think of a document management system that receives a batch of 10,000 uploaded documents and needs to run OCR (or something similar) on each one.
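For reference, the configuration this post refers to looks roughly like the following application.yml fragment (key paths as used throughout this post; please double-check them against your Modulith version):

```yaml
spring:
  modulith:
    # Republish incomplete event publications when an instance starts (see Issue 1)
    republish-outstanding-events-on-restart: true
    events:
      staleness:
        # Thresholds discussed in Issue 2
        processing: 15m
        resubmission: 30m
```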
Issues we have run into
Issue 1: `republish-outstanding-events-on-restart` with multiple instances

My understanding is that with `republish-outstanding-events-on-restart: true`, a newly started instance queries the database for incomplete event publications and republishes them. However, if I understood correctly, events in PUBLISHED status might already be queued in another instance's thread pool: the status only transitions to PROCESSING when a thread starts executing the listener, not when the task is submitted to the executor queue.

This would mean that a scale-up event (instance B starts while instance A is running) can cause double processing. The `markResubmitted` optimistic lock (`WHERE STATUS != 'RESUBMITTED'`) prevents two resubmissions from racing, but does not protect against the original in-flight dispatch on instance A. Is that correct?

Question: Is there a recommended approach for multi-instance deployments? Would it make sense to transition events to a different status at dispatch time rather than at execution time, so that other instances can distinguish "queued" from "genuinely stuck"?

Do I need to externalize the events to get a proper multi-instance deployment?
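The gap between "submitted to the executor" and "actually executing" is easy to demonstrate with a plain JDK thread pool; this sketch (no Spring involved, names are mine) shows four tasks sitting in the queue while only two are running, which is exactly the window in which a second instance would still see them as outstanding:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueVsProcessing {

    // Submits six slow "listeners" to a two-thread pool and reports how many
    // are still waiting in the queue once both threads are busy.
    static int queuedWhileSaturated() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch bothStarted = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowListener = () -> {
            bothStarted.countDown();          // comparable to the PUBLISHED -> PROCESSING transition
            try {
                release.await();              // simulate a long-running external call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 6; i++) {
            pool.submit(slowListener);        // submission alone does not start execution
        }
        bothStarted.await();                  // two tasks are now executing
        int queued = pool.getQueue().size();  // the other four are merely queued
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return queued;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "queued while two tasks execute: 4"
        System.out.println("queued while two tasks execute: " + queuedWhileSaturated());
    }
}
```

With listener steps taking 10-60 seconds, tasks can sit in that queue for minutes, which is why a dispatch-time status transition would help a newly started instance tell the two apart.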
Issue 2: staleness checker uses `publicationDate` for all statuses

`DefaultEventPublicationRegistry.markFailed()` always compares against `publicationDate`. This feels problematic for:

- PROCESSING: An event published 20 minutes ago that sat in the thread pool queue for 18 minutes and has only been processing for 2 minutes gets marked stale with `spring.modulith.events.staleness.processing: 15m`. The check measures time since publication, not time since processing started.
- RESUBMITTED: An event originally published 1 hour ago that was resubmitted just 10 seconds ago is immediately marked stale with `spring.modulith.events.staleness.resubmission: 30m`, because its `publicationDate` is 1 hour old. This makes it impossible to resubmit old failed events while keeping a reasonable staleness threshold.

Would it make sense to use the timestamp relevant to each status, i.e. time spent in the PROCESSING state for processing staleness and `lastResubmissionDate` for resubmission staleness? Should I open a ticket for this, or is it intended to work this way?

We are using 2.0.5 with JDBC event publication (PostgreSQL).
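The RESUBMITTED case reduces to simple timestamp arithmetic. A plain-Java sketch of the two comparisons (method names are mine, assuming the check compares "now minus stored timestamp" against the configured threshold):

```java
import java.time.Duration;
import java.time.Instant;

public class StalenessCheck {

    // Mirrors a staleness check that measures age from the original publication date.
    static boolean staleByPublicationDate(Instant publicationDate, Instant now, Duration threshold) {
        return Duration.between(publicationDate, now).compareTo(threshold) > 0;
    }

    // The alternative suggested above: measure age from the last resubmission instead.
    static boolean staleByResubmissionDate(Instant lastResubmissionDate, Instant now, Duration threshold) {
        return Duration.between(lastResubmissionDate, now).compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant publishedAnHourAgo = now.minus(Duration.ofHours(1));
        Instant resubmittedTenSecondsAgo = now.minusSeconds(10);
        Duration threshold = Duration.ofMinutes(30);

        // Published 1h ago, resubmitted 10s ago, threshold 30m:
        System.out.println(staleByPublicationDate(publishedAnHourAgo, now, threshold));        // true: marked stale immediately
        System.out.println(staleByResubmissionDate(resubmittedTenSecondsAgo, now, threshold)); // false: the retry gets its 30m
    }
}
```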
Thank you!