Events for long-running pipelines #1648
davidbellem
started this conversation in General
First of all, thank you for Spring Modulith! I think it is an absolutely brilliant approach!
I am trying to understand whether Spring Modulith and application events (using spring-modulith-starter-jdbc) are a good fit for a long-running pipeline, or whether this misuses a mechanism that was designed for something else. Developing with Modulith and application events feels great, but I ran into two issues I would like your perspective on.
Context
We use Spring Modulith events to orchestrate a multi-step async pipeline in which each `@ApplicationModuleListener` processes an event and then publishes downstream events that trigger subsequent steps. Individual steps involve external API calls and take 10-60 seconds each. During batch operations, tens of thousands of events are queued and processed through a thread pool. Think of a document management system that receives a batch of 10,000 uploaded documents and needs to run OCR (or something similar) on each one.
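For reference, the configuration this post refers to looks roughly like the following application.yml fragment (key paths as used throughout this post; please double-check them against your Modulith version):

```yaml
spring:
  modulith:
    # Republish incomplete event publications when an instance starts (see Issue 1)
    republish-outstanding-events-on-restart: true
    events:
      staleness:
        # Thresholds discussed in Issue 2
        processing: 15m
        resubmission: 30m
```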
Issues we have run into
Issue 1: `republish-outstanding-events-on-restart` with multiple instances

My understanding is that with `republish-outstanding-events-on-restart: true`, a newly started instance queries the database for incomplete event publications and republishes them. However, if I understood correctly, events in PUBLISHED status might already be queued in another instance's thread pool: the status only transitions to PROCESSING when a thread starts executing the listener, not when the task is submitted to the executor queue.

This would mean that a scale-up event (instance B starts while instance A is running) can cause double processing. The `markResubmitted` optimistic lock (`WHERE STATUS != 'RESUBMITTED'`) prevents two resubmissions from racing, but does not protect against the original in-flight dispatch on instance A. Is that correct?

Question: Is there a recommended approach for multi-instance deployments? Would it make sense to transition events to a different status at dispatch time rather than at execution time, so that other instances can distinguish "queued" from "genuinely stuck"?

Do I need to externalize the events to get a proper multi-instance deployment?
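The gap between "submitted to the executor" and "actually executing" is easy to demonstrate with a plain JDK thread pool; this sketch (no Spring involved, names are mine) shows four tasks sitting in the queue while only two are running, which is exactly the window in which a second instance would still see them as outstanding:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueVsProcessing {

    // Submits six slow "listeners" to a two-thread pool and reports how many
    // are still waiting in the queue once both threads are busy.
    static int queuedWhileSaturated() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch bothStarted = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowListener = () -> {
            bothStarted.countDown();          // comparable to the PUBLISHED -> PROCESSING transition
            try {
                release.await();              // simulate a long-running external call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 6; i++) {
            pool.submit(slowListener);        // submission alone does not start execution
        }
        bothStarted.await();                  // two tasks are now executing
        int queued = pool.getQueue().size();  // the other four are merely queued
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return queued;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "queued while two tasks execute: 4"
        System.out.println("queued while two tasks execute: " + queuedWhileSaturated());
    }
}
```

With listener steps taking 10-60 seconds, tasks can sit in that queue for minutes, which is why a dispatch-time status transition would help a newly started instance tell the two apart.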
Issue 2: staleness checker uses `publicationDate` for all statuses

`DefaultEventPublicationRegistry.markFailed()` always compares against `publicationDate`. This feels problematic for:

- PROCESSING: An event published 20 minutes ago that sat in the thread pool queue for 18 minutes and has only been processing for 2 minutes gets marked stale with `spring.modulith.events.staleness.processing: 15m`. The check measures time since publication, not time since processing started.
- RESUBMITTED: An event originally published 1 hour ago that was resubmitted just 10 seconds ago is immediately marked stale with `spring.modulith.events.staleness.resubmission: 30m`, because its `publicationDate` is 1 hour old. This makes it impossible to resubmit old failed events while keeping a reasonable staleness threshold.

Would it make sense to use the timestamp relevant to each status, i.e. time spent in the PROCESSING state for processing staleness and `lastResubmissionDate` for resubmission staleness? Should I open a ticket for this, or is it intended to work this way?

We are using 2.0.5 with JDBC event publication (PostgreSQL).
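The RESUBMITTED case reduces to simple timestamp arithmetic. A plain-Java sketch of the two comparisons (method names are mine, assuming the check compares "now minus stored timestamp" against the configured threshold):

```java
import java.time.Duration;
import java.time.Instant;

public class StalenessCheck {

    // Mirrors a staleness check that measures age from the original publication date.
    static boolean staleByPublicationDate(Instant publicationDate, Instant now, Duration threshold) {
        return Duration.between(publicationDate, now).compareTo(threshold) > 0;
    }

    // The alternative suggested above: measure age from the last resubmission instead.
    static boolean staleByResubmissionDate(Instant lastResubmissionDate, Instant now, Duration threshold) {
        return Duration.between(lastResubmissionDate, now).compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant publishedAnHourAgo = now.minus(Duration.ofHours(1));
        Instant resubmittedTenSecondsAgo = now.minusSeconds(10);
        Duration threshold = Duration.ofMinutes(30);

        // Published 1h ago, resubmitted 10s ago, threshold 30m:
        System.out.println(staleByPublicationDate(publishedAnHourAgo, now, threshold));        // true: marked stale immediately
        System.out.println(staleByResubmissionDate(resubmittedTenSecondsAgo, now, threshold)); // false: the retry gets its 30m
    }
}
```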
Thank you!