fix(queue): fixes bug in SqlQueue doContainsMessage to handle multiple batches #4184

Open
wants to merge 1 commit into
base: master

@ivorp ivorp commented Sep 21, 2021

A small bug fix which ensures the SqlQueue implementation of containsMessage works when there are more messages than the currently hardcoded batch size (100).

This problem was discovered when the ZombieExecutionService detected false zombie executions because the underlying SqlQueue.doContainsMessage implementation was not searching through all available batches of messages.

@@ -201,7 +200,7 @@ class SqlQueue(
.intoResultSet()
}

- while (!found && rs.next()) {
+ while (!found && rs.row < batchSize && rs.next()) {
Author

The underlying problem here is a little subtle. The bug appears to be the result of ResultSet.next() overshooting the cursor at the end of searching a batch.

As a result, the ResultSet.row check below never equals the expected batchSize, so no further batches are queried when they should be.
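
To make the overshoot concrete, here is a minimal sketch that runs without a database. `FakeCursor` is a hypothetical stand-in for `java.sql.ResultSet` that models only `next()` and `row`, and it assumes the usual JDBC behaviour that the row number reads 0 once the cursor has moved past the last row.

```kotlin
// Hypothetical stand-in for java.sql.ResultSet; it models only next() and row.
class FakeCursor(private val size: Int) {
  // 0 means "no current row": before the first row or after the last one.
  var row: Int = 0
    private set
  private var exhausted = false

  fun next(): Boolean {
    if (exhausted) return false
    if (row < size) {
      row++
      return true
    }
    // Moving past the last row: there is no current row any more.
    exhausted = true
    row = 0
    return false
  }
}

// Loop shape before the fix: runs until next() returns false.
fun rowAfterOriginalLoop(batchSize: Int): Int {
  val rs = FakeCursor(batchSize)
  while (rs.next()) { /* inspect the message here */ }
  return rs.row
}

// Loop shape from this PR: stops on the last row of a full batch.
fun rowAfterPatchedLoop(batchSize: Int): Int {
  val rs = FakeCursor(batchSize)
  while (rs.row < batchSize && rs.next()) { /* inspect the message here */ }
  return rs.row
}

fun main() {
  println(rowAfterOriginalLoop(100)) // prints 0, so "row == batchSize" is false
  println(rowAfterPatchedLoop(100))  // prints 100, so the next batch is queried
}
```

With the extra `rs.row < batchSize` condition the loop never calls `next()` past the last row of a full batch, so the later comparison against `batchSize` can succeed.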

Contributor

This whole bit of code seems like it could do with some restructuring. It's halfway between native JDBC and jOOQ, which makes it hard to reason about because it's mixing concepts.

Given that jOOQ fetches the entire result set into memory anyway, I don't see a reason not to just use its Result object directly and break out of the search if a matching message is found or the result set is empty.

Maybe something like?

    do {
      val rs: Result<Record3<Any, Any, Any>> = withRetry(READ) {
        jooq.select(idField, fingerprintField, bodyField)
          .from(messagesTable)
          .where(idField.gt(lastId))
          .limit(batchSize)
          .fetch()
      }

      rs.forEach { record ->
        val body = record.getValue("body", String::class.java)
        try {
          if (predicate.invoke(mapper.readValue(body))) return true
        } catch (e: Exception) {
          log.error("Failed reading message with fingerprint: ${record.getValue("fingerprint", String::class.java)} message: $body", e)
        }
        lastId = record.getValue("id", String::class.java)
      }
    } while (rs.isNotEmpty)

    return false
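
The same control flow can be tried out without jOOQ. The sketch below is only an illustration: `Row`, `fetchBatch` and the in-memory `table` are hypothetical, and the ids are assumed to be numeric and returned in ascending order, which paging by `id > lastId` relies on.

```kotlin
// Hypothetical row type; the real query selects id, fingerprint and body.
data class Row(val id: Long, val body: String)

// Hypothetical replacement for the SQL query:
// "where id > lastId order by id limit batchSize".
fun fetchBatch(table: List<Row>, lastId: Long, batchSize: Int): List<Row> =
  table.asSequence()
    .filter { it.id > lastId }
    .sortedBy { it.id }
    .take(batchSize)
    .toList()

fun containsMessage(
  table: List<Row>,
  batchSize: Int,
  predicate: (String) -> Boolean
): Boolean {
  var lastId = 0L
  while (true) {
    val batch = fetchBatch(table, lastId, batchSize)
    // An empty batch means every message has been inspected.
    if (batch.isEmpty()) return false
    for (row in batch) {
      if (predicate(row.body)) return true
      lastId = row.id
    }
  }
}

fun main() {
  val table = (1L..250L).map { Row(it, "message-$it") }
  println(containsMessage(table, 100) { it == "message-250" }) // prints true
  println(containsMessage(table, 100) { it == "missing" })     // prints false
}
```

Because the loop only stops on a match or an empty batch, it does not depend on any cursor position, which is the property the restructuring is after.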

Contributor

That could be tidied up further by adding some types to the field declarations (idField etc), but that would be a bigger change.

@@ -78,3 +84,50 @@ private val retryPolicy: RetryProperties = RetryProperties(
maxRetries = 1,
backoffMs = 10 // minimum allowed
)

class SqlQueueSpecificTests {
Author

Didn't want to come empty-handed without tests. However, I could not find a nice place for a SqlQueue-specific test implementation.

Open to suggestions on what to do with these - or simply remove them if this fix doesn't warrant the additional testing overhead.

Contributor

Given that the tests identify an edge case, I think they definitely need to be added. Is there a way to add them without changing the visibility of the doContainsMessage() function?
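
One possible shape for that, sketched with a hypothetical `InMemoryQueue` rather than the real SqlQueue: the test pushes one more message than the batch size and goes through the public `containsMessage`, so the private batching function never needs to be exposed.

```kotlin
// Hypothetical queue used only to illustrate the test shape; not the SqlQueue API.
class InMemoryQueue(private val batchSize: Int = 100) {
  private val messages = mutableListOf<String>()

  fun push(message: String) {
    messages += message
  }

  // Public entry point that a test can call.
  fun containsMessage(predicate: (String) -> Boolean): Boolean =
    doContainsMessage(predicate)

  // Stays private; it is exercised only through containsMessage.
  private fun doContainsMessage(predicate: (String) -> Boolean): Boolean =
    messages.chunked(batchSize).any { batch -> batch.any(predicate) }
}

// Edge case from this PR: the match lives in the second batch.
fun findsMessageBeyondFirstBatch(): Boolean {
  val queue = InMemoryQueue(batchSize = 100)
  repeat(101) { queue.push("message-$it") }
  return queue.containsMessage { it == "message-100" }
}

fun main() {
  println(findsMessageBeyondFirstBatch()) // prints true
}
```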

mergify bot pushed a commit that referenced this pull request Mar 11, 2024
…more than 100 items (#4648)

`SqlQueue#doContainsMessage` doesn't process more than 1 batch because of an incorrect loop inside.
When the last element in the batch has been processed (`ResultSet#next` returns `false`), the following invocation of `ResultSet#getRow` returns 0, no matter how many rows were processed before.

Basically, this PR is just a copy of #4184 with the review comments addressed.
But #4184 is abandoned, so I opened this one. Kudos to Ivor for the original PR.

Co-authored-by: Jason <[email protected]>
mergify bot pushed a commit that referenced this pull request Mar 23, 2024
…more than 100 items (#4648)
(cherry picked from commit 0a52909)
mergify bot added a commit that referenced this pull request Mar 26, 2024
…more than 100 items (#4648) (#4683)
(cherry picked from commit 0a52909)

Co-authored-by: Kirill Batalin <[email protected]>
mergify bot added a commit that referenced this pull request Mar 26, 2024
…more than 100 items (#4648) (#4685)
mergify bot added a commit that referenced this pull request Mar 26, 2024
…more than 100 items (#4648) (#4684)