bug: Step Functions: heartbeat tokens become invalid when several executions run concurrently #10648

Closed
1 task done
mbaynton opened this issue Apr 12, 2024 · 3 comments
Labels: aws:stepfunctions, status: resolved/stale, status: response required, type: bug

Comments

mbaynton commented Apr 12, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If I

  • start several (I'm usually running about 6 at once) executions of the same state machine
  • and that state machine has a step that sends to localstack SQS with waitForTaskToken and also utilizes HeartbeatSeconds
  • and my SQS consumers start sending heartbeat callbacks with their corresponding step function tokens

Then at first, all consumers' SendTaskHeartbeat calls work, but eventually at least one of the running SendTaskHeartbeat calls starts failing with this stringified error:

{error 26 0 operation error SFN: SendTaskHeartbeat, https response error StatusCode: 400, RequestID: c95077cf-4251-464f-aa24-abdf9b7753d6, InvalidToken: }

Subsequent SendTaskHeartbeat calls continue to fail in the same way until the state machine step times out because it has not received any heartbeats within the configured HeartbeatSeconds period.

Some of the other concurrent SFN executions complete, and others start, around the time the longer-running one begins to hit this issue. I'm not sure whether that is the cause.
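For readers trying to picture the setup, here is a minimal sketch of the kind of state machine described above: a single Task state that sends to SQS with waitForTaskToken and sets HeartbeatSeconds. It is not the definition from the private codebase. The state name, the message body layout, and the queue URL in the usage note are assumptions made for illustration.

```python
import json


def build_definition(queue_url: str, heartbeat_seconds: int = 30) -> str:
    """Return an Amazon States Language definition as a JSON string.

    The single Task state publishes the task token to SQS and then pauses
    until a consumer reports success or failure. If no heartbeat arrives
    within `heartbeat_seconds`, the state times out.
    """
    definition = {
        "StartAt": "SendAndWait",  # hypothetical state name
        "States": {
            "SendAndWait": {
                "Type": "Task",
                # Service integration that waits for a task-token callback.
                "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
                "HeartbeatSeconds": heartbeat_seconds,
                "Parameters": {
                    "QueueUrl": queue_url,
                    "MessageBody": {
                        # The context object supplies the token the consumer
                        # must echo back in SendTaskHeartbeat.
                        "TaskToken.$": "$$.Task.Token",
                        "Input.$": "$",
                    },
                },
                "End": True,
            }
        },
    }
    return json.dumps(definition, indent=2)
```

Calling `build_definition("http://localhost:4566/000000000000/heartbeat-repro")` (a placeholder queue URL) yields a JSON string that could be passed as the definition when creating the state machine.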

Expected Behavior

SendTaskHeartbeat calls continue to be accepted until I call back to SFN with a success or failure.

How are you starting LocalStack?

With a docker-compose file

Steps To Reproduce

How are you starting localstack (e.g., bin/localstack command, arguments, or docker-compose.yml)

Via docker compose, running a service that declares a dependency on localstack. The docker-compose definition for the localstack service is below.

version: '3.0'
services:
  localstack:
    image: localstack/localstack:3.3.0
    environment:
      - SERVICES=dynamodb,s3,sqs,stepfunctions
      - DATA_DIR=/var/lib/localstack/data
    ports:
      - 4566:4566
      - 4571:4571
    volumes:
      - ./scripts/localstack-startup.sh:/docker-entrypoint-initaws.d/localstack-startup.sh:cached
      - ./tmp/localstack:/var/lib/localstack

Client commands (e.g., AWS SDK code snippet, or sequence of "awslocal" commands)

This probably requires more than a few commands, sorry.
We're experiencing this in a private codebase, so I would have to write a reproduction from scratch.
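As a starting point for such a reproduction, the consumer side could be sketched roughly as follows. This is an illustrative sketch in Python with boto3, which is not necessarily the SDK used in the report. The helper names (`make_localstack_client`, `aws_error_code`, `heartbeat_loop`), the dummy credentials, and the region are assumptions. Only `send_task_heartbeat(taskToken=...)` and the `boto3.client(...)` call are real boto3 APIs.

```python
import threading
import time
from typing import Callable, List, Optional


def make_localstack_client(endpoint_url: str = "http://localhost:4566"):
    """Build a Step Functions client pointed at LocalStack (assumed endpoint)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    return boto3.client(
        "stepfunctions",
        endpoint_url=endpoint_url,
        region_name="us-east-1",          # assumption
        aws_access_key_id="test",         # dummy credentials for local use
        aws_secret_access_key="test",
    )


def aws_error_code(exc: Exception) -> Optional[str]:
    """Extract the AWS error code from a botocore ClientError-shaped exception."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def heartbeat_loop(
    client,
    task_token: str,
    stop: threading.Event,
    interval_seconds: float = 5.0,
    log: Callable[[str], None] = print,
) -> List[Optional[str]]:
    """Send heartbeats for one task token until `stop` is set.

    Returns one entry per attempt: None when the heartbeat was accepted,
    otherwise the error code (for example "InvalidToken") or, failing that,
    the exception class name.
    """
    results: List[Optional[str]] = []
    while not stop.is_set():
        try:
            client.send_task_heartbeat(taskToken=task_token)
            results.append(None)
        except Exception as exc:  # broad on purpose: this is a diagnostic loop
            code = aws_error_code(exc) or type(exc).__name__
            results.append(code)
            stamp = time.strftime("%H:%M:%S")
            log(f"{stamp} heartbeat rejected for token {task_token[:12]}...: {code}")
        stop.wait(interval_seconds)  # returns early once `stop` is set
    return results
```

To approximate the reported scenario, one would start about six executions, run one `heartbeat_loop` thread per task token received from SQS, and note when the first rejection is logged relative to other executions starting or completing.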

Environment

- OS: Ubuntu 22.04
- LocalStack: 3.3.0

Anything else?

The problem goes away when using PROVIDER_OVERRIDE_STEPFUNCTIONS=legacy.

mbaynton added the status: triage needed and type: bug labels Apr 12, 2024
localstack-bot (Collaborator) commented

Welcome to LocalStack! Thanks for reporting your first issue. Our team will work towards fixing it or reach out for more background information. We recommend joining our Slack Community for real-time help, and dropping a message to LocalStack Pro Support if you are a Pro user! If you are willing to contribute towards fixing this issue, please have a look at our contributing guidelines and our contributing guide.

MEPalma self-assigned this Apr 15, 2024
MEPalma added the aws:stepfunctions and status: in progress labels and removed the status: triage needed label Apr 15, 2024
MEPalma (Contributor) commented Apr 18, 2024

@mbaynton Thank you for taking the time to compile this report. We were not able to replicate the exact behaviour you described; however, we found some related issues that may explain what you are observing. We recently merged some changes that aim to address this issue. These changes are also scheduled to be included in the next nightly release. I would be grateful if you could test the new build at your earliest convenience and let us know whether it resolves the problem you encountered. Thank you once again for bringing this issue forward.

MEPalma added the status: response required label and removed the status: in progress label Apr 18, 2024
localstack-bot (Collaborator) commented

Hello 👋! It looks like this issue has not been active for more than two weeks. We encourage you to check whether this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to add a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

localstack-bot added the status: stale label May 2, 2024
localstack-bot added the status: resolved/stale label and removed the status: stale label May 9, 2024