bug: Stepfunction stops working when starting distributed map state second time #10662

Open
1 task done
jhKone opened this issue Apr 15, 2024 · 3 comments
Assignees
Labels
aws:stepfunctions AWS Step Functions status: response required Waiting for a response from the reporter type: bug Bug report

jhKone commented Apr 15, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I'm trying to test a fairly simple step function that runs in a loop: it fetches part of the data, processes it in a distributed map state, and repeats until all data has been processed.

Simplified graph of the step function:

Screenshot 2024-04-15 at 13 19 38
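
For readers without access to the screenshot, a minimal sketch of a definition with the same shape (fetch, distributed map, choice, loop back) might look like the following. This is not the reporter's actual definition: the state names, the Lambda ARNs, and the `items` and `hasMore` fields are all hypothetical and only illustrate the loop described above.

```python
import json

# Hypothetical Lambda ARNs; the issue does not name the real functions.
FETCH_ARN = "arn:aws:lambda:us-east-1:000000000000:function:fetch-batch"
PROCESS_ARN = "arn:aws:lambda:us-east-1:000000000000:function:process-item"


def build_definition(mode: str = "DISTRIBUTED") -> dict:
    """Build a fetch -> map -> choice loop with the shape described in the report."""
    processor_config = {"Mode": mode}
    if mode == "DISTRIBUTED":
        # Distributed mode runs child workflows, which need an execution type.
        processor_config["ExecutionType"] = "STANDARD"
    return {
        "StartAt": "FetchBatch",
        "States": {
            # Assumed to return {"items": [...], "hasMore": true | false}.
            "FetchBatch": {
                "Type": "Task",
                "Resource": FETCH_ARN,
                "Next": "ProcessBatch",
            },
            "ProcessBatch": {
                "Type": "Map",
                "ItemsPath": "$.items",
                "ItemProcessor": {
                    "ProcessorConfig": processor_config,
                    "StartAt": "ProcessItem",
                    "States": {
                        "ProcessItem": {
                            "Type": "Task",
                            "Resource": PROCESS_ARN,
                            "End": True,
                        }
                    },
                },
                # Discard the map results so "hasMore" is still in the state input.
                "ResultPath": None,
                "Next": "MoreData",
            },
            # Loop back to the fetch step while more data remains.
            "MoreData": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.hasMore",
                        "BooleanEquals": True,
                        "Next": "FetchBatch",
                    }
                ],
                "Default": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


definition_json = json.dumps(build_definition(), indent=2)
```

Calling `build_definition("INLINE")` produces the same loop with an inline map state, which makes it easy to compare the two modes against the same LocalStack instance.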

The first time the step function goes through the map state, everything is fine, and if it doesn't need to fetch more data, the execution ends with SUCCEEDED.

But if it goes back to the first (data-fetching) Lambda and then starts another distributed map state, everything stops and nothing happens until the step function times out. If I change the map state mode to INLINE for LocalStack, everything works correctly and multiple loops run with no problems.

Below, the first execution ran the map state only once; the second shows what happens when the map state runs a second time in distributed mode.
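
The INLINE workaround mentioned above could be applied automatically when deploying to LocalStack, for example with a small helper like the sketch below. The function name `use_inline_maps` is hypothetical, and the sketch assumes the definition uses the `ItemProcessor` field with a `ProcessorConfig`. It only rewrites the processor config and does not handle other distributed-only fields such as `ItemReader` or `ResultWriter`.

```python
import copy


def use_inline_maps(definition: dict) -> dict:
    """Return a copy of an ASL definition with every Map state set to INLINE."""
    patched = copy.deepcopy(definition)

    def visit(states: dict) -> None:
        for state in states.values():
            state_type = state.get("Type")
            if state_type == "Map":
                processor = state.get("ItemProcessor")
                if processor is None:
                    continue
                # Replace the whole config: ExecutionType is dropped because it
                # only applies to distributed mode.
                processor["ProcessorConfig"] = {"Mode": "INLINE"}
                # Map states can be nested inside the item processor.
                visit(processor.get("States", {}))
            elif state_type == "Parallel":
                for branch in state.get("Branches", []):
                    visit(branch.get("States", {}))

    visit(patched.get("States", {}))
    return patched


# Tiny example definition with a single distributed Map state.
sample = {
    "StartAt": "ProcessBatch",
    "States": {
        "ProcessBatch": {
            "Type": "Map",
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                "StartAt": "ProcessItem",
                "States": {"ProcessItem": {"Type": "Pass", "End": True}},
            },
            "End": True,
        }
    },
}

local_definition = use_inline_maps(sample)
```

The helper works on a deep copy, so the original definition stays in distributed mode for deployment to AWS.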

Screenshot 2024-04-15 at 13 22 43

The last event I can see in the execution history is MapRunStarted, and then nothing.

Screenshot 2024-04-15 at 13 25 38

In the Docker logs, this is the last message I can see, and then nothing:

l.s.s.a.c.eval_component : [ASL] [comp] [StateMap]: '(StateMap| {'comment': None, 'input_path': (InputPath| {'input_path_src': '$'}, 'output_path': (OutputPath| {'output_path': '$'}, 'state_entered_event_type': 'MapStateEntered', 'state_exited_event_type': 'MapStateExited', 'result_path': (ResultPath| {'result_path_src': None}, 'result_selector': None,

...

_comment': None, '_processor_config': (ProcessorConfig| {'mode': <Mode.Distributed: 81>, 'execution_type': <ExecutionType.Standard: 83>}, '_eval_input': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.distributed_item_processor.DistributedItemProcessorEvalInput object at 0x7f724a691150>, '_job_pool': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.job.JobPool object at 0x7f723f1499d0>, '_mutex': <unlocked _thread.lock object at 0x7f7250f07680>, '_map_run_record': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.map_run_record.MapRunRecord object at 0x7f723f10dbd0>, '_workers': [<localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.distributed_item_processor_worker.DistributedItemProcessorWorker object at 0x7f723f11cfd0>]}}'

Expected Behavior

The step function should be able to run a distributed map state multiple times within one execution when it needs to fetch data in batches, just as it currently does in INLINE mode, which works as expected.

How are you starting LocalStack?

With a docker-compose file

Steps To Reproduce

How are you starting localstack (e.g., bin/localstack command, arguments, or docker-compose.yml)

docker-compose with these LocalStack settings, using version 3.3:

environment:
- DOCKER_HOST=unix:///var/run/docker.sock
- DEBUG=1
- LAMBDA_IGNORE_ARCHITECTURE=1

The error shows up only when the step function execution goes through the map state multiple times. If it runs the map state only once, it works as expected.

Environment

- OS: Mac 14.4 / Gitlab pipeline
- LocalStack: 3.3

Anything else?

I also tried changing the memory and CPU available to Docker, and simplifying the map state so that fewer Lambdas run at once, but the behaviour did not change.

@jhKone jhKone added status: triage needed Requires evaluation by maintainers type: bug Bug report labels Apr 15, 2024
@localstack-bot
Collaborator

Welcome to LocalStack! Thanks for reporting your first issue and our team will be working towards fixing the issue for you or reach out for more background information. We recommend joining our Slack Community for real-time help and drop a message to LocalStack Pro Support if you are a Pro user! If you are willing to contribute towards fixing this issue, please have a look at our contributing guidelines and our contributing guide.

@MEPalma MEPalma self-assigned this Apr 15, 2024
@MEPalma MEPalma added status: in progress Currently being worked on aws:stepfunctions AWS Step Functions and removed status: triage needed Requires evaluation by maintainers labels Apr 15, 2024
@MEPalma
Contributor

MEPalma commented May 8, 2024

Thank you for taking the time to compile this report. I was able to replicate the behaviour that resulted in a failure. We recently merged some changes that aim to address this issue. These changes are scheduled to be included in the next nightly release too. I would be grateful if you could test the new build at your earliest convenience and provide feedback on whether it resolves the problem you encountered. Thank you once again for bringing this issue forward.

@jhKone
Author

jhKone commented May 8, 2024

Thank you @MEPalma !

@MEPalma MEPalma added status: response required Waiting for a response from the reporter and removed status: in progress Currently being worked on labels May 8, 2024