bug: Stepfunction stops working when starting distributed map state second time #10662

Closed
@jhKone

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I'm trying to test a fairly simple step function that runs in a loop: it fetches part of the data, processes it in a distributed map state, and repeats until all data has been processed.

Simplified graph of the step function:

Screenshot 2024-04-15 at 13 19 38

The first time the step function goes through the map state, it works fine; if it doesn't need to fetch more data, the execution ends with SUCCEEDED.

But if it goes back to the first (data-fetching) Lambda and then starts another distributed map state, everything stops and nothing happens until the step function times out. If I change the map state mode to INLINE for LocalStack, everything works correctly and multiple loops run with no problems.

The first execution below ran the map state only once; the second shows what happens when the map state runs a second time in distributed mode.
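For readers without access to the screenshots, the loop described above can be sketched roughly as follows. This is not the actual state machine from the report: the state names, the Lambda ARNs, the `$.items` and `$.hasMore` fields, and the timeout value are all assumptions made for illustration. The `mode` parameter switches between the DISTRIBUTED variant that hangs and the INLINE variant that works.

```python
import json

# Hypothetical Lambda ARNs; the issue does not name the real functions.
FETCH_ARN = "arn:aws:lambda:us-east-1:000000000000:function:fetch-batch"
PROCESS_ARN = "arn:aws:lambda:us-east-1:000000000000:function:process-item"


def build_definition(mode: str = "DISTRIBUTED") -> dict:
    """Build an ASL definition that loops: fetch -> map -> choice -> fetch."""
    if mode not in ("DISTRIBUTED", "INLINE"):
        raise ValueError(f"unsupported map mode: {mode}")

    processor_config = {"Mode": mode}
    if mode == "DISTRIBUTED":
        # Distributed maps run their iterations as child workflows.
        processor_config["ExecutionType"] = "STANDARD"

    return {
        "StartAt": "FetchBatch",
        "TimeoutSeconds": 300,  # assumed value
        "States": {
            "FetchBatch": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": FETCH_ARN, "Payload.$": "$"},
                "OutputPath": "$.Payload",
                "Next": "ProcessBatch",
            },
            "ProcessBatch": {
                "Type": "Map",
                "ItemsPath": "$.items",  # assumed field name
                "ItemProcessor": {
                    "ProcessorConfig": processor_config,
                    "StartAt": "ProcessItem",
                    "States": {
                        "ProcessItem": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": PROCESS_ARN,
                                "Payload.$": "$",
                            },
                            "End": True,
                        }
                    },
                },
                "ResultPath": None,  # discard map output, keep the input
                "Next": "MoreData",
            },
            "MoreData": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.hasMore",  # assumed field name
                        "BooleanEquals": True,
                        "Next": "FetchBatch",  # second pass through the map
                    }
                ],
                "Default": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition("DISTRIBUTED"), indent=2))
```

Calling `build_definition("INLINE")` yields the same loop with only the `ProcessorConfig` changed, which matches the single change that makes the execution complete.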

Screenshot 2024-04-15 at 13 22 43

The last event I can see in the execution history is MapRunStarted, and then nothing.

Screenshot 2024-04-15 at 13 25 38
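The stalled state can also be checked without the console. The helper below is a minimal sketch that takes the `events` list from a `GetExecutionHistory` response and reports the last event type and whether any map run started without finishing. The function name and the returned keys are made up for this example, and the sample events are illustrative, not copied from the failing execution.

```python
from collections import Counter

# Event types that close a map run in the Step Functions history.
MAP_RUN_TERMINAL = ("MapRunSucceeded", "MapRunFailed", "MapRunAborted")


def summarize_history(events: list) -> dict:
    """Summarize execution history events (dicts with 'id' and 'type')."""
    ordered = sorted(events, key=lambda e: e["id"])
    counts = Counter(e["type"] for e in ordered)
    started = counts["MapRunStarted"]
    finished = sum(counts[t] for t in MAP_RUN_TERMINAL)
    return {
        "last_event": ordered[-1]["type"] if ordered else None,
        "map_runs_started": started,
        "map_runs_finished": finished,
        # True when at least one map run never reached a terminal event.
        "stalled": started > finished,
    }


if __name__ == "__main__":
    # Illustrative events only: one finished map run, then a second that hangs.
    sample = [
        {"id": 1, "type": "ExecutionStarted"},
        {"id": 2, "type": "MapRunStarted"},
        {"id": 3, "type": "MapRunSucceeded"},
        {"id": 4, "type": "MapRunStarted"},
    ]
    print(summarize_history(sample))
```

With boto3, the input would come from `client.get_execution_history(executionArn=...)["events"]`, pointed at the LocalStack endpoint.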

In the Docker logs, this is the last message I can see; nothing follows it:

l.s.s.a.c.eval_component : [ASL] [comp] [StateMap]: '(StateMap| {'comment': None, 'input_path': (InputPath| {'input_path_src': '$'}, 'output_path': (OutputPath| {'output_path': '$'}, 'state_entered_event_type': 'MapStateEntered', 'state_exited_event_type': 'MapStateExited', 'result_path': (ResultPath| {'result_path_src': None}, 'result_selector': None,

...

_comment': None, '_processor_config': (ProcessorConfig| {'mode': <Mode.Distributed: 81>, 'execution_type': <ExecutionType.Standard: 83>}, '_eval_input': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.distributed_item_processor.DistributedItemProcessorEvalInput object at 0x7f724a691150>, '_job_pool': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.job.JobPool object at 0x7f723f1499d0>, '_mutex': <unlocked _thread.lock object at 0x7f7250f07680>, '_map_run_record': <localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.map_run_record.MapRunRecord object at 0x7f723f10dbd0>, '_workers': [<localstack.services.stepfunctions.asl.component.state.state_execution.state_map.iteration.itemprocessor.distributed_item_processor_worker.DistributedItemProcessorWorker object at 0x7f723f11cfd0>]}}'

Expected Behavior

The step function should be able to run a distributed map state multiple times in one execution when it needs to fetch data in batches, just as it currently does in INLINE mode, which works as expected.

How are you starting LocalStack?

With a docker-compose file

Steps To Reproduce

How are you starting localstack (e.g., bin/localstack command, arguments, or docker-compose.yml)

docker-compose with these LocalStack settings, using version 3.3:

environment:
- DOCKER_HOST=unix:///var/run/docker.sock
- DEBUG=1
- LAMBDA_IGNORE_ARCHITECTURE=1

The error shows up only when the step function execution goes through the map state multiple times. If it runs the map state only once, it works as expected.

Environment

- OS: Mac 14.4 / Gitlab pipeline
- LocalStack: 3.3

Anything else?

I also tried changing the memory and CPU available to Docker, and simplifying the map state so that fewer Lambdas run at once, but the behaviour did not change.
