Running PySpark Jobs in Parallel within Dagster on Dockerized Setup #26780
MammadTavakoli asked this question in Q&A
I am implementing a Docker setup for running Dagster and PySpark together; the services are defined in a docker-compose.yml file.
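A minimal sketch of this kind of Compose setup is shown below; the service names, images, ports, and worker settings are illustrative assumptions rather than the exact contents of my file.

```yaml
# Sketch only: service names, images, and ports are placeholders.
services:
  dagster_webserver:
    build: ./dagster
    command: dagster-webserver -h 0.0.0.0 -p 3000 -w workspace.yaml
    ports:
      - "3000:3000"
    depends_on:
      - spark-master

  dagster_daemon:
    build: ./dagster
    command: dagster-daemon run
    depends_on:
      - spark-master

  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "7077:7077"
      - "8080:8080"

  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_CORES=4
    depends_on:
      - spark-master
```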
The Dagster image is built from a custom Dockerfile.
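Again as a sketch (the base image, Java package, and paths are assumptions), it installs Dagster and PySpark on top of a Python image and copies the project in:

```dockerfile
# Sketch only: versions and paths are placeholders.
FROM python:3.10-slim

# PySpark needs a JVM available in the container.
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-jre-headless \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir dagster dagster-webserver pyspark

# Copy the Dagster project (workspace.yaml, dagster.yaml, asset code).
WORKDIR /opt/dagster/app
COPY . .
ENV DAGSTER_HOME=/opt/dagster/app
```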
Problem Description
I want to process data for five countries in parallel. For each country the same PySpark processing steps are executed, and the five country runs should be able to proceed in parallel.
The Dagster code I wrote for this follows the pattern sketched below.
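In this sketch only the `shop_raw_assets` name comes from my actual code; the country codes, data paths, and builder function are placeholders:

```python
# Sketch only: everything except the shop_raw_assets name is a placeholder.
from dagster import Definitions, asset
from pyspark.sql import SparkSession

COUNTRIES = ["de", "fr", "it", "es", "nl"]  # hypothetical country codes


def build_shop_raw_asset(country: str):
    @asset(name=f"shop_raw_{country}")
    def _shop_raw():
        # Each asset opens its own session against the Spark master and
        # processes a single country's data.
        spark = (
            SparkSession.builder
            .master("spark://spark-master:7077")
            .appName(f"shop_raw_{country}")
            .getOrCreate()
        )
        df = spark.read.parquet(f"/data/raw/{country}")   # hypothetical path
        df.write.mode("overwrite").parquet(f"/data/processed/{country}")
        spark.stop()

    return _shop_raw


# One asset per country; I expected Dagster to materialize these in parallel.
shop_raw_assets = [build_shop_raw_asset(c) for c in COUNTRIES]

defs = Definitions(assets=shop_raw_assets)
```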
The `shop_raw_assets` list generates five assets (one per country), and I expected these assets to run in parallel. However, in practice they do not run in parallel.
Environment
Question
How can I modify my setup or code to ensure that the assets in `shop_raw_assets` run in parallel, utilizing the available Spark cores effectively?