Currently, the CLI is implemented as a local Python script. Dockerizing the CLI would reduce local setup time and lower the chance of environment-related errors.
We previously tried to run the CLI inside a Docker container. This created a "Docker in Docker" scenario, which didn't work because of faulty volume mounting (see #99).
Alternatively, we could try using Airflow to orchestrate the pipeline runs.
The following user story gives some background on the purpose of the CLI:
User Story
As a user, I want a CLI to select all the pipelines I'd like to run. I can choose the geographical region for which I want to run the pipelines. For the population data, I also want to select the specific demographic groups I am interested in.
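As one way to picture these selections, here is a minimal non-interactive sketch using the standard-library `argparse` module. The flag names (`--pipelines`, `--region`, `--demographic-groups`) and the `population` pipeline name are hypothetical; only `osm-poi` and `google-poi` appear in this issue, and the real CLI may well prompt interactively instead.

```python
import argparse

# Hypothetical pipeline names: only osm-poi and google-poi are named in the issue.
AVAILABLE_PIPELINES = ["osm-poi", "google-poi", "population"]


def build_parser():
    """Build a parser for the user's selections (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Select and run data pipelines (sketch).")
    parser.add_argument("--pipelines", nargs="+", choices=AVAILABLE_PIPELINES,
                        required=True, help="Pipelines to run.")
    parser.add_argument("--region", required=True,
                        help="Geographical region to run the pipelines for.")
    parser.add_argument("--demographic-groups", nargs="+", default=[],
                        help="Demographic groups; only used by the population pipeline.")
    return parser


def parse_selection(argv=None):
    """Parse argv and reject demographic groups when no population pipeline is selected."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.demographic_groups and "population" not in args.pipelines:
        # parser.error prints a usage message and raises SystemExit(2).
        parser.error("--demographic-groups requires the population pipeline")
    return args
```

For example, `parse_selection(["--pipelines", "population", "--region", "berlin", "--demographic-groups", "elderly"])` returns a namespace whose `demographic_groups` is `["elderly"]`.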
Once I have made all my selections, the CLI should run all the pipelines in the correct order (e.g., the google-poi pipeline depends on the osm-poi pipeline). Once the data pipelines have run successfully, all the data should be imported into the Postgres database.
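Running pipelines in dependency order is a topological sort over the pipeline dependency graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The `PIPELINE_DEPENDENCIES` map and the `population` entry are hypothetical; only the google-poi → osm-poi dependency comes from this issue.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each pipeline -> set of pipelines it depends on.
# Only "google-poi depends on osm-poi" is stated in the issue.
PIPELINE_DEPENDENCIES = {
    "osm-poi": set(),
    "google-poi": {"osm-poi"},
    "population": set(),
}


def resolve_run_order(selected):
    """Return the selected pipelines plus their transitive dependencies in a valid run order."""
    # Collect every pipeline needed to satisfy the selection.
    required = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in required:
            continue
        if name not in PIPELINE_DEPENDENCIES:
            raise ValueError(f"Unknown pipeline: {name}")
        required.add(name)
        stack.extend(PIPELINE_DEPENDENCIES[name])
    # Restrict the graph to the required pipelines; static_order() yields
    # dependencies before dependents and raises CycleError on cycles.
    graph = {name: PIPELINE_DEPENDENCIES[name] for name in required}
    return list(TopologicalSorter(graph).static_order())
```

Selecting only `google-poi` pulls in `osm-poi` automatically: `resolve_run_order(["google-poi"])` returns `["osm-poi", "google-poi"]`, so the user does not have to know the dependency chain.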
When all the data has been imported, the Jupyter environment should be launched so I can start working with the data conveniently.
In addition to running the individual data pipelines, I want to be able to download the demo data through the CLI. Once the demo data is downloaded, the database and the Jupyter notebook with the popularity correlation should be launched.