Metadata storage for jobs/workflows #240

Open
anjensan opened this issue Apr 15, 2021 · 0 comments

Metadata storage for bigflow jobs/workflows

There are several use cases for a simple document / key-value storage:

  1. Save (append) information about executed workflows/jobs:
    ID, run time, docker hash, execution time, cost estimate, result, etc.
    Essentially structured logs, which may be used to inspect
    execution history and do some (manual) cost estimation.

  2. Query workflows/jobs and their status (history and/or currently running workflows):

    bigflow history -w workflow_id

    Such a CLI API might be a first step towards an "airflow-free" solution
    (i.e. the ability to replace Airflow with a custom cron-like service).

  3. Communicate between tasks/workflows.
    In some rare cases one workflow might want to check the status of another.
    A workflow might also check whether another instance of itself is currently running.
    This is especially important for dev-like environments, where
    workflows are executed locally (via bigflow run).

  4. Persist some information between tasks/jobs,
    like 'last-processed-id' (for incremental processing) or the last
    time-per-batch (to auto-adjust batch size). A hypothetical API
    sketch for use cases 3 and 4 follows this list.

Database: almost anything would work for use case 1; BigQuery or any SQL-like DB would cover use cases 1-4.
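As a concrete sketch of the BigQuery option for use case 1, each execution could append one structured log record via the standard google-cloud-bigquery client. The table name and field set below are assumptions, not an agreed schema.

    # Sketch only: appends one structured log record per job execution.
    # Table name and schema are assumptions, not part of bigflow.
    import datetime
    from google.cloud import bigquery

    def log_job_run(workflow_id: str, job_id: str, docker_hash: str,
                    execution_seconds: float, cost_estimate: float):
        client = bigquery.Client()
        row = {
            "workflow_id": workflow_id,
            "job_id": job_id,
            "run_time": datetime.datetime.utcnow().isoformat(),
            "docker_hash": docker_hash,
            "execution_time_s": execution_seconds,
            "cost_estimate_usd": cost_estimate,
        }
        errors = client.insert_rows_json("my_project.bigflow.job_runs", [row])
        if errors:
            raise RuntimeError(f"failed to append log record: {errors}")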

Client-visible API: TBD.
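Pending that, one possible shape for the query behind "bigflow history -w workflow_id" (use case 2), reading from the table assumed in the previous sketch:

    from google.cloud import bigquery

    def workflow_history(workflow_id: str, limit: int = 20):
        # Returns the most recent runs of one workflow; the table name
        # ("my_project.bigflow.job_runs") is the same assumption as above.
        client = bigquery.Client()
        query = """
            SELECT job_id, run_time, execution_time_s, cost_estimate_usd
            FROM `my_project.bigflow.job_runs`
            WHERE workflow_id = @workflow_id
            ORDER BY run_time DESC
            LIMIT @limit
        """
        job_config = bigquery.QueryJobConfig(query_parameters=[
            bigquery.ScalarQueryParameter("workflow_id", "STRING", workflow_id),
            bigquery.ScalarQueryParameter("limit", "INT64", limit),
        ])
        return list(client.query(query, job_config=job_config).result())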
