new feature feedback: Create pipelines via Python API #333

yaythomas · 2023-10-03T06:56:33Z

yaythomas
Oct 3, 2023
Maintainer

This discussion is to get some feedback about a (big!) new feature:

new pipeline api

create a Python API that models pypyr pipelines. This will allow coders to create their pypyr pipelines directly in python code (rather than in yaml files). It will also help with validating pipelines in advance (without needing to run them 1st #116 & #229)

@lucasrcezimbra has already done some great exploratory work here in draft PR #332! 🙌 🏆

There is a Work in Progress branch that shows what we're up to here: main...classify

See here for an example of how an API consumer would create pipelines with classes: https://github.com/pypyr/pypyr/blob/classify/tests/integration/pypyr/DELETE_ME_wip_pipeline_as_api_test.py

This code is pretty much backwards compatible - current API consumers should NOT notice a difference, other than that malformed pipelines will fail sooner than they used to (some validation errors will now happen when parsing the pipeline, rather than when running it).

current status

This code runs (~~although I haven't updated any unit tests, so I'd expect a whole LOT of unit test failures~~ ~~unit test all passing, but still missing test coverage for newly introduced code~~), but I've tested the principle and it runs pipelines and I think it's probably about right in terms of the actual end-to-end functionality. If we do decide to go ahead with this, next steps would be
a) ~~check if all failing tests are failing for the right reasons, and not because we overlooked some logic~~
b) ~~updating all the failing tests so they're working again~~
c) ~~general clean-up - I worked in a hurry, so this is more along the lines of a Proof-of-Concept than finessed and tidy code 😅~~

feedback

None of this is final as of yet, so if there is any community feedback, concerns, wishlist items.... now is the time to get involved!

please use this discussion thread for the new feature.

I'm still thinking through whether the PipelineBody class and its helpers like create_steps_group, and how the pipelinerunner uses them with the new run_pipeline_body entry point, are
a) easy enough and
b) future proof,

so this is by no means final, and I'm very interested in hearing your feedback!

Provisional Release Notes

[These provisional notes are a summary of the thoughts/progress in the rest of this discussion thread, and I'll keep on updating it as the current latest/greatest ideas refine]

new features

create pipelines in code

In the new major version of pypyr you will be able to create pipelines in Python code rather than in yaml:

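# import paths as on the work-in-progress branch; these may still change
from pypyr.dsl import PyString, RetryDecorator, Step
from pypyr.pipedef import PipelineBody

pipeline_body = PipelineBody()
pipeline_body.create_steps_group([
    Step(name='pypyr.steps.set',
         in_parameters={'set': {'a': 'b'}}),
    Step(name='pypyr.steps.echo',
         in_parameters={'echoMe': PyString("print('test 4567')")},
         retry_decorator=RetryDecorator(sleep=1, max=3))
])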

immediate validation on load

Pipeline validation now happens when a pipeline first loads, rather than only later during the execution phase. pypyr used to validate the structure of each step just-in-time as it was running that step. In the new version the validation happens when the pipeline loads (before it runs). This means you do not have to wait until a long-running pipeline has progressed until a failure point to validate that your pipeline structure is correct.

Note that this validation is for over-all pipeline structure. Validation of an individual step's in arguments still happens on step execution, and not when the pipeline first loads.
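As a rough illustration of the difference (this is not pypyr's actual implementation), the sketch below uses hypothetical stand-ins, load_pipeline, run_pipeline and StructureError, to show a structural error surfacing at load time, before any step has run. Step arguments are deliberately not inspected during the load.

```python
class StructureError(Exception):
    """Hypothetical stand-in for a pipeline structure error."""


def load_pipeline(mapping):
    """Validate overall structure up front; return the mapping unchanged.

    Only structure is checked here. Each step's 'in' arguments are left
    alone, mirroring the note above that those validate on step execution.
    """
    for group_name, steps in mapping.items():
        if not isinstance(steps, list):
            raise StructureError(f"step group '{group_name}' must be a list")
    return mapping


executed = []


def run_pipeline(pipeline):
    """Pretend to run each step in the 'steps' group, recording step names."""
    for step in pipeline['steps']:
        executed.append(step['name'])


good = {'steps': [{'name': 'pypyr.steps.echo', 'in': {'echoMe': 'hello'}}]}
bad = {'steps': [{'name': 'long_running_step'}],
       'on_failure': 'this should be a list'}

run_pipeline(load_pipeline(good))

try:
    run_pipeline(load_pipeline(bad))
    load_failed = False
except StructureError:
    # raised by load_pipeline, so long_running_step never started
    load_failed = True
```

The malformed on_failure group stops the second pipeline during the load, so its long-running step is never reached.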

breaking changes

The new functionality does come at a cost though... there will be 2 unavoidable BREAKING CHANGES in the new version:

PipelineDefinition in custom loader

If you have a custom pipeline loader that returns a PipelineDefinition,
the pipeline property is now a PipelineBody rather than a Mapping.

If your custom loader returns a Mapping it will keep on working as before, you do NOT need to change anything:

def get_pipeline_definition(pipeline_name, parent):
    with get_filelike_obj_from_somewhere(pipeline_name) as yaml_file:
        pipeline_yaml = pypyr.yaml.get_pipeline_yaml(yaml_file)

    # This will keep on working as before, no changes necessary
    return pipeline_yaml

If however, you are returning a PipelineDefinition from your custom loader, you will need to migrate to the new way of doing things:

won't work anymore

from pathlib import Path

from pypyr.pipedef import PipelineDefinition, PipelineInfo
import pypyr.yaml

CWD = Path.cwd()

def get_pipeline_definition(pipeline_name, parent):
    """Simple loader that gets pipeline_name.yaml in parent or working dir."""
    # pipeline_name could be "subdir/mydir/subdir/mypipe"
    pipeline_path = (parent.joinpath(f'{pipeline_name}.yaml')
                     if parent else CWD.joinpath(f'{pipeline_name}.yaml'))

    with open(pipeline_path) as yaml_file:
        pipeline_yaml = pypyr.yaml.get_pipeline_yaml(yaml_file)

    # set parent property so child pipelines can resolve relative to it
    info = PipelineInfo(pipeline_name=pipeline_name,
                        parent=pipeline_path.parent,
                        loader=__name__)

    # wrap pipeline body in a PipelineDefinition alongside its metadata
    return PipelineDefinition(pipeline=pipeline_yaml, info=info)

new

Amend your PipelineDefinition.pipeline to take a PipelineBody rather than a Mapping:

from pathlib import Path

from pypyr.pipedef import PipelineBody, PipelineDefinition, PipelineInfo
import pypyr.yaml

CWD = Path.cwd()

def get_pipeline_definition(pipeline_name, parent):
    """Simple loader that gets pipeline_name.yaml in parent or working dir."""
    # pipeline_name could be "subdir/mydir/subdir/mypipe"
    pipeline_path = (parent.joinpath(f'{pipeline_name}.yaml')
                     if parent else CWD.joinpath(f'{pipeline_name}.yaml'))

    with open(pipeline_path) as yaml_file:
        pipeline_yaml = pypyr.yaml.get_pipeline_yaml(yaml_file)

    # initialize your yaml into a PipelineBody instance <<<< This is the new bit
    pipeline_body = PipelineBody.from_mapping(pipeline_yaml)

    # set parent property so child pipelines can resolve relative to it
    info = PipelineInfo(pipeline_name=pipeline_name,
                        parent=pipeline_path.parent,
                        loader=__name__)

    # wrap pipeline body in a PipelineDefinition alongside its metadata
    return PipelineDefinition(pipeline=pipeline_body, info=info)

Note that you also now have the option of building your pipeline in code, rather than loading it from yaml.

No more arbitrary yaml in pipelines

If you have any arbitrary yaml in your pipelines, this now HAS to be under the _meta key.

won't work anymore

Previously, you could have arbitrary custom yaml sections that were NOT valid pypyr step-groups:

# some arbitrary properties
author: some author name
description: some help text here 

# some anchors that the pipeline will later reference with aliases
common:
  retry1: &commonRetryFixedList
    sleep: [2, 4, 8]
    max: 3
    sleepMax: 6

  retry2: &commonRetryWithSubstitutions
    sleep: '{base3_exponential_retry[sleep]}'
    max: '{base3_exponential_retry[max]}'
    sleepMax: '{base3_exponential_retry[sleepMax]}'
    jrc: '{base3_exponential_retry[jrc]}'
    backoff: '{base3_exponential_retry[backoff]}'
    backoffArgs:
      base: '{base3_exponential_retry[base]}'

# finally, the rest of the yaml is the actual valid pypyr pipeline
steps:
  - name: pypyr.steps.assert
    retry: *commonRetryWithSubstitutions
    in:
      assert: !py retryCounter == 3

new

In the new version, you have to move arbitrary yaml under a _meta key:

# new _meta group contains custom or arbitrary fields
_meta:
    author: some author name
    description: some help text here

    anchors:
        # some anchors that the pipeline will later reference with aliases
        common:
            retry1: &commonRetryFixedList
                sleep: [2, 4, 8]
                max: 3
                sleepMax: 6

            retry2: &commonRetryWithSubstitutions
                sleep: '{base3_exponential_retry[sleep]}'
                max: '{base3_exponential_retry[max]}'
                sleepMax: '{base3_exponential_retry[sleepMax]}'
                jrc: '{base3_exponential_retry[jrc]}'
                backoff: '{base3_exponential_retry[backoff]}'
                backoffArgs:
                    base: '{base3_exponential_retry[base]}'

# finally, this is where the functional part of the pypyr pipeline begins
steps:
  - name: pypyr.steps.assert
    retry: *commonRetryWithSubstitutions
    in:
      assert: !py retryCounter == 3
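
For anyone wanting to check an already-parsed pipeline mapping against the new rule, here is a minimal sketch. The helper name move_arbitrary_yaml_to_meta is hypothetical and not part of pypyr, and it assumes that any top-level list is a step-group and that context_parser is the only other pipeline key.

```python
def move_arbitrary_yaml_to_meta(pipeline):
    """Return a new mapping with non-step-group keys moved under '_meta'.

    Assumption for this sketch: a top-level value that is a list is a
    step-group, 'context_parser' is a known pipeline key, and everything
    else is arbitrary yaml.
    """
    migrated = {}
    meta = dict(pipeline.get('_meta', {}))

    for key, value in pipeline.items():
        if key == '_meta':
            continue
        if key == 'context_parser' or isinstance(value, list):
            migrated[key] = value
        else:
            meta[key] = value

    if meta:
        # put _meta first, as in the yaml example above
        migrated = {'_meta': meta, **migrated}
    return migrated


old = {
    'author': 'some author name',
    'common': {'retry1': {'sleep': [2, 4, 8], 'max': 3, 'sleepMax': 6}},
    'steps': [{'name': 'pypyr.steps.assert',
               'in': {'assert': True}}],
}

new = move_arbitrary_yaml_to_meta(old)
```

This works on the parsed mapping only. In the yaml file itself the anchors still have to be defined before the aliases that reference them, so the _meta block should stay above the step-groups that use it.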

lucasrcezimbra · 2023-10-05T00:59:22Z

lucasrcezimbra
Oct 5, 2023

I tested the classify branch locally, which seems to work as expected.

I had to update my custom loader because the PipelineDefinition now receives a PipelineBody instead of a dict. It was the only breaking change that affected me.

I opened PR #334 fixing the DSL tests.

0 replies

yaythomas · 2023-10-05T03:43:07Z

yaythomas
Oct 5, 2023
Maintainer Author

Great, thanks for the feedback! Your test fixes PR is merged into the WiP branch now. I also added a fix for the failing integration tests.

I'm going to have a think about Anchors and Aliases... knew this was feeling too easy so far 😆

Currently the new code won't deserialise custom anchors like this:

common:
  retry1: &commonRetryFixedList
    sleep: [2, 4, 8]
    max: 3
    sleepMax: 6

  retry2: &commonRetryWithSubstitutions
    sleep: '{base3_exponential_retry[sleep]}'
    max: '{base3_exponential_retry[max]}'
    sleepMax: '{base3_exponential_retry[sleepMax]}'
    jrc: '{base3_exponential_retry[jrc]}'
    backoff: '{base3_exponential_retry[backoff]}'
    backoffArgs:
      base: '{base3_exponential_retry[base]}'

steps:
  - name: pypyr.steps.py
    retry: *commonRetryFixedList
    in:
      py: |
        outList.append(f's1.{retryCounter}')
        if retryCounter < 3:
          raise ValueError('arb')

So I'll have to think a bit about what shapes anchors can take. It could be that we just discard them outright from the PipelineBody and only keep a group if it's list-like. This is the current validation that doesn't work:

        for k, v in mapping.items():
            if k == 'context_parser':
                context_parser = v
                continue

            # >>> might need to get rid of the following structure checks
            if not isinstance(v, Sequence):
                raise PipelineDefinitionError(
                    "step group must be sequence/list.")
            else:
                if isinstance(v, (str, bytes, bytearray)):
                    raise PipelineDefinitionError(
                        "step group must be a list, not a string")
            # <<< END of structure checks that might need to go

            step_groups[k] = [Step.from_step_definition(
                step_def) for step_def in v]
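
To see why the common anchors group trips this check, here is a self-contained sketch of the same structure test. PipelineDefinitionError is a local stand-in for the pypyr error of the same name, and check_step_group and is_step_group are hypothetical helpers for illustration only.

```python
from collections.abc import Sequence


class PipelineDefinitionError(Exception):
    """Stand-in for the pypyr error of the same name, for this sketch only."""


def check_step_group(value):
    """Raise if value cannot be a step-group, i.e. is not list-like."""
    # str, bytes & bytearray are Sequences too, so reject them explicitly
    if isinstance(value, (str, bytes, bytearray)):
        raise PipelineDefinitionError(
            "step group must be a list, not a string")
    if not isinstance(value, Sequence):
        raise PipelineDefinitionError("step group must be sequence/list.")


def is_step_group(value):
    """Return True when check_step_group accepts value."""
    try:
        check_step_group(value)
    except PipelineDefinitionError:
        return False
    return True


steps_ok = is_step_group([{'name': 'pypyr.steps.py'}])
anchors_ok = is_step_group({'retry1': {'sleep': [2, 4, 8], 'max': 3}})
string_ok = is_step_group('pypyr.steps.py')
```

The anchors group is a mapping rather than a sequence, so it is rejected in the same way as a plain string would be.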

1 reply

yaythomas Oct 21, 2023
Maintainer Author

decision: anchors now have to go under a new _meta key.

lucasrcezimbra · 2023-10-05T12:53:38Z

lucasrcezimbra
Oct 5, 2023

About the __eq__ method mentioned in #334 and implemented in #336.

I didn't add it in #334 because I would suggest using @dataclass or attrs for these classes. Both would create the dunder methods for us.

Dataclass is the built-in option, so we would not need to add new dependencies.

attrs is more robust and would enable us to add validations as described in #116.

What do you think?
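
To make the suggestion concrete, here is a small sketch comparing a hand-written __eq__ with the one @dataclass generates. ManualRetry and DataclassRetry are hypothetical illustration classes, not the actual pypyr classes.

```python
from dataclasses import dataclass
from typing import Optional


class ManualRetry:
    """Hand-written equality, the boilerplate approach."""

    def __init__(self, sleep=0, max=None):
        self.sleep = sleep
        self.max = max

    def __eq__(self, other):
        if not isinstance(other, ManualRetry):
            return NotImplemented
        return (self.sleep, self.max) == (other.sleep, other.max)


@dataclass
class DataclassRetry:
    """Same fields; @dataclass generates __init__, __repr__ and __eq__."""

    sleep: float = 0
    max: Optional[int] = None


manual_equal = ManualRetry(1, 3) == ManualRetry(1, 3)
generated_equal = DataclassRetry(sleep=1, max=3) == DataclassRetry(sleep=1, max=3)
generated_differs = DataclassRetry(sleep=1, max=3) != DataclassRetry(sleep=2, max=3)
```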

0 replies

yaythomas · 2023-10-05T15:56:01Z

yaythomas
Oct 5, 2023
Maintainer Author

DataClass: pypyr is showing its age... it started on 3.6, where DataClass wasn't a thing yet. Now that the minimum supported version is 3.7, it's for sure something to think about! I'm likely not to want to add to the current churn because:
a) the boilerplate/annoying work is already done
b) We're changing a LOT of logic and internals here, and I don't want to add DataClass into the mix while we're worrying about core logic

Attrs: awesome lib, but I'm trying to avoid adding dependencies - pypyr gets downloaded a lot as a CI/CD tool, so I try to keep the payload light. Additionally, given how little of the power of attrs we'd actually be using I'd feel bad pulling it in just for that.

0 replies

yaythomas · 2023-10-05T15:59:17Z

yaythomas
Oct 5, 2023
Maintainer Author

Remaining tests with failures:

tests/integration/pypyr/retries/retry_int_test.py - having a think about it - validation on non-lists in mapping
tests/unit/pypyr/cache/loadercache_test.py - pending decision re Retries - validation on non-lists in mapping
tests/integration/pypyr/loaders/string_loader_test.py
tests/unit/pypyr/pipeline_test.py
tests/unit/pypyr/stepsrunner_test.py - thinking to keep this file separate as it is, just repoint it to the PipelineBody class.
tests/unit/pypyr/loaders/string_test.py

6 replies

yaythomas Oct 7, 2023
Maintainer Author

tests/unit/pypyr/pipeline_test.py in progress [edit: done]

yaythomas Oct 8, 2023
Maintainer Author

tests/unit/pypyr/stepsrunner_test.py in progress [edit: done]

yaythomas Oct 21, 2023
Maintainer Author

all tests passing. still TODO adding new tests for 100% coverage.

Name               Stmts   Miss Branch BrPart  Cover   Missing
--------------------------------------------------------------
pypyr/dsl.py         433      3    156      3    99%   459, 1108, 1338
pypyr/pipedef.py     200     16     76      5    92%   141, 148, 152, 156, 160, 164-167, 213, 217, 241, 253, 256, 262, 265, 271
--------------------------------------------------------------
TOTAL               3938     19   1232      8    99%

yaythomas Dec 16, 2023
Maintainer Author

dsl coverage updated, now 100%. ~~Remaining TODO~~:

coverage report --show-missing --skip-covered
Name               Stmts   Miss Branch BrPart  Cover   Missing
--------------------------------------------------------------
pypyr/pipedef.py     203     17     78      6    91%   142, 149, 153, 157, 161, 165-168, 214, 218, 242, 254, 257, 263, 266, 272, 384
--------------------------------------------------------------
TOTAL               3941     17   1234      6    99%

yaythomas Dec 17, 2023
Maintainer Author

coverage 100%.

yaythomas · 2023-10-07T17:17:22Z

yaythomas
Oct 7, 2023
Maintainer Author

Re the Reference & Anchor problem:

It's not possible to have a pre-determined list of "known" step-group names, because part of the joy of pypyr is that step-group names can dynamically inject at runtime from anywhere - this could be input to the pipeline, or dynamic/unknown result of any given step could be re-used in a call or jump step for a step-group name. The only way to know which step-groups the pipeline calls is actually to run the pipeline - and even there, because of conditional branching and/or error handling, the pipeline might not even run the same step-groups on each run.

A potential solution involves making a BREAKING change.

It so happens that all of the current documented examples use a step-group named common to contain anchors.

We could make this a rule, so that common becomes a special "reserved" step-group name that will NOT be parsed for a list of Steps. The side-effect is that existing pipelines that have an actual step-group named "common" will break, because it wouldn't then parse as an executable step-group.

A mitigation could be, if a step_group named common is found, to attempt to parse common as a normal step-group 1st, and if it fails, to let it pass quietly and not add it to the PipelineBody on the assumption that this means it must contain common content. This however means that existing pipelines with a common step-group would silently pass validation errors. We could add a warning output to warn the pipeline operator that this is happening and recommending to move away from using common.

Counter-point: I have no idea how many people even know that this reference & anchor feature exists, I haven't seen any usages in the wild.

Alternatively, instead of common, we could introduce an unlikely new reserved key (like context_parser), e.g. _meta, which can become a future collective group for non-step-group information. This would break existing pipelines with anchors in groups other than _meta, but it might be more future proof and less likely to intrude on pipelines that have a runnable step-group called common. The latter seems more likely to exist in the wild than pipelines with anchors, meaning less impact.

1 reply

yaythomas Oct 21, 2023
Maintainer Author

current work-in-progress branch introduces a new _meta key. This is a breaking change. Existing pipelines (unlikely, hopefully!) with a step-group named "_meta" won't work anymore. Existing pipelines with arbitrary yaml in the pipeline that is NOT valid step-groups or context_parser will NOT work anymore. Arbitrary yaml now has to move under the _meta key.

Introduced a meta object that is flexible, with provisional convenience property getters for author and help, but pipeline authors can add anything in there they want to. Idea is that _meta.help can ultimately form the basis for self-documenting pipeline help, to display when $ pypyr help mydir/mypipeline
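
A minimal sketch of what such a flexible meta object could look like follows. The Meta class here is hypothetical and only illustrates the idea of convenience getters over free-form content; it is not the class on the work-in-progress branch.

```python
from collections.abc import Mapping


class Meta(Mapping):
    """Hypothetical read-only wrapper around whatever sits under _meta."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    @property
    def author(self):
        """Return the author if set, else None."""
        return self._data.get('author')

    @property
    def help(self):
        """Return the help text if set, else None."""
        return self._data.get('help')


meta = Meta({'author': 'some author name',
             'help': 'some help text here',
             'anchors': {'retry1': {'max': 3}}})
```

The getters cover the well-known fields, while anything else a pipeline author adds stays reachable through normal mapping access such as meta['anchors'].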

controlpl4n3 · 2023-10-12T15:19:39Z

controlpl4n3
Oct 12, 2023

@yaythomas, this comment is a bit "meta"...
I suspect that having the Python API may lead to interesting network effects for CI/CD, considering projects like dagger.io.
I also wonder if it would make it easier to dump pipeline details to well-formed yaml?

1 reply

yaythomas Oct 13, 2023
Maintainer Author

interesting potential for sure!

sure, adding a yaml export/dump wouldn't be particularly hard... what sort of use-cases do you have in mind for a code-authored pipeline written out to yaml?

yaythomas · 2023-11-06T00:09:13Z

yaythomas
Nov 6, 2023
Maintainer Author

ADR for the change https://github.com/pypyr/pypyr/blob/classify/docs/adr/0006-pipeline-as-code-api.md

0 replies

yaythomas · 2023-12-17T21:17:10Z

yaythomas
Dec 17, 2023
Maintainer Author

Remaining TODO:

typings (i.e. mypy should pass)
flake8 linting corrections
new tests to cover pipeline execution when instantiated from code

0 replies
