
Python API compatibility with custom Papermill engines #1122

Open
@rmshkv

Description


I'm trying to use the Ploomber Python API as the backend for a data workflow that runs notebooks through a custom Papermill engine. The engine adds one feature: templating Markdown cells with Jinja. With Papermill alone, this worked by passing the new engine's name to papermill.execute_notebook() through the engine_name parameter. I tried the same thing with Ploomber's NotebookRunner task, setting executor to 'papermill' and adding the custom engine name to executor_params, but Ploomber does not seem to support this and raises the following error:

File "/glade/work/eromashkova/miniconda3/lib/python3.9/site-packages/ploomber/tasks/notebook.py", line 659, in __init__
raise KeyError(
KeyError: 'Found conflicting options: executor is set to papermill but "engine_name" is set to md_jinja in "executor_params Please use only one of the parameters or pass the same executor to both'

Here is how I'm setting up the NotebookRunner task:

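from pathlib import Path

import ploomber
from jinja2 import Template
from papermill.engines import NBClientEngine, papermill_engines

class md_jinja_engine(NBClientEngine):
    """Papermill engine that renders Markdown cells as Jinja templates after execution."""

    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, **kwargs):
        # pop jinja_data so it is not forwarded to the underlying notebook client
        jinja_data = kwargs.pop("jinja_data", {})

        # run the notebook with the standard nbclient-based papermill engine
        super().execute_managed_notebook(nb_man, kernel_name, **kwargs)

        # render every Markdown cell with the supplied Jinja context
        for cell in nb_man.nb.cells:
            if cell.cell_type == "markdown":
                cell["source"] = Template(cell["source"]).render(**jinja_data)

# register the engine under the name used as engine_name (public API instead of _engines)
papermill_engines.register("md_jinja", md_jinja_engine)

# parms, parms_in, nb_path_root, input_path, output_path, dag, info and output_name
# are defined elsewhere in my workflow
pm_params = {
    'engine_name': 'md_jinja',
    'jinja_data': parms,
    'cwd': nb_path_root,
}

task = ploomber.tasks.NotebookRunner(
    Path(input_path),
    ploomber.products.File(output_path + '.ipynb'),
    dag,
    params=parms_in,
    executor='papermill',
    executor_params=pm_params,
    kernelspec_name=info['kernel_name'],
    name=output_name,
)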
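In the meantime, one possible stopgap (only a sketch, not something Ploomber provides) is to drop the custom engine, let NotebookRunner run with the stock papermill executor, and render the Markdown cells of the executed notebook in a separate step. The helper name render_markdown_cells is hypothetical; it assumes jinja2 is installed and that the task's product is a regular .ipynb JSON file. Unlike the engine, it only sees the saved notebook, so it has to run after the task has finished.

```python
import json
from pathlib import Path

from jinja2 import Template


def render_markdown_cells(notebook_path, jinja_data):
    """Hypothetical helper: render every Markdown cell of a saved .ipynb as a Jinja template, in place."""
    path = Path(notebook_path)
    nb = json.loads(path.read_text(encoding="utf-8"))

    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        # the notebook format allows "source" to be a single string or a list of lines
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        cell["source"] = Template(source).render(**jinja_data)

    # write the rendered notebook back to the same file
    path.write_text(json.dumps(nb, indent=1, ensure_ascii=False), encoding="utf-8")
    return nb
```

For example, calling render_markdown_cells(output_path + '.ipynb', parms) once the DAG build finishes should give roughly the same rendered Markdown the engine produced, at the cost of one extra read and write of the output notebook.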

Let me know if any more info would be helpful.

Would this be a feasible fix or enhancement in Ploomber?

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests