Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python API compatibility with custom Papermill engines #1122

Open
rmshkv opened this issue Jun 28, 2023 · 3 comments
Open

Python API compatibility with custom Papermill engines #1122

rmshkv opened this issue Jun 28, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@rmshkv
Copy link

rmshkv commented Jun 28, 2023

I'm trying to use the Ploomber Python API as a backend to a data workflow that uses a custom Papermill engine for running notebooks (to enable an additional feature of templating Markdown cells with jinja). The way this had worked with Papermill alone previously is passing the new engine's name to papermill.execute_notebook() under the engine_name param. I tried doing this with Ploomber's NotebookRunner task, setting the executor to 'papermill' and including the custom engine name under the additional executor_params, but Ploomber seems not to support this, throwing the following error:

File "/glade/work/eromashkova/miniconda3/lib/python3.9/site-packages/ploomber/tasks/notebook.py", line 659, in __init__
raise KeyError(
KeyError: 'Found conflicting options: executor is set to papermill but "engine_name" is set to md_jinja in "executor_params Please use only one of the parameters or pass the same executor to both'

An example of the way I'm setting up the NotebookRunner task is below:

import ploomber
import papermill as pm
from papermill.engines import NBClientEngine

class md_jinja_engine(NBClientEngine):
    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, **kwargs):
        jinja_data = {} if "jinja_data" not in kwargs else kwargs["jinja_data"]

        # call the papermill execution engine:
        super().execute_managed_notebook(nb_man, kernel_name, **kwargs)

        for cell in nb_man.nb.cells:
            if cell.cell_type == "markdown":
                cell["source"] = Template(cell["source"]).render(**jinja_data)

pm.engines.papermill_engines._engines["md_jinja"] = md_jinja_engine

pm_params = {
     'engine_name': 'md_jinja',
     'jinja_data': parms,
     'cwd': nb_path_root}

task = ploomber.tasks.NotebookRunner(Path(input_path), ploomber.products.File(output_path + '.ipynb'), dag, params=parms_in, executor='papermill', executor_params=pm_params, kernelspec_name=info['kernel_name'], name=output_name)

Let me know if any more info would be helpful.

Is this something that would be a doable fix/enhancement in Ploomber's infrastructure?

@edublancas
Copy link
Contributor

this sounds like a regression error. we added executor in 0.22.4, can you try using 0.22.3?

@mehtamohit013: do you remember why we added this validation? looks like a bug. allowing people to use a custom papermill engine sounds reasonable

@edublancas edublancas added the bug Something isn't working label Jun 29, 2023
@mehtamohit013
Copy link
Contributor

@edublancas
It seems papermill also uses the engine_name parameter. So, what does engine_name intend to do is, select from Papermill and ploomber-engine. We can completely remove the engine_name and use only the executor param or can use something like executor_name? The code should run fine with version 0.22.3.

@rmshkv
Copy link
Author

rmshkv commented Jun 29, 2023

Thanks for looking into this! I tested with 0.22.3 and it works as expected (with executor_params changed to papermill_params)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants