feat: Add a wrapper for quarto markdown rendering #3003

votti · 2024-06-20T14:20:20Z

Quarto has become a frequently used markdown notebook authoring system.

This wrapper uses quarto to render R Quarto .qmd notebooks using the the Quarto params mechanism to pass parameters from snakemake.

R dependency handling with renv is supported.

The meta.yml contains a more detailed description of the wrapper.
Test cases for all the major functionality and demonstrating the renv integration are included.

QC

I confirm that:

For all wrappers added by this PR,

there is a test case which covers any introduced changes,
input: and output: file paths in the resulting rule can be changed arbitrarily,
either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
all environment.yaml specifications follow the respective best practices,
the environment.yaml pinning has been updated by running snakedeploy pin-conda-envs environment.yaml on a linux machine,
wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
the meta.yaml contains a link to the documentation of the respective tool or command,
Snakefiles pass the linting (snakemake --lint),
Snakefiles are formatted with snakefmt,
Python wrapper scripts are formatted with black.
Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

This wrapper uses quarto to render R quarto markdown .qmd notebooks. R dependency handling with conda or renv is supported.

fgvieira · 2024-06-21T06:04:26Z

Do you need to specify all those dependencies?
Can you try to specify just the absolutely needed packages (no dependencies)? For example, - r-quarto =1.4.

And why do you need a second environment?

This simplifies the dependencies in the main conda environment. This required then to rewrite the tests to use a separate conda environment with renv installed in order to build and restore the renv.

votti · 2024-06-21T07:38:43Z

Hi @fgvieira ,
Thanks a lot for your comments!
Maybe all this build dependencies for r-packages were indeed not needed. I simplified this to just depend on r-quarto.
Looking at the other packages, it seems like pining the version in the environment.yml is not needed as the pin.txt files are used. Thus I did not pin the r-quarto version.

@multiple environments:
The other environments are simply for testings and to demonstrate how:

r dependencies required to render the markdown can be installed by overwriting the conda: directive of the wrapper. This is tested by having additionally data.table installed in the extra.yml environment.
demonstrate and test how renv may be used together with this quarto wrapper. This is achieved by installing data.table with renv. This requires a conda env with r-renv installed.

fgvieira · 2024-06-21T07:43:24Z

@votti we need a version in the environment.yaml for the update github action. Can you add one?

Don't think we need the test/.gitignore file, right?

If not needed, best to have a single env. You can always add data.table to the main environment (if you think it is needed).

utils/quarto/test/Snakefile

votti · 2024-06-21T08:34:38Z

utils/quarto/test/Snakefile

+# resources CLI flag: --resources renv_lock=1 --set-resource-scopes renv_lock=global
+
+
+rule renv_example_init:


Similar the renv testcase requires renv as an additional dependency which I would track in the separate renv.yaml.

utils/quarto/test/.gitignore

Generally these should not need to be updated. If updating renv.yaml, the renv_dt.lock file needs to be adapted as well.

test.py

As discussed in the PR this does not need to be explicitly shown. Instead I have added this more expliiclty to the documentation.

Before this would have raised an error `log` not defined

utils/quarto/meta.yaml

fgvieira · 2024-06-23T14:57:22Z

utils/quarto/meta.yaml

+        threads: lambda wc, threads: threads
+        mem: lambda wc, resources: resources.mem


Wouldn't it make more sense for the wrapper to use threads and mem specified under resources?

I definitely think it would make sense and attempted to implement this, but technically I did not find a way to achieve it:
We currently pass parameters for the markdown rendering using the --P, --execute-param KEY:VALUE syntax of the quarto render command. This only accepts parameters that are actually defined in the markdown document to be rendered and fails otherwise. This behaviour is coded in the knit_params_get used under the hood: https://github.com/rstudio/rmarkdown/blob/4b2f3426c1f46a8d207b0d661fc51c2508ecdad7/R/params.R#L28-L34
I dont see any possiblity to easily change this behaviour.

Thus if we would pass threads and mem as a parameter automatically, these would require the explicit declaration of these parameters in the markdown rendered.
If the markdown rendered does not explicitly supports these parameters, rendering would fail.

Thus to me requiring all markdowns to be rendered using this wrapper to support these two resource parameters would not be a good solution, as it would limit the applicability of the wrapper.
Also it would not be a complete solution as supporting any other resources would still require a workaround as the one documented here as well.
Automatically passing any resource parameter would be even more brittle as additional resources may be passed depending on the snakemake configuration used.

If I understand correctly, params/threads/mem need to be specified inside the .qmd file AND passed as command-line arguments (--P, --execute-param KEY:VALUE)?

Yes that is how it works.

That is a bit unpractical since the params of the rule might not match the ones in the .qmd file! On the other hand, it is also not possible to add more parameters.

How are parameters included in the .qmd file? Is it an upstream rule?
Would it make sense for the wrapper to overwrite (or check) the params in the .qmd file to ensure they match?

Think of the qmd file like an extension to Rmarkdown or akin to a Jupyter notebook.
I imagine the typical use of the wrapper would be users to hand-write the .qmd notebook, like they would do in a Rmarkdown or Jupyter notebook.
The parameter are included when authoring the file. The defined parameters are then used in the report to eg customize titles or as parameters in functions.

Would it make sense for the wrapper to overwrite (or check) the params in the .qmd file to ensure they match?

@overwriting: I am not sure if this would be sensible: adding additional parameters would not typically not really having an effect. I would say is is really rare for people to write code that handles arbitrary additional parameters and thus typically these would simply be ignored.

@cheking to ensure they match: this is what quarto render does at runtime and the error message raised is quite understandable, so I would say this is already done.

What about if the wrapper just read the parameters in the .qmd file and add those with --execute-param KEY:VALUE? Would that make sense?

I think this could make sense and it should be 'easy' to implement using knitr:knit_params: https://rdrr.io/cran/knitr/man/knit_params.html
or by writing a python function that parses the 'front matter' of such documents based some regexp.

This definitely would be quite convenient to implicitly pass these parameters if appropriately named parameters exist in the document.
Personally I am not so sure if I like such an implicit parameter passing as I tend to prefer explicit over implicit for such cases. Implicitly passing these may make it difficult to detect errors. Eg if in the parameter in the quarto document is called 'thread' instead of 'threads' this would lead to a silent ignoring of the parameter.
Also it is not possible to see at a glance from the rule if the document is respecting the resources.

What would you think as an alternative of a parameter: 'params: resource_parameters' that allow the user to specify a list of strings with resources to be passed as parameters:
Eg:

params: resource_parameters: ['threads', 'mem_mb']

What would you prefer?

I now implemented that if a quarto params named resources_<resource_name> is defined
the corresponding snakemake resource parameter is automatically passed.
eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be
passed.
threads will also be passed if a quarto parameter resrouces_threads is present.

Would this work for you?

But how would those quarto params be defined?

I am not very familiar with quarto but my concern is that, if the rule resources change then the same input won't work (since the resources are hard-coded inside the file).
As I see it, the only way to ensure it, is to always overwrite (or add) the resources in the .qmd file with those in snakemake.resources. If you think the current wrapper can do it, then I am happy. 😄

Co-authored-by: Filipe G. Vieira <[email protected]>

As suggested resources that match the name should be automatically passed. Tests added: - test if a custom resource added will be passed - test if the special resource 'threads' will be passed

If a quarto params named resources_<resource_name> is defined the corresponding snakemake resource parameter is automatically passed. eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be passed.

Accidentally the header of the notes section in meta.yaml was removed

fgvieira · 2024-08-08T13:04:45Z

The idea behind renv is to allow to install other R packages not included in the original wrapper environment, no?
Would it be possible to just use conda for that? I mean, use the wrapper but with a custom conda environment (conda:).

@johanneskoester what do you think of using renv inside the wrapper? Or should it be done at the snakemake level (like conda or singularity)?

johanneskoester · 2024-11-01T09:09:30Z

I think it is worth adding any further missing R packages to conda first. While it is fine to support renv as an alternative to conda at some point in general, I would say it is not really needed here, isn't it? We autobump and lock conda packages here, supporting a different kind of environments for wrappers would only further increase the complexity without improving the user experience (since conda is capable to handle R packages just fine).

johanneskoester · 2024-11-01T09:12:46Z

I just checked some of the packages in the renv, and all of them are available on conda-forge. I did not check all of them, but maybe they are even all already available for adding them to the environment.yaml?

Add a wrapper for quarto markdown rendering

50e5a6f

This wrapper uses quarto to render R quarto markdown .qmd notebooks. R dependency handling with conda or renv is supported.

votti changed the title ~~Feature: Add a wrapper for quarto markdown rendering~~ feat: Add a wrapper for quarto markdown rendering Jun 20, 2024

docs: small corrections

318edf0

votti force-pushed the feature/quarto branch from 51034c4 to 1a5f64a Compare June 20, 2024 14:33

doc: improve documentation

a95864f

votti force-pushed the feature/quarto branch from 1a5f64a to a95864f Compare June 20, 2024 14:35

Vito Zanotelli added 2 commits June 21, 2024 09:22

doc: improve documentation

26519a2

Remove renv dependencies from conda env

b17f428

This simplifies the dependencies in the main conda environment. This required then to rewrite the tests to use a separate conda environment with renv installed in order to build and restore the renv.

votti force-pushed the feature/quarto branch from 6ae3f03 to b17f428 Compare June 21, 2024 07:28

Fix meta.yml rendering errors

9423bd6

Vito Zanotelli added 2 commits June 21, 2024 09:45

Add tests for quarto wrapper

98019c4

Pin quarto version

6f5d86b

votti commented Jun 21, 2024

View reviewed changes

utils/quarto/test/Snakefile Outdated Show resolved Hide resolved

votti commented Jun 21, 2024

View reviewed changes

utils/quarto/test/.gitignore Outdated Show resolved Hide resolved

Pin the versions in the test environments

1ece336

Generally these should not need to be updated. If updating renv.yaml, the renv_dt.lock file needs to be adapted as well.

fgvieira reviewed Jun 21, 2024

View reviewed changes

test.py Outdated Show resolved Hide resolved

Vito Zanotelli added 3 commits June 21, 2024 11:47

Fix wrong folder for quarto tests

8a01c7e

Fix test refering to ../environment.yml

2207ff9

Remove the conda r-dependency example

97e1928

As discussed in the PR this does not need to be explicitly shown. Instead I have added this more expliiclty to the documentation.

votti force-pushed the feature/quarto branch 2 times, most recently from f337a87 to b17f428 Compare June 22, 2024 04:37

votti and others added 2 commits June 22, 2024 06:48

Merge branch 'master' into feature/quarto

3eec186

Handle case where not log file defined

1d1515a

Before this would have raised an error `log` not defined

fgvieira reviewed Jun 23, 2024

View reviewed changes

utils/quarto/meta.yaml Show resolved Hide resolved

fgvieira reviewed Jun 23, 2024

View reviewed changes

Add Quarto url

0985231

Co-authored-by: Filipe G. Vieira <[email protected]>

votti and others added 6 commits June 27, 2024 08:41

Merge branch 'master' into feature/quarto

64a7c88

Merge branch 'master' into feature/quarto

efe4687

Add tests for resource passing

a2b6704

As suggested resources that match the name should be automatically passed. Tests added: - test if a custom resource added will be passed - test if the special resource 'threads' will be passed

Automatically pass resources if named resources_

3c0a647

If a quarto params named resources_<resource_name> is defined the corresponding snakemake resource parameter is automatically passed. eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be passed.

Add missing notes section

7c76c19

Accidentally the header of the notes section in meta.yaml was removed

Fix error quarto has no params

aff004e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add a wrapper for quarto markdown rendering #3003

feat: Add a wrapper for quarto markdown rendering #3003

votti commented Jun 20, 2024

fgvieira commented Jun 21, 2024 •

edited

Loading

votti commented Jun 21, 2024

fgvieira commented Jun 21, 2024 •

edited

Loading

votti Jun 21, 2024

fgvieira Jun 23, 2024

votti Jun 24, 2024

fgvieira Jun 25, 2024

votti Jun 25, 2024

fgvieira Jun 25, 2024

votti Jun 25, 2024

fgvieira Jun 26, 2024

votti Jun 26, 2024

votti Jul 23, 2024

fgvieira Aug 8, 2024

fgvieira commented Aug 8, 2024

johanneskoester commented Nov 1, 2024

johanneskoester commented Nov 1, 2024

		# resources CLI flag: --resources renv_lock=1 --set-resource-scopes renv_lock=global


		rule renv_example_init:

		threads: lambda wc, threads: threads
		mem: lambda wc, resources: resources.mem

feat: Add a wrapper for quarto markdown rendering #3003

Are you sure you want to change the base?

feat: Add a wrapper for quarto markdown rendering #3003

Conversation

votti commented Jun 20, 2024

QC

fgvieira commented Jun 21, 2024 • edited Loading

votti commented Jun 21, 2024

fgvieira commented Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fgvieira commented Aug 8, 2024

johanneskoester commented Nov 1, 2024

johanneskoester commented Nov 1, 2024

fgvieira commented Jun 21, 2024 •

edited

Loading

fgvieira commented Jun 21, 2024 •

edited

Loading