-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add a wrapper for quarto markdown rendering #3003
base: master
Are you sure you want to change the base?
Conversation
This wrapper uses quarto to render R quarto markdown .qmd notebooks. R dependency handling with conda or renv is supported.
Do you need to specify all those dependencies? And why do you need a second environment? |
This simplifies the dependencies in the main conda environment. This required then to rewrite the tests to use a separate conda environment with renv installed in order to build and restore the renv.
Hi @fgvieira , @multiple environments:
|
@votti we need a version in the Don't think we need the If not needed, best to have a single env. You can always add |
# resources CLI flag: --resources renv_lock=1 --set-resource-scopes renv_lock=global | ||
|
||
|
||
rule renv_example_init: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar the renv
testcase requires renv
as an additional dependency which I would track in the separate renv.yaml
.
Generally these should not need to be updated. If updating renv.yaml, the renv_dt.lock file needs to be adapted as well.
As discussed in the PR this does not need to be explicitly shown. Instead I have added this more expliiclty to the documentation.
f337a87
to
b17f428
Compare
Before this would have raised an error `log` not defined
utils/quarto/meta.yaml
Outdated
threads: lambda wc, threads: threads | ||
mem: lambda wc, resources: resources.mem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense for the wrapper to use threads
and mem
specified under resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely think it would make sense and attempted to implement this, but technically I did not find a way to achieve it:
We currently pass parameters for the markdown rendering using the --P, --execute-param KEY:VALUE
syntax of the quarto render
command. This only accepts parameters that are actually defined in the markdown document to be rendered and fails otherwise. This behaviour is coded in the knit_params_get
used under the hood: https://github.com/rstudio/rmarkdown/blob/4b2f3426c1f46a8d207b0d661fc51c2508ecdad7/R/params.R#L28-L34
I dont see any possiblity to easily change this behaviour.
Thus if we would pass threads
and mem
as a parameter automatically, these would require the explicit declaration of these parameters in the markdown rendered.
If the markdown rendered does not explicitly supports these parameters, rendering would fail.
Thus to me requiring all markdowns to be rendered using this wrapper to support these two resource parameters would not be a good solution, as it would limit the applicability of the wrapper.
Also it would not be a complete solution as supporting any other resources would still require a workaround as the one documented here as well.
Automatically passing any resource parameter would be even more brittle as additional resources may be passed depending on the snakemake configuration used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, params
/threads
/mem
need to be specified inside the .qmd
file AND passed as command-line arguments (--P, --execute-param KEY:VALUE
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a bit unpractical since the params of the rule might not match the ones in the .qmd
file! On the other hand, it is also not possible to add more parameters.
How are parameters included in the .qmd
file? Is it an upstream rule?
Would it make sense for the wrapper to overwrite (or check) the params in the .qmd
file to ensure they match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think of the qmd
file like an extension to Rmarkdown or akin to a Jupyter notebook.
I imagine the typical use of the wrapper would be users to hand-write the .qmd
notebook, like they would do in a Rmarkdown or Jupyter notebook.
The parameter are included when authoring the file. The defined parameters are then used in the report to eg customize titles or as parameters in functions.
Would it make sense for the wrapper to overwrite (or check) the params in the .qmd file to ensure they match?
@overwriting: I am not sure if this would be sensible: adding additional parameters would not typically not really having an effect. I would say is is really rare for people to write code that handles arbitrary additional parameters and thus typically these would simply be ignored.
@cheking to ensure they match: this is what quarto render
does at runtime and the error message raised is quite understandable, so I would say this is already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about if the wrapper just read the parameters in the .qmd
file and add those with --execute-param KEY:VALUE
? Would that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could make sense and it should be 'easy' to implement using knitr:knit_params
: https://rdrr.io/cran/knitr/man/knit_params.html
or by writing a python function that parses the 'front matter' of such documents based some regexp.
This definitely would be quite convenient to implicitly pass these parameters if appropriately named parameters exist in the document.
Personally I am not so sure if I like such an implicit parameter passing as I tend to prefer explicit over implicit for such cases. Implicitly passing these may make it difficult to detect errors. Eg if in the parameter in the quarto document is called 'thread' instead of 'threads' this would lead to a silent ignoring of the parameter.
Also it is not possible to see at a glance from the rule if the document is respecting the resources.
What would you think as an alternative of a parameter: 'params: resource_parameters' that allow the user to specify a list of strings with resources to be passed as parameters:
Eg:
params:
resource_parameters: ['threads', 'mem_mb']
What would you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now implemented that if a quarto params named resources_<resource_name>
is defined
the corresponding snakemake resource parameter is automatically passed.
eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be
passed.
threads
will also be passed if a quarto parameter resrouces_threads
is present.
Would this work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how would those quarto params be defined?
I am not very familiar with quarto
but my concern is that, if the rule resources change then the same input won't work (since the resources are hard-coded inside the file).
As I see it, the only way to ensure it, is to always overwrite (or add) the resources in the .qmd
file with those in snakemake.resources
. If you think the current wrapper can do it, then I am happy. 😄
Co-authored-by: Filipe G. Vieira <[email protected]>
As suggested resources that match the name should be automatically passed. Tests added: - test if a custom resource added will be passed - test if the special resource 'threads' will be passed
If a quarto params named resources_<resource_name> is defined the corresponding snakemake resource parameter is automatically passed. eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be passed.
Accidentally the header of the notes section in meta.yaml was removed
The idea behind @johanneskoester what do you think of using |
I think it is worth adding any further missing R packages to conda first. While it is fine to support renv as an alternative to conda at some point in general, I would say it is not really needed here, isn't it? We autobump and lock conda packages here, supporting a different kind of environments for wrappers would only further increase the complexity without improving the user experience (since conda is capable to handle R packages just fine). |
I just checked some of the packages in the renv, and all of them are available on conda-forge. I did not check all of them, but maybe they are even all already available for adding them to the environment.yaml? |
Quarto has become a frequently used markdown notebook authoring system.
This wrapper uses quarto to render R Quarto
.qmd
notebooks using the the Quarto params mechanism to pass parameters from snakemake.R dependency handling with
renv
is supported.The
meta.yml
contains a more detailed description of the wrapper.Test cases for all the major functionality and demonstrating the
renv
integration are included.QC
For all wrappers added by this PR,
input:
andoutput:
file paths in the resulting rule can be changed arbitrarily,threads: x
statement withx
being a reasonable default,map_reads
for a step that maps reads),environment.yaml
specifications follow the respective best practices,environment.yaml
pinning has been updated by runningsnakedeploy pin-conda-envs environment.yaml
on a linux machine,input:
oroutput:
),Snakefile
s and their entries are explained via comments (input:
/output:
/params:
etc.),stderr
and/orstdout
are logged correctly (log:
), depending on the wrapped tool,tempfile.gettempdir()
points to (see here; this also means that using any Pythontempfile
default behavior works),meta.yaml
contains a link to the documentation of the respective tool or command,Snakefile
s pass the linting (snakemake --lint
),Snakefile
s are formatted with snakefmt,