-
-
Notifications
You must be signed in to change notification settings - Fork 1
/
flow.qmd
269 lines (211 loc) · 9.84 KB
/
flow.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
---
title: "Flow Diagrams"
filters:
- flow.lua
---
## Introduction
This section builds on [The broad workflow](index.html#workflows) and details
the internal process that are invoked with the
[`sandpaper::build_lesson()`]({sadoc}/build_lesson.html) function. If you look at [the source for this
function](https://github.com/carpentries/sandpaper/blob/HEAD/R/build_lesson.R),
it contains a total of sevens significant lines of code (many more due to
documentation and comments).
The pre-flight steps all happen before a single source file is built. These
check for pandoc, validate the lesson, and configure global elements. The last
two lines are responsible for building the site and combining them with the
global variables and templates.
Users will invoke this function in the following ways:
| venue | function | purpose |
| ----- | --------------------------- | -------------------------------------- |
| local | `sandpaper::build_lesson()` | render content for offline use |
| local | `sandpaper::serve()` | dynamically render and preview content |
| remote| `sandpaper:::ci_deploy()` | render content and deploy to branches |
All of these methods will call `sandpaper::validate_lesson()` (which also sets
up global metadata and menu variables) and the two-step internal functions
`sandpaper:::build_markdown()` and `sandpaper:::build_site()`. Below, I break
down and detail the process for each.
## Preflight Checks
Before a lesson can be built, we need to confirm the following:
1. We have access to the tools needed to build a lesson (e.g.
[pandoc](https://pandoc.org)). This is achieved via the
[`sandpaper::check_pandoc()`]({sadoc}/check_pandoc.html)
2. We are inside a lesson that can be built with The Carpentries Workbench
## `validate_lesson()` {#validate-lesson}
The [lesson validator]({sadoc}/validate_lesson.html) is a bit of a misnomer.
Yes, it does peform lesson validation, which it does so through the methods in
the [`pegboard::Lesson` R6 class]({padoc}/Lesson.html).
In order to use thse methods, it first loads the lesson, via the
[`sandpaper::this_lesson()`]({sadoc}/lesson_storage.html) function, which loads
_and_ caches the `pegboard::Lesson` object. It also caches elements that are
mostly duplicated across episodes with small tweaks for each episode:
- metadata in JSON-LD format
- sidebar
- extras menu for learner and instructor views
- translations of menu elements defined in {varnish}
## `build_markdown()` {#build-markdown}
### Generating Markdown {#flow-markdown}
Markdown generation for the lesson is controlled by the internal function
`sandpaper:::build_markdown()`.
When a lesson contains R Markdown files, these need to have content rendered to
markdownsot hat we can further process them. This content is processed with the
[{knitr}](https://yihui.org/knitr/) R package _in a separate R process_.
Markdown source content on the other hand is copied to the `site/built` folder.
Because R Markdown files can take some time to render, we use MD5 sums of the
episode contents (stored in the `site/built/md5sum.txt` file) to skip any files
that have not changed.
```{mermaid}
sequenceDiagram
autonumber
participant episodes/episode.Rmd
box rgb(255, 214, 216) The Workbench
participant {sandpaper}
end
box rgb(230, 234, 240) Document Engine
participant {renv}
participant {knitr}
end
box rgb(255, 231, 168) Generated Files
participant site/built/md5sum.txt
participant site/built/episode.md
end
site/built/md5sum.txt -->> {sandpaper}: READ file cache
{sandpaper} -->> {knitr}: RUN conversion
episodes/episode.Rmd -->> {knitr}: PROCESS changed file(s)
{knitr} -->> site/built/episode.md: WRITE Markdown
{sandpaper} -->> site/built/md5sum.txt: WRITE file cache
```
::: {.callout-note}
#### Package Cache and Reproducibility
One package that is missing from the above diagram is {renv} and that's
partially because it has an indirect effect on the lesson: it provisions the
packages needed to build the lesson.
When episodes are rendered from R Markdown to Markdown, we attempt to reproduce
the build environment as closely as possible by using the {renv} package. If the
global package cache from {renv} is available, then the lesson profile is
activated before the episode is sent to {knitr} and R will use the packages
provided in that profile. This has two distinct advantages:
1. The user does not have to worry about overwriting packages in their own
library (i.e. a graduate researcher working on their dissertation does not
want to have to rewrite their analyses because of a new version of {sf})
2. The package versions will be the same as the versions on the GitHub version
of the site, which means that there will be no false positives of new errors
popping up
For details on the package cache, see the [Building Lessons With A Package
Cache](https://carpentries.github.io/sandpaper/articles/building-with-renv.html)
article.
:::
At this step, the markdown has been written and the state of the cache is
updated so if we re-run this function, then it will show that no changes have
occured. After this step, the internal function `sandpaper:::build_site()` is
run where the markdown file that we just created is converted to HTML with
pandoc and stored in an R object. This R object is then manipulated and then
written to an HTML file with the {varnish} website templates applied.
We use this function in the pull request workflows to demonstrate the changes in
markdown source files, which is useful when package versions change, causing the
output to potentially change.
## `build_site()` {#build-site}
The following sections will discuss the HTML generation ([the following
section](#flow-html)), manipulation ([the section after that](#flow-tweaks)),
and applying the template ([the final section](#flow-website))
separately because, while these processes are each run via the internal
`sandpaper:::build_site()` function, they are functionally separate.
### Generating HTML {#flow-html}
Each markdown file is processed into HTML via [pandoc](https://pandoc.org) and
returned to R as text. This is done via the internal function
`sandpaper:::render_html()`.
```{mermaid}
sequenceDiagram
autonumber
box rgb(255, 214, 216) The Workbench
participant {sandpaper}
end
box rgb(230, 234, 240) Document Engine
participant pandoc
end
box rgb(255, 231, 168) Generated Files
participant site/built/episode.md
end
{sandpaper} -->> pandoc: LOAD pandoc with lua filters
site/built/episode.md -->> pandoc: READ markdown
pandoc -->> {sandpaper}: RENDER HTML as text
```
From here, the HTML exists as the internal body content of a website without a
header, footer, or any styling. It is nearly ready for insertion into a website
template. The next section details the flow we use to tweak the HTML content.
### Processing HTML {#flow-tweaks}
The HTML needs to be tweaked because the output from pandoc, even with our
lua filters, still needs some modification. We tweak the content by first
converting the HTML into an Abstract Syntax Tree (AST). This allows us to
programmatically manipulate tags in the HTML without resorting to using regular
expressions.
In this part, we update links, images, headings, structure that we could not
fix using lua filters. We also apply translations to some of the menu elements
that are not templated in {varnish}. We then use the information from the
episode to complete the global menu variable with links to the second level
headings in the episode.
```{mermaid}
sequenceDiagram
autonumber
box rgb(255, 214, 216) The Workbench
participant {sandpaper}
end
box rgb(230, 241, 255) R Object
participant HTML(AST)
end
box rgb(230, 234, 240) Helper Package
participant {xml2}
end
{sandpaper} -->> {xml2}: READ HTML
{xml2} -->> HTML(AST): PARSE HTML
activate {sandpaper}
note right of HTML(AST): sandpaper:::fix_nodes()
{xml2} -->> HTML(AST): update structure
HTML(AST) -->> {sandpaper}: extract menu items
note right of {sandpaper}: generate learner and instructor versions
deactivate {sandpaper}
```
::: {.callout-note}
#### Working With XML
Working with XML data is perhaps one of the strangest experiences for an R user
because in R, functions will normally return a copy of the data, but when
working with an XML document parsed by {xml2}, the data is modified _in place_.
It allows us to do neat things, but there is a learning curve associated.
I have written hopefully helpful handbooks (guides) on
- [working with XML
data](https://carpentries.github.io/pegboard/articles/intro-xml.html),
- [working with the Lesson
object](https://carpentries.github.io/pegboard/articles/intro-lesson.html)
and
- [working with the Episode
object](https://carpentries.github.io/pegboard/articles/intro-episode.html)
:::
### Applying Website Template {#flow-website}
Now that we have an HTML AST that has been corrected and associated metadata,
we are ready to write this to HTML. This process is achieved by passing the
AST and metadata to {pkgdown} where it performs a little more manipulation,
applies the {varnish} template, and writes it to disk.
```{mermaid}
sequenceDiagram
autonumber
box rgb(255, 214, 216) The Workbench
participant {sandpaper}
participant {varnish}
end
box rgb(230, 241, 255) R Object
participant HTML(AST)
end
box rgb(230, 234, 240) Helper Package
participant {pkgdown}
end
box rgb(255, 231, 168) Generated Files
participant site/docs/episode.html
end
activate {sandpaper}
{sandpaper} -->> {pkgdown}: Set global menu variables
HTML(AST) -->> {pkgdown}: Hand off HTML to pkgdown
deactivate {sandpaper}
activate {pkgdown}
{varnish} -->> {pkgdown}: Load template
{pkgdown} -->> site/docs/episode.html: WRITE website
deactivate {pkgdown}
```