-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathautomata.qmd
374 lines (259 loc) · 13.8 KB
/
automata.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
---
title: "Workflow Automation"
---
## Loops
Often we want to perform some set of operations repeatedly across a known number of iterations. For example, maybe we want to subset a given data file into a separate [variable]{.py}/[object]{.r} by month of data collection and export the resulting file as a CSV. We _could_ simply copy/paste our 'subset and export' code as many times as needed but this can be error-prone. Also, it is cumbersome to manually update all copies of the relevant code when you identify a possible improvement.
One code solution to this is to **automate** the workflow using `for` loops (casually referred to more simply as just "loops"). The syntax of [{{< fa brands python >}}]{.py} [Python]{.py} and [{{< fa brands r-project >}}]{.r} [R]{.r} is very similar for loops--likely because this is such a fundamental operation to any coding language!
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Make a simple object to demonstrate loops.
```{r r-loop-prep}
# Make a vector of animal types
zoo_r <- c("lion", "tiger", "crocodile", "vulture", "hippo")
# Check that out
zoo_r
```
## [{{< fa brands python >}} Python]{.py}
Make a simple variable to demonstrate loops.
```{python py-loop-prep}
# Make a list of animal types
zoo_py = ["lion", "tiger", "crocodile", "vulture", "hippo"]
# Check that out
zoo_py
```
:::
With this simple [variable]{.py}/[object]{.r} in-hand we can now demonstrate the core facets of loops.
### Fundamental Components
Loops (in either language) require a few core components in order to work properly:
1. `for` statement -- defines the start of the loop-definition component
2. "Loop [variable]{.py}/[object]{.r}" -- essentially a placeholder [variable]{.py}/[object]{.r} whose value will change with each iteration of the loop
3. `in` statement -- relates loop [variable]{.py}/[object]{.r} to set of [list]{.py}/[vector]{.r} to iterate across
4. [list]{.py}/[vector]{.r} to iterate across -- set of values to iterate across
5. Actual workflow! -- operations to perform on each iteration of the loop
To see in this syntax in action we'll use a simple loop that prints each animal type in the [list]{.py}/[vector]{.r} we created above.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
In [{{< fa brands r-project >}}]{.r} [R]{.r}, the `for` statement requires parentheses around the loop object, the `in` statement, and the vector to iterate across. The operation(s) performed in each iteration _must_ be wrapped in curly braces (`{...}`).
When the code reaches the closing curly brace it returns to the top of the workflow and begins again with the next element of the provided vector.
```{r r-loop1}
# For each animal in the zoo
for(animal in zoo_r){
# Print its name
print(animal)
}
```
Note that when we are done the loop object still exists and is set to the last element of the vector we iterated across.
```{r r-loop2}
# Check current value of `animal` object
animal
```
## [{{< fa brands python >}} Python]{.py}
In [{{< fa brands python >}}]{.py} [Python]{.py}, the `for` statement, loop variable, `in` statement, and list to iterate across do not use parentheses but the end of the line requires a colon `:`. The operation(s) performed in each iteration _must_ be indentened one level (i.e., press "tab" once or "space" four times).
When the code reaches the end of the indented lines it returns to the top of the workflow and begins again with the next item of the provided list.
```{python py-loop1}
# For each animal in the zoo
for animal in zoo_py:
# Print its name
print(animal)
```
Note that when we are done the loop variable still exists and is set to the last item of the list we iterated across.
```{python py-loop2}
# Check current value of `animal` variable
animal
```
:::
### Loops & Conditionals
We can also build conditional statements into a loop to create a loop that can flexibly handle different outcomes. We have discussed conditional operators elsewhere so we'll only explain the parts of loop conditionals that we haven't already discussed. To demonstrate, we can loop across a set of numbers and use conditionals to print whether the values are greater/less than or equal to zero.
In the example below we'll use three new statements `if`, `else if` and `else`. Each condition only performs its operation when its condition is met (i.e., returns [True]{.py}/[TRUE]{.r}).
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
These three statements all have similar syntax to the `for` statement in that they evaluate something in parentheses and then perform some operation(s) in curly braces. They do differ slightly in context however:
- `if` can only be used first (or in cases where there is only `if` and `else`)
- `else if` can only be used after `if` (or after another `else if`) and _allows for specifying another condition._
- `else` can only be used at the end; catches only cases that _don't_ meet one of the prior conditions
```{r r-condloop}
# Loop across numbers
for(j in c(-2, -1, 0, 1, 2)){
# If less than 0
if(j < 0){
print(paste(j, "is negative"))
}
# If greater than 0
else if(j > 0){
print(paste(j, "is positive"))
}
# If neither of those, then it must be 0!
else {
print(paste(j, "is zero!"))
}
}
```
Note that to get the message to print correctly we needed to wrap a `paste` function in `print` to assemble multiple things into a single object.
## [{{< fa brands python >}} Python]{.py}
These three statements all have similar syntax to the `for` statement in that they evaluate something before a colon and then perform some operation(s) after that colon. They do differ slightly in context however:
- `if` can only be used first (or in cases where there is only `if` and `else`)
- `elif` can only be used after `if` (or after another `elif`) and _allows for specifying another condition._
- `else` can only be used at the end; catches only cases that _don't_ meet one of the prior conditions
```{python py-condloop}
# Loop across numbers
for k in [-2, -1, 0, 1, 2]:
# If less than 0
if k < 0:
print(str(k) + " is negative")
# If greater than 0
elif k > 0:
print(str(k) + " is positive")
# If neither of those, then it must be 0!
else:
print(str(k) + " is zero!")
```
Note that to get the message to print correctly we needed to coerce the loop variable into type string (using the `str` function).
:::
## "Custom" Functions
Loops are a really powerful tool but they are limited in some ways. Sometimes we want to do a task once per project but only use it once in each instance. Such an operation is certainly "repeated" but not really the same context in which a loop makes sense. We can create reusable modular code to fit these circumstances by writing our own custom functions--"custom" in the sense that we write them ourselves rather than load them from a particular library.
Let's write a simple function in both languages that simply multiplies two arguments by one another and returns the result.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Generating a function in [{{< fa brands r-project >}}]{.r} [R]{.r} shares some syntax elements with loops and conditional statements! In this case we use the `function` function to preserve our work as a function, then provide any needed arguments in parentheses, and end with curly braces with the operation(s) performed by the function inside. If the function produces something that we want to give back to the user, we need to specify that with the `return` function.
```{r r-fxn}
# Multiplication function
mult_r <- function(p, q){
# Multiply the two values
result_r <- p * q
# Return that
return(result_r)
}
# Once defined, we can invoke the function like we would any other
mult_r(p = 2, q = 5)
```
## [{{< fa brands python >}} Python]{.py}
Generating a function in [{{< fa brands python >}}]{.py} [Python]{.py} shares some syntax elements with loops and conditional statements! In this case we use the `def` statement then provide the name and--parenthetically--any needed arguments for our new function. If the function produces something that we want to give back to the user, we need to specify that by using the `return` statement.
```{python py-fxn}
# Multiplication function
def mult_py(n, i):
# Add docstrings for later use (see below)
"""
Multiply two values by one another.
n -- First value to multiply
i -- Second value to multiply
"""
# Multiply the two values
result_py = n * i
# Return them
return result_py
# Once defined, we can invoke the function like we would any other
mult_py(n = 2, i = 5)
```
:::
### Function Documentation
One component of custom functions to be aware of is their somewhat variable documentation. "Official" functions tend to be really well documented but custom functions have no required documentation. However, there are some best practices that we can try to follow ourselves to make life as easy as possible for people trying to intuit our functions' purposes (including ourselves in the future!).
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
R contains no native mode of specifying function documentation! While there are tools to formalize this when functions are part of a formal package (see [roxygen2 formatting](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html)) our custom functions _cannot_ include documentation. That said, <u>it is still good practice to include plain-language comment lines that describe the function's operations</u> even when they will only be visible where the function is defined.
Note that the [`docstring` package](https://cran.r-project.org/web/packages/docstring/vignettes/docstring_intro.html) for [{{< fa brands r-project >}}]{.r} [R]{.r} simulates [{{< fa brands python >}}]{.py} [Python]{.py}-style docstrings for [{{< fa brands r-project >}}]{.r} [R]{.r} functions but is not part of "base" R.
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} custom functions allow us to specify triple quoted (`"""..."""`) documentation of function purpose/arguments known as "docstrings". When this is supplied, we can use the `help` function (or append a `?` _after_ the function name) to print whatever documentation was included in the function when it was defined.
```{python py-fxn-help}
# Check custom function documentation
help(mult_py)
```
:::
### Function Defaults
Sometimes a given argument will often be set to the same value. In cases like this, we can define that as the default of the argument which allows users to not specify that argument at all. When users do specify something for that argument, it overrides the default behavior. All functions (and [{{< fa brands python >}}]{.py} [Python methods]{.py}) with "optional" arguments are using defaults behind the scenes to make those arguments optional.
We can define these defaults when we first create a function! Let's make a simple division function that divides the first argument by the second and sets the default of the second argument to 2.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Write and demonstrate the simple division function.
```{r r-fxn-default}
# Define function
div_r <- function(p, q = 2){
# Do division
result_r <- p / q
# Return that
return(result_r)
}
# Test this function
div_r(p = 10)
```
Use the function again but set the second argument ourselves.
```{r r-fxn-nondefault}
# Specify the second argument
div_r(p = 10, q = 10)
```
## [{{< fa brands python >}} Python]{.py}
Write and demonstrate the simple division function.
```{python py-fxn-default}
# Define function
def div_py(n, i = 2):
# Write function documentation
"""
Divide the first value by the second
n -- Numerator
i -- Denominator
"""
# Do division
result_py = n / i
# Return that
return result_py
# Use the function with the default
div_py(n = 10)
```
Use the function again but set the second argument ourselves.
```{python py-fxn-nondefault}
# Specify the second argument
div_py(n = 10, i = 10)
```
:::
### Functions & Conditionals
Just like loops, we can build conditional statements into our functions to make them more flexible and broadly useful. Let's combine this with setting default values to demonstrate this effectively.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Let's make a simple addition function and set both arguments to default to `NULL`. `NULL` is an [{{< fa brands r-project >}}]{.r} [R]{.r} constant that allows us to create an object without assigning any value to it.
Note that we're also using the `is.null` function in our conditional in order to easily assess whether the argument has been left to its default (i.e., set to `NULL`) or defined.
```{r r-condfxn1}
# Define addition function
add_r <- function(p = NULL, q = NULL){
# If first argument is missing, set it to 2
if(is.null(p) == TRUE){
p <- 2
}
# Do the same for the second argument
if(is.null(q) == TRUE){
q <- 2
}
# Sum the two arguments
result_r <- p + q
# Return that
return(result_r)
}
```
Now let's use the function without specifying either argument.
```{r r-condfxn2}
# Use the function
add_r()
```
## [{{< fa brands python >}} Python]{.py}
Let's make a simple addition function and set both arguments to default to `None`. `None` is a [{{< fa brands python >}}]{.py} [Python]{.py} constant that allows us to create a variable without assigning any value to it.
Note that we're also using the `is` statement in our conditional (in this case it is equivalent to `==`).
```{python py-condfxn}
# Define addition function
def add_py(n = None, i = None):
# Add documentation
"""Add two values (`n` and `i`)"""
# If first argument is missing, set it to 2
if n is None:
n = 2
# Do the same for the second argument
if i is None:
i = 2
# Sum the two arguments
result_py = n + i
# Return that
return result_py
```
Now let's use the function without specifying either argument.
```{python py-condfxn2}
# Use the function
add_py()
```
:::