Merge branch 'master' of github.com:SurgicalInformatics/healthyr_book
# Conflicts:
#	06_working_continuous.Rmd
#	docs/healthyr-book.pdf
ewenharrison committed Jul 22, 2020
2 parents f4a1310 + 1bec03e commit 5d639b4
Showing 12 changed files with 303 additions and 155 deletions.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ docs/*
!docs/healthyr-book.pdf
rsconnect
healthyr-book.rds
my_saved_plot.png
my_saved_plot*
render*.rds
tmp-pdfcrop*
*.log
2 changes: 1 addition & 1 deletion 01_introduction.Rmd
@@ -132,7 +132,7 @@ You can do this in the **Packages** tab (next to the Plots tab in the bottom-rig

A Package is just a collection of functions (commands) that are not included in the standard R installation, called base-R.

A lot of the functionality introduced in this book comes from the __tidyverse__ family of R packages (http://tidyverse.org).
A lot of the functionality introduced in this book comes from the __tidyverse__ family of R packages (http://tidyverse.org @tidyverse2019).
So when you go to Packages, click **Install**, type in __tidyverse__, and a whole collection of useful and modern packages will be installed.

Even though you've installed the __tidyverse__ packages, you'll still need to tell R when you're about to use them.
45 changes: 28 additions & 17 deletions 02_basics.Rmd
@@ -9,9 +9,14 @@ editor_options:
knitr::opts_chunk$set(fig.align = 'center')
library(tidyverse)
library(kableExtra)
mykable = function(x, caption = "CAPTION", ...){
kable(x, row.names = FALSE, align = c("l", "l", "r", "r", "r", "r", "r", "r", "r"),
booktabs = TRUE, caption = caption,
linesep = "", ...)
}
```


Throughout this book, we are conscious of the balance between theory and practice.
Some learners may prefer to see all definitions laid out before being shown an example of a new concept.
Others would rather see practical examples and explanations build up to a full understanding over time.
Expand Down Expand Up @@ -437,22 +442,27 @@ We need to convert `my_datesdiff` (which is a difftime value) into a numeric val
560/as.numeric(my_datesdiff)
```

The lubridate package comes with several convenient functions for parsing dates, e.g., `ymd()`, `mdy()`, `ymd_hm()`, etc. - for a full list see [lubridate.tidyverse.org](lubridate.tidyverse.org).
The __lubridate__ package comes with several convenient functions for parsing dates, e.g., `ymd()`, `mdy()`, `ymd_hm()`, etc. - for a full list see [lubridate.tidyverse.org](https://lubridate.tidyverse.org).

However, if your date/time variable comes in an extra special format, then use the `parse_date_time()` function where the second argument specifies the format using these helpers:
However, if your date/time variable comes in an extra special format, then use the `parse_date_time()` function where the second argument specifies the format using the specifiers given in Table \@ref(tab:chap2-tab-timehelpers).

| Notation | Meaning | Example |
|----------|--------- |---------|
|`%d` |day as number |01-31|
|`%m` |month as number |01-12|
|`%B` |month name |January-December|
|`%b` |abbreviated month |Jan-Dec|
|`%Y` |4-digit year |2019|
|`%y` |2-digit year |19|
|`%H` |hours |12|
|`%M` |minutes |01|
|`%A` |weekday |Monday-Sunday|
|`%a` |abbreviated weekday |Mon-Sun|
```{r chap2-tab-timehelpers, echo = FALSE}
tribble(
~Notation, ~Meaning, ~Example,
"%d", "day as number" ,"01-31",
"%m", "month as number" ,"01-12",
"%B", "month name" ,"January-December",
"%b", "abbreviated month" ,"Jan-Dec",
"%Y", "4-digit year" ,"2019",
"%y", "2-digit year" ,"19",
"%H", "hours" ,"12",
"%M", "minutes" ,"01",
"%S", "seconds" ,"59",
"%A", "weekday" ,"Monday-Sunday",
"%a", "abbreviated weekday" ,"Mon-Sun") %>%
mykable(caption = "Date/time format specifiers.") %>%
kableExtra::kable_styling(font_size=9, latex_options = "hold_position")
```


For example:
@@ -461,7 +471,7 @@ For example:
parse_date_time("12:34 07/Jan'20", "%H:%M %d/%b'%y")
```
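
As a rough cross-check of the specifiers, the same string can also be parsed with base R's `strptime()`, which reads the same `%` notation. This is only a sketch, not the book's method: the variable name `parsed` and the `tz = "UTC"` argument are our own choices, and `%b` assumes an English locale.

```r
# Parse the same string with base R (assumes an English locale for "%b").
parsed <- as.POSIXct(strptime("12:34 07/Jan'20", "%H:%M %d/%b'%y", tz = "UTC"))

# Print in an unambiguous layout to confirm each specifier picked up the right field.
format(parsed, "%Y-%m-%d %H:%M")
```

If the format string does not match the input, `strptime()` returns `NA` rather than raising an error, so it is worth checking the result.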

Furthermore, the same date/time helpers can be used to rearrange your date and time for printing:
Furthermore, the same date/time specifiers can be used to rearrange your date and time for printing:

```{r}
Sys.time()
@@ -471,6 +481,7 @@ You can even add plain text into the `format()` function, R will know to put the

```{r}
Sys.time() %>% format("Happy days, the current time is %H:%M %B-%d (%Y)!")
Sys.time() %>% format("Happy days, the current time is %H:%M %B-%d (%Y)!")
```
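
Because `Sys.time()` changes on every run, a fixed timestamp makes the behaviour easier to verify. The sketch below uses a made-up time (`fixed_time` is a hypothetical name) and only numeric specifiers, so the result should not depend on locale.

```r
# A fixed, made-up timestamp so the result does not depend on when the code runs.
fixed_time <- as.POSIXct("2020-07-22 14:05:00", tz = "UTC")

# Specifiers are replaced by date/time parts; all other characters are kept as typed.
format(fixed_time, "Time %H:%M on %d/%m/%Y")
# "Time 14:05 on 22/07/2020"
```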

## Objects and functions {#chap02-objects-functions}
@@ -494,7 +505,7 @@ mydata <- tibble(
mydata %>%
knitr::kable(booktabs = TRUE, caption = "Example of data in columns and rows, including missing values denoted `NA` (Not applicable/Not available). Once this dataset has been read into R it gets called dataframe/tibble.") %>%
kableExtra::kable_styling(font_size=8)
kableExtra::kable_styling(font_size=9)
```

\FloatBarrier
47 changes: 26 additions & 21 deletions 04_plotting.Rmd
@@ -378,9 +378,9 @@ Alpha is an aesthetic to make geoms transparent, its values can range from 0 (in
\index{plots@\textbf{plots}!path}
\index{plots@\textbf{plots}!time-series}
Let's plot the life expectancy in the United Kingdom over time:
Let's plot the life expectancy in the United Kingdom over time (Figure \@ref(fig:chap04-fig-lineplot)):
```{r, fig.width = 4, fig.height = 1.5}
```{r chap04-fig-lineplot, fig.width = 4, fig.height = 1.5, fig.cap="`geom_line()` - Life expectancy in the United Kingdom over time."}
gapdata %>%
filter(country == "United Kingdom") %>%
ggplot(aes(x = year, y = lifeExp)) +
@@ -438,9 +438,9 @@ This code works as expected (Figure \@ref(fig:chap04-fig-zigzag) (2)) - yes ther
### Exercise {#chap04-ex-lineplot}
Follow the step-by-step instructions to transform (Figure \@ref(fig:chap04-fig-zigzag):2) into this:
Follow the step-by-step instructions to transform Figure \@ref(fig:chap04-fig-zigzag)(2) into Figure \@ref(fig:chap04-fig-lineplot2).
```{r, fig.width=0.8*10, echo = FALSE, fig.height=0.8*4}
```{r chap04-fig-lineplot2, fig.width=0.8*10, echo = FALSE, fig.height=0.8*4, fig.cap = "Lineplot exercise."}
gapdata %>%
ggplot(aes(x = year, y = lifeExp, group = country, colour=continent)) +
geom_line() +
@@ -585,9 +585,10 @@ Therefore, you can find Hex colour codes from a lot of places on the internet, o

Whether using `geom_bar()` or `geom_col()`, we can use fill to display proportions within bars.
Furthermore, sometimes it's useful to set the x value to a constant - to get everything plotted together rather than separated by a variable.
So we are using `aes(x = "Global", fill = continent)`, note that "Global" could be any word - since it's quoted `ggplot()` won't go looking for it in the dataset:
So we are using `aes(x = "Global", fill = continent)`.
Note that "Global" could be any word - since it's quoted `ggplot()` won't go looking for it in the dataset (Figure \@ref(fig:chap04-fig-proportions)):

```{r}
```{r chap04-fig-proportions, fig.cap = "Number of countries in the gapminder dataset with proportions using the `fill = continent` aesthetic."}
gapdata2007 %>%
ggplot(aes(x = "Global", fill = continent)) +
geom_bar()
@@ -623,29 +624,29 @@ Hints:
\index{plots@\textbf{plots}!histogram}

A histogram displays the distribution of values within a continuous variable.
In the example below, we are taking the life expectancy (`aes(x = lifeExp)`) and telling the histogram to count the observations up in "bins" of 10 years (`geom_histogram(binwidth = 10)`):
In the example below, we are taking the life expectancy (`aes(x = lifeExp)`) and telling the histogram to count the observations up in "bins" of 10 years (`geom_histogram(binwidth = 10)`, Figure \@ref(fig:chap04-fig-hist)):

```{r include=FALSE}
# don't understand what keeps resetting it! patchwork?
theme_set(theme_bw())
```


```{r fig.width=4}
```{r chap04-fig-hist, fig.width=4, fig.cap = "`geom_histogram()` - The distribution of life expectancies in different countries around the world in year 2007."}
gapdata2007 %>%
ggplot(aes(x = lifeExp)) +
geom_histogram(binwidth = 10)
```
We can see that most countries in the world have a life expectancy of ~70-80 years (in 2007), and that the distribution of life expectancies globally is not normally distributed.
Setting the binwidth is optional, using just `geom_histogram()` works well too -by default, it will divide your data into 30 bins.
Setting the binwidth is optional, using just `geom_histogram()` works well too - by default, it will divide your data into 30 bins.
There are more examples of histograms in Chapter \@ref(chap06-h1). There are two other geoms that are useful for plotting distributions: `geom_density()` and `geom_freqpoly()`.
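
The options mentioned above can be compared side by side. The sketch below uses a small made-up data frame (`toy`, not the gapminder data) so that it runs without any extra datasets. The object names are our own.

```r
library(ggplot2)

# Made-up values standing in for life expectancy (not the gapminder data).
toy <- data.frame(lifeExp = c(42, 48, 53, 55, 61, 64, 68, 70, 71, 72,
                              73, 74, 75, 76, 77, 78, 79, 80, 81, 82))

# Explicit bin width of 10 years, as in the example above.
p_width <- ggplot(toy, aes(x = lifeExp)) + geom_histogram(binwidth = 10)

# Setting the number of bins instead; 30 matches the default.
p_bins <- ggplot(toy, aes(x = lifeExp)) + geom_histogram(bins = 30)

# The two other distribution geoms mentioned in the text.
p_freq <- ggplot(toy, aes(x = lifeExp)) + geom_freqpoly(binwidth = 10)
p_dens <- ggplot(toy, aes(x = lifeExp)) + geom_density()
```

Printing any of these objects (for example `p_width`) draws the plot. With only 20 observations, 30 bins will look sparse, which is a reminder that the default is a starting point rather than a recommendation.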
## Box plots
\index{plots@\textbf{plots}!boxplot}
Box plots are our go-to method for quickly visualising summary statistics of a continuous outcome variable (such as life expectancy in the gapminder dataset).
Box plots are our go-to method for quickly visualising summary statistics of a continuous outcome variable (such as life expectancy in the gapminder dataset, Figure \@ref(fig:chap04-fig-boxplot)).
Box plots include:
@@ -654,7 +655,7 @@ Box plots include:
* whiskers (the black lines extending to the lowest and highest values that are still within 1.5*IQR)
* outliers (any observations outside the whiskers)
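
The whisker and outlier rules in the list above can be worked through by hand in base R. This is an illustrative sketch with made-up numbers (not gapminder values). It uses the default `quantile()` method, and all variable names are our own.

```r
# Made-up values (not from gapminder) with one low and one high extreme.
x <- c(40, 52, 55, 58, 60, 62, 65, 90)

q <- quantile(x, c(0.25, 0.5, 0.75))   # box edges and the median line
iqr <- q[[3]] - q[[1]]                 # interquartile range

# Whiskers reach the most extreme observations within 1.5 * IQR of the box.
lower_limit <- q[[1]] - 1.5 * iqr
upper_limit <- q[[3]] + 1.5 * iqr
whiskers <- range(x[x >= lower_limit & x <= upper_limit])

# Anything beyond the whiskers is drawn as an outlier point.
outliers <- x[x < lower_limit | x > upper_limit]
outliers
# 40 90
```

Here the box runs from 54.25 to 62.75, so the limits are 41.5 and 75.5, the whiskers end at 52 and 65, and the values 40 and 90 are flagged as outliers.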
```{r, fig.width=3, fig.height=2.75}
```{r chap04-fig-boxplot, fig.width=3, fig.height=2.75, fig.cap = "`geom_boxplot()` - Boxplots of life expectancies within each continent in year 2007."}
gapdata2007 %>%
ggplot(aes(x = continent, y = lifeExp)) +
geom_boxplot()
@@ -664,7 +665,7 @@ gapdata2007 %>%

One of the coolest things about `ggplot()` is that we can plot multiple geoms on top of each other!

Let' add individual data points on top of the box plots:
Let's add individual data points on top of the box plots:

```{r include=FALSE}
# don't understand what keeps resetting it! patchwork?
@@ -678,7 +679,7 @@ gapdata2007 %>%
geom_point()
```

This makes Figure \@ref(fig:chap04-fig-multigeoms):1.
This makes Figure \@ref(fig:chap04-fig-multigeoms)(1).

```{r chap04-fig-multigeoms, echo=FALSE, fig.width=0.8*10, fig.height=0.8*8, fig.cap = "Multiple geoms together. (1) `geom_boxplot() + geom_point()`, (2) `geom_boxplot() + geom_jitter()`, (3) colour aesthetic inside `ggplot(aes())`, (4) colour aesthetic inside `geom_jitter(aes())`."}

@@ -733,8 +734,9 @@ This is new: `aes()` inside a geom, not just at the top!
In the code for (4) you can see `aes()` in two places - at the top and inside the `geom_jitter()`.
And `colour = continent` was only included in the second `aes()`.
This means that the jittered points get a colour, but the box plots will be drawn without (so just black).
This is exactly* what we see on \@ref(fig:chap04-fig-multigeoms).

This is exactly what we see in Figure \@ref(fig:chap04-fig-multigeoms)(4)^[Nerd alert: the variation added by `geom_jitter()` is random, which means that when you recreate the same plots the points will appear in slightly different locations to ours. To make identical ones, add `position = position_jitter(seed = 1)` inside `geom_jitter()`.].
*Nerd alert: the variation added by `geom_jitter()` is random, which means that when you recreate the same plots the points will appear in slightly different locations to ours. To make identical ones, add `position = position_jitter(seed = 1)` inside `geom_jitter()`.
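
A minimal sketch of both ideas together: the colour aesthetic set only inside `geom_jitter()`, and a fixed seed so the jitter is reproducible. The data frame `toy` is made up for illustration and is not the gapminder data.

```r
library(ggplot2)

# Made-up data (hypothetical values, not gapminder).
toy <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 5),
  lifeExp   = c(50, 52, 55, 58, 60, 72, 75, 78, 80, 81)
)

# colour is mapped only inside geom_jitter(), so the boxes stay black;
# the fixed seed makes the random jitter repeat exactly on every run.
p <- ggplot(toy, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  geom_jitter(aes(colour = continent),
              position = position_jitter(seed = 1))

# Compare the computed point positions from two separate builds of the plot.
same_positions <- identical(layer_data(p, 2)$x, layer_data(p, 2)$x)
```

Without the `seed` argument, the two builds would generally place the points at slightly different x positions.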

### Worked example - three geoms together

@@ -753,16 +755,19 @@ label_data <- gapdata2007 %>%
label_data
```




The first two geoms are from the previous example (`geom_boxplot()` and `geom_jitter()`).
Note that `ggplot()` plots them in the order they are in the code - so box plots at the bottom, jittered points on the top.
We are then adding `geom_label()` with its own data option (`data = label_data`) as well as a new aesthetic (`aes(label = country)`):
We are then adding `geom_label()` with its own data option (`data = label_data`) as well as a new aesthetic (`aes(label = country)`, Figure \@ref(fig:chap04-fig-labels)):

```{r include=FALSE}
```{r include=FALSE}
# don't understand what keeps resetting it! patchwork?
theme_set(theme_bw())
```

```{r, fig.width=5, fig.height=4}
```{r chap04-fig-labels, fig.width=5, fig.height=4, fig.cap = "Three geoms together on a single plot: `geom_boxplot()`, `geom_jitter()`, and `geom_label()`."}
gapdata2007 %>%
ggplot(aes(x = continent, y = lifeExp)) +
# First geom - boxplot
@@ -834,11 +839,11 @@ gapminder %>%
There are two examples of how just a few lines of `ggplot()` code and the basic geoms introduced in this chapter can be used to make very different things.
Let your imagination fly free when using `ggplot()`!

The first one shows how the life expectancies in European countries have increased by plotting a square (`geom_point(shape = 15)`) for each observation (year) in the dataset.
Figure \@ref(fig:chap04-fig-adv1) shows how the life expectancies in European countries have increased by plotting a square (`geom_point(shape = 15)`) for each observation (year) in the dataset.

<!-- Explain `fun=max` below? -->

```{r, fig.width=7, fig.height=5}
```{r chap04-fig-adv1, fig.width=7, fig.height=5, fig.cap = "Increase in European life expectancies over time. Using `fct_reorder()` to order the countries on the y-axis by life expectancy (rather than alphabetically which is the default)."}
gapdata %>%
filter(continent == "Europe") %>%
ggplot(aes(y = fct_reorder(country, lifeExp, .fun=max),
@@ -849,10 +854,10 @@ gapdata %>%
theme_bw()
```
In the second example, we're using `group_by(continent)` followed by `mutate(country_number = seq_along(country))` to create a new column with numbers 1, 2, 3, etc for countries within continents.
In Figure \@ref(fig:chap04-fig-adv2), we're using `group_by(continent)` followed by `mutate(country_number = seq_along(country))` to create a new column with numbers 1, 2, 3, etc for countries within continents.
We are then using these as `y` coordinates for the text labels (`geom_text(aes(y = country_number...`).
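
To see what the numbering step produces on its own, here is a small sketch on a made-up tibble (`toy` and `numbered` are hypothetical names; the rows are illustrative, not the full gapminder list).

```r
library(dplyr)

# Made-up rows: two continents with different numbers of countries.
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Oceania", "Oceania"),
  country   = c("China", "India", "Japan", "Australia", "New Zealand")
)

numbered <- toy %>%
  group_by(continent) %>%
  # seq_along() restarts at 1 within each group
  mutate(country_number = seq_along(country)) %>%
  ungroup()

numbered$country_number
# 1 2 3 1 2
```

Because the counter restarts for every continent, it can serve directly as a y coordinate when each continent is drawn as its own column of labels.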
```{r, fig.height=10, fig.width=8}
```{r chap04-fig-adv2, fig.height=10, fig.width=8, fig.cap = "List of countries on each continent as in the gapminder dataset."}
gapdata2007 %>%
group_by(continent) %>%
mutate(country_number = seq_along(country)) %>%