Merge branch 'master' of github.com:datacarpentry/R-ecology-lesson
fmichonneau committed Jul 1, 2019
2 parents f947d84 + e8826b1 commit 1eaef4b
Showing 5 changed files with 95 additions and 60 deletions.
2 changes: 1 addition & 1 deletion 03-dplyr.Rmd
@@ -68,7 +68,7 @@ Then, to load the package type:

```{r, message = FALSE, purl = FALSE}
## load the tidyverse packages, incl. dplyr
library("tidyverse")
library(tidyverse)
```

## What are **`dplyr`** and **`tidyr`**?
10 changes: 5 additions & 5 deletions AUTHORS
@@ -31,7 +31,7 @@ Ethan White <[email protected]>
Francisco Rodriguez-Sanchez <[email protected]>
Francois Michonneau <[email protected]>
Fred Boehm <[email protected]>
GMoncrieff <[email protected]>
Glenn Moncrieff <[email protected]>
Hao Ye <[email protected]>
Harriet Dashnow <[email protected]>
Hilmar Lapp <[email protected]>
@@ -41,7 +41,7 @@ Jarrett Byrnes <[email protected]>
Jeffrey W Hollister <[email protected]>
Jieming Chen <[email protected]>
Jillian Dunic <[email protected]>
Jon <[email protected]>
Jon Petters <[email protected]>
Jonathan Keane <[email protected]>
Joseph Stachelek <[email protected]>
Josh Herr <[email protected]>
@@ -92,9 +92,9 @@ Will Furnass <[email protected]>
Will Pearse <[email protected]>
Ye Li <[email protected]>
Zena Lapp <[email protected]>
ab604 <[email protected]>
ashander <[email protected]>
cengel <[email protected]>
Alistair Bailey <[email protected]>
Jaime Ashander <[email protected]>
Claudia Engel <[email protected]>
Brian Seok <[email protected]>
sfn_brt <[email protected]>
suparee <[email protected]>
1 change: 0 additions & 1 deletion README.md
@@ -51,4 +51,3 @@ maintainers, or come chat with us on the [Slack Channel for this lesson](https:/
* Auriel Fournier
* François Michonneau
* Brian Seok
* Shiva Guru
57 changes: 34 additions & 23 deletions instructor-notes.md
@@ -6,7 +6,8 @@ root: .

## Dataset

The data used for this lesson are in the figshare repository at: https://doi.org/10.6084/m9.figshare.1314459
The data used for this lesson are in the figshare repository at:
https://doi.org/10.6084/m9.figshare.1314459

This lesson uses mostly `combined.csv`. The 3 other csv files: `plots.csv`,
`species.csv` and `surveys.csv` are only needed for the lesson on databases.
@@ -39,23 +40,25 @@ this file, so the participants can follow along.

Some learners may have previous R installations. On Mac, if a new install
is performed, the learner's system will create a symbolic link, pointing to the
new install as 'Current.' Sometimes this process does not occur, and, even though
a new R is installed and can be accessed via the R console, RStudio does not find it.
The net result of this is that the learner's RStudio will be running an older R install.
This will cause package installations to fail. This can be fixed at the terminal. First,
check for the appropriate R installation in the library;
new install as 'Current.' Sometimes this process does not occur, and, even
though a new R is installed and can be accessed via the R console, RStudio does
not find it. The net result of this is that the learner's RStudio will be
running an older R install. This will cause package installations to fail. This
can be fixed at the terminal. First, check for the appropriate R installation in
the library:

```
ls -l /Library/Frameworks/R.framework/Versions/
```

We are currently using R 3.4.x. If it isn't there, they will need to install it. If it
is present, you will need to set the symbolic link to Current to point to the 3.4.x
directory:
We are currently using R 3.6.x. If it isn't there, the learner will need to
install it. If it is present, set the symbolic link `Current` to point to the
3.6.x directory:

```
ln -s /Library/Frameworks/R.framework/Versions/3.4.x /Library/Frameworks/R.framework/Version/Current
ln -s /Library/Frameworks/R.framework/Versions/3.6.x /Library/Frameworks/R.framework/Versions/Current
```

Then restart RStudio.

## Narrative
@@ -81,7 +84,6 @@ Then restart RStudio.
point about how workshops are a great way to create a community of learners who
can help each other during and after the workshop.


### Intro to R

* When going over the section on assignments, make
@@ -102,23 +104,22 @@ The two main goals for this lessons are:
exposed to it. The content of the lesson should be enough for learners to
avoid common mistakes with them.

### Manipulating data with dplyr
### Manipulating data

* For this lesson make sure that learners are comfortable using pipes.
* There is also sometimes some confusion on what the arguments of `group_by`
should be.

### Using tidyr to reshape data for plotting
* This lesson uses the tidyr package to reshape data for plotting
* After this lesson students should be familiar with the spread() and gather() functions available in tidyr
* After this lesson students should be familiar with the spread() and gather()
functions available in tidyr

### Visualizing data with ggplot2
### Visualizing data

* This lesson is a broad overview of ggplot2 and focuses on (1) getting familiar
with the layering system of ggplot2, (2) using the argument `group` in the
`aes()` function, (3) basic customization of the plots.

### Using databases from R
### R and SQL

* Ideally this lesson is best taught at the end of the workshop (as a capstone
example) to illustrate how the tools covered can integrate with each
@@ -149,15 +150,25 @@ Alternatively you can go to CRAN and download the package and install from ZIP
file
- Tools > Install Packages > set to 'from Zip/TAR'

It is important that R, and the R packages be installed locally, not on a network drive. If a learner is using a machine with multiple users where their account is not based locally this can create a variety of issues (This often happens on university computers). Hopefully the learner will realize these issues before hand, but depending on the machine and how the IT folks that service the computer have things set up, it may be very difficult to impossible to make R work without their help.
It is important that R and the R packages be installed locally, not on a
network drive. If a learner is using a machine with multiple users where their
account is not based locally, this can create a variety of issues (this often
happens on university computers). Hopefully the learner will notice these
issues beforehand, but depending on the machine and how the IT staff who
service the computer have set things up, it may be difficult or even impossible
to make R work without their help.

If learners are having issues with one package, they may have issues with another. Its often easier to make sure they have all the needed packages installed at one time, rather then deal with these issues over and over. [Here is a list of all necessary packages for these lessons.](https://github.com/datacarpentry/R-ecology-lesson/blob/master/needed_packages.R)
If learners are having issues with one package, they may have issues with
another. It's often easier to make sure they have all the needed packages
installed at one time, rather than deal with these issues over and over.
[Here is a list of all necessary packages for these lessons.](https://github.com/datacarpentry/R-ecology-lesson/blob/master/needed_packages.R)

## Other Resources

If you encounter a problem during a workshop, feel free to contact the
maintainers by email
or
If you encounter a problem during a workshop, feel free to contact the
maintainers by email or
[open an issue](https://github.com/datacarpentry/R-ecology-lesson/issues/new).

For a more in-depth coverage of topics of the workshops, you may want to read "[R for Data Science](http://r4ds.had.co.nz/)" by Hadley Wickham and Garrett Grolemund.
For more in-depth coverage of the workshop topics, you may want to read
"[R for Data Science](http://r4ds.had.co.nz/)" by Hadley Wickham and Garrett
Grolemund.
85 changes: 55 additions & 30 deletions reference.md
@@ -1,8 +1,10 @@
Cheat sheet of functions used in the lessons


## Lesson 1 -- Introduction to R

* `sqrt()` # calculate the square root
* `round()` # round a number
* `args()` # find what arguments a function takes
* `length()` # how many elements are in a particular vector
* `class() ` # the class (the type of element) of an object
* `str() ` # an overview of the object and the elements it contains
@@ -15,57 +17,80 @@ Cheat sheet of functions used in the lessons

## Lesson 2 -- Starting with data

* `download.file() ` # download files from the internet to your computer
* `read.csv() ` # load CSV file into R memory
* `head() ` # check the top (the first 6 lines) of an object including data frames
* `factor() ` # create factors
* `levels() ` # check levels of a factor
* `nlevels() ` # check number of levels of a factor
* `as.numeric(levels(x))[x] ` # convert factors where the levels appear as numbers to a numeric vector

## Lesson 3 -- Introducing data.frame

* `data.frame()` # create a data frame
* `download.file() ` # download files from the internet to your computer
* `read.csv() ` # load CSV file into R memory
* `head() ` # shows the first 6 rows
* `View()` # invoke a spreadsheet-style data viewer
* `read.table()` # load a file in table format into R memory
* `str() ` # check structure of the object and information about the class, length and content of each column
* `dim() ` # check dimension of data frame
* `nrow() ` # returns the number of rows
* `ncol() ` # returns the number of columns
* `head() ` # shows the first 6 rows
* `tail() ` # shows the last 6 rows
* `names() ` # returns the column names (synonym of colnames() for data frame objects)
* `rownames() ` # returns the row names
* `str() ` # check structure of the object and information about the class, length and content of each column
* `summary() ` # summary statistics for each column
* `seq() ` # generates a sequence of numbers
* `factor() ` # create factors
* `levels() ` # check levels of a factor
* `nlevels() ` # check number of levels of a factor
* `as.character()` # convert an object to a character vector
* `as.numeric()` # convert an object to a numeric vector
* `as.numeric(as.character(x))` # convert factors where the levels appear as characters to a numeric vector
* `as.numeric(levels(x))[x]` # convert factors where the levels appear as numbers to a numeric vector
* `plot()` # plot an object
* `data.frame()` # create a data.frame object
* `ymd()` # convert a vector representing year, month, and day to a Date vector
* `paste()` # concatenate vectors after converting to character
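
The two factor-to-numeric idioms listed above can be compared on a small example. This is only a sketch: the vector `year_fct` and its values are invented for illustration and are not taken from the lesson data.

```r
# Hypothetical factor whose levels look like numbers (illustrative values only)
year_fct <- factor(c("1990", "1983", "1977", "1998", "1990"))

# Calling as.numeric() directly returns the internal integer codes of the
# levels, not the years themselves
codes <- as.numeric(year_fct)

# Option 1: convert to character first, then to numeric
years_a <- as.numeric(as.character(year_fct))

# Option 2: convert the levels once, then index them by the factor
years_b <- as.numeric(levels(year_fct))[year_fct]

print(codes)
print(years_a)
print(identical(years_a, years_b))
```

Both options should recover the original years; the second converts each distinct level only once, which can matter for long vectors with few levels.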

## Lesson 4 -- Aggregating and analyzing data with dplyr
## Lesson 3 -- Manipulating, analyzing and exporting data with tidyverse

* `install.packages()` # install a CRAN package in R
* `library() ` # load installed package into the current session
* `read_csv()` # load a csv formatted file into R memory
* `str()` # check structure of the object and information about the class, length and content of each column
* `View()` # invoke a spreadsheet-style data viewer
* `select() ` # select columns of a data frame
* `filter() ` # allows you to select a subset of rows in a data frame
* `%>% ` # pipes to select and filter at the same time
* `mutate() ` # create new columns based on the values in existing columns
* `head() ` # shows the first 6 rows
* `group_by() ` # split the data into groups, apply some analysis to each group, and then combine the results.
* `summarize() ` # collapses each group into a single-row summary of that group
* `tally()` # counts the total number of records for each category.
* `write.csv() ` # save CSV file
* `mean()` # calculate the mean value of a vector
* `!is.na()` # test if there are no missing values
* `print()` # print values to the console
* `min()` # return the minimum value of a vector
* `arrange()` # arrange rows by variables
* `desc()` # transform a vector into a format that will be sorted in descending order
* `count()` # counts the total number of records for each category
* `spread()` # reshape a data frame by a key-value pair across multiple columns
* `gather()` # reshape a data frame by collapsing into a key-value pair
* `n_distinct()` # get a count of unique values
* `write_csv()` # save to a csv formatted file
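
Several of the functions above are typically chained with the pipe. The following is a minimal sketch, assuming the `dplyr` package is installed; the data frame `surveys_demo` and its values are hypothetical and only loosely modelled on the lesson's survey data.

```r
library(dplyr)

# Hypothetical miniature survey table (names and values invented for illustration)
surveys_demo <- data.frame(
  species_id = c("DM", "DM", "DO", "DO", "PF"),
  sex        = c("F", "M", "F", "M", "F"),
  weight     = c(40, 44, NA, 50, 8),
  stringsAsFactors = FALSE
)

mean_weights <- surveys_demo %>%
  filter(!is.na(weight)) %>%            # drop rows with a missing weight
  group_by(species_id) %>%              # group_by() takes bare column names
  summarize(mean_weight = mean(weight), # one summary row per group
            n = n()) %>%
  arrange(desc(mean_weight))            # heaviest species first

print(mean_weights)
```

Note that `group_by()` receives the unquoted name of the grouping column, and `summarize()` then returns one row per group.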

## Lesson 5 -- Data visualization with ggplot2
## Lesson 4 -- Data visualization with ggplot2

* `ggplot2(data= , aes(x= , y= )) + geom_point( ) + facet_wrap () +
theme_bw() + theme() `
* `read_csv()` # load a csv formatted file into R memory
* `ggplot(data= , aes(x= , y= )) + geom_point( ) + facet_wrap () + theme_bw() + theme() `
* `aes()` # select the variables to be plotted and the variables that
define the presentation, such as plotting size, shape, color, etc.
* `geom_` # graphical representation of the data in the plot (points, lines, bars). To add a geom to the plot use + operator
* `facet_wrap()` # splits one plot into multiple plots based on a factor included in the dataset
* `labs()` # set labels to plot
* `theme_bw()` # set the background to white
* `theme()` # used to locally modify one or more theme elements in a specific ggplot object
*
## Lesson 6 -- R and SQL
* `grid.arrange()` # combine and arrange multiple ggplots into a single figure
* `ggsave()` # save a ggplot
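
The layering pattern summarized above can be sketched as follows, assuming the `ggplot2` package is installed. The data frame `plot_demo`, its column names, and the axis labels are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Hypothetical data (values invented for illustration)
plot_demo <- data.frame(
  weight          = c(40, 44, 48, 50, 8, 9),
  hindfoot_length = c(35, 36, 35, 37, 15, 16),
  species_id      = c("DM", "DM", "DO", "DO", "PF", "PF")
)

# Build the plot layer by layer with the + operator
p <- ggplot(data = plot_demo, aes(x = weight, y = hindfoot_length)) +
  geom_point() +                      # add a layer of points
  facet_wrap(~ species_id) +          # one panel per species
  labs(x = "Weight", y = "Hindfoot length") +
  theme_bw()                          # white background theme

print(p)
```

Note that the plotting function is `ggplot()`, while `ggplot2` is the name of the package.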

## Lesson 5 -- SQL databases and R

* `src_sqlite` # connect dplyr to a SQLite database file
* `dir.create()` # create a directory
* `download.file() ` # download files from the internet to your computer
* `dbConnect()` # create a connection to a database
* `SQLite()` # connect to a SQLite database
* `src_dbi()` # connect dplyr to a DBI-compatible database file
* `tbl` # connect to a table within a database
* `collect` # retrieve all the results from the database
* `explain` # show the SQL translation of a dplyr query
* `inner_join` # perform an inner join between two tables
* `copy_to` # copy a data frame as a table into a database
* `sql()` # combine character vectors into a single SQL expression
* `show_query()` # show which SQL commands are sent to the database
* `collect()` # retrieve all the results from the database
* `inner_join()` # perform an inner join between two tables
* `src_sqlite()` # connect dplyr to a SQLite database file
* `copy_to()` # copy a data frame as a table into a database
