---
title: "R0"
author: "Tyler Richards"
output:
  prettydoc::html_pretty:
    theme: tactile
    highlight: github
---
# Basic Operations and Functions in R
## Why Learn R?
R is a language built for data science, with a huge community of dedicated and helpful users in every field imaginable. Google, Microsoft, Uber, TechCrunch, P&G, the New York Times, and Bank of America all use R, and so do the best academic institutions in the world, in every department from journalism to finance to political science and biology.
The goal of this workshop, and the next 2 R workshops, is to get you all from 'I have no idea how to do that' to 'yeah I think I could figure out how to do this.'
Before we get to all that, we need to set a good foundation for how R works.
We'll begin by covering data types and classes, then look at how to create vectors and data frames, and finally work on manipulating your data by filtering and indexing.
## What is R?
R is an open source programming language and software environment for statistical computing and graphics.
R vs RStudio
R is the language, while RStudio is the integrated development environment (IDE), analogous to Eclipse for Java.
Overview of RStudio
* Console
* Scripts
* Environment
* Files, plots, packages
Arguments to an R function come in a defined order: you either accept that order implicitly by passing values by position, or override it by naming each argument explicitly.
R keeps all your information in objects.
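As a small illustration of both points, the sketch below calls the base function ``seq()`` first by position and then by name, storing each result in an object. The object names ``by_position`` and ``by_name`` are made up for this example.

```{r}
# Positional matching: values are assigned in the order seq() defines (from, to, by)
by_position <- seq(1, 10, 2)

# Named matching: the order no longer matters once each argument is named
by_name <- seq(by = 2, to = 10, from = 1)

# Both results are stored as objects, so we can compare them afterwards
identical(by_position, by_name)
```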
## Common data/object types
```{r}
#Numeric
x <- 4.8
print(class(x))
```
```{r}
#Integer
y <- as.integer(x)  # as.integer() drops the decimal part, so 4.8 becomes 4
print(y)
print(class(y))
```
```{r}
#Character
z <- "dog"
print(class(z))
```
```{r}
#Logical (also called Boolean)
a <- TRUE
print(class(a))
```
```{r}
#factor
heat_setting <- factor(c("Low", "Medium", "High"))
print(class(heat_setting))
```
## Data Structure Types
Major examples include vectors, matrices, arrays, data frames, and lists.
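Vectors, matrices, and data frames each get their own subsection below. For the two that do not, lists and arrays, here is a minimal sketch; the values and the names ``student`` and ``cube`` are made up for illustration.

```{r}
# A list can hold elements of different types and lengths
student <- list(name = "Ana", age = 22, courses = c("STA101", "ECO201"))
student$name          # access an element by name
student[["courses"]]  # double brackets return the element itself

# An array generalizes a matrix to more than two dimensions
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)             # 2 rows, 3 columns, 4 slices
cube[1, 2, 3]         # row 1, column 2 of the third slice
```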
### Vectors
```{r}
#Numerical vector
a<-c(1,2,3,4,5,6,7)
print(a)
```
```{r}
#character vector
b<-c("one","two","three")
print(b)
```
```{r}
#Logical vector
d<-c(TRUE,FALSE,FALSE,TRUE)  # named d so the name does not clash with the c() function
print(d)
```
```{r}
#Subsetting vectors
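a              # the whole vector
a[1]           # first element (R indexes from 1)
a[2:5]         # elements 2 through 5
a[-1]          # everything except the first element
a[-(1:3)]      # everything except the first three elements
a[-length(a)]  # everything except the last element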
```
### Matrix
```{r}
# Matrix with data in it
new_matrix<-matrix(1:20,
                   nrow=5,
                   ncol=4) # filled column by column by default; add byrow=TRUE to fill by row
new_matrix
```
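The comment in the chunk above mentions ``byrow=TRUE``. A minimal sketch of the difference it makes, using two hypothetical object names, ``col_matrix`` and ``row_matrix``:

```{r}
# Default: values fill the matrix column by column
col_matrix <- matrix(1:20, nrow = 5, ncol = 4)

# byrow = TRUE: values fill the matrix row by row
row_matrix <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)

col_matrix[1, ]  # first row when filled by column
row_matrix[1, ]  # first row when filled by row
```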
```{r}
#Subsetting Matrices
new_matrix[1,4]  # row 1, column 4
new_matrix[5,2]  # row 5, column 2
```
### Data Frames
```{r}
#first create vectors
studentID <- c(1, 2, 3, 4)
age <- c(25, 19, 22, 21)
university <- c("UF", "FSU", "UF", "UGA")
hired <- c(TRUE, FALSE, TRUE, FALSE)
#then combine them into a dataframe!
studentdata <- data.frame(studentID, age, university, hired)
studentdata
```
```{r}
#Subsetting dataframes
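studentdata[1,3]        # row 1, column 3 (university)
studentdata[1,"hired"]  # row 1 of the column named "hired"
studentdata$age         # the whole age column as a vector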
```
### Logicals
Logicals in R are the values ``TRUE`` and ``FALSE``, known as booleans in many programming languages.
We obtain logical values by comparing values using comparison operators.
Comparison operators in R include the following:
* equal: ``==``
* not equal: ``!=``
* greater/less than: ``> <``
* greater/less than or equal: ``>= <=``
Examples:
```{r}
3<4      # TRUE
4>3*2    # FALSE, because 3*2 is evaluated first and 4 is not greater than 6
3==3*1   # TRUE
```
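Comparisons also work element by element on vectors, and the results can be combined with ``&`` (and), ``|`` (or), and ``!`` (not). A short sketch with a made-up vector ``ages``:

```{r}
ages <- c(25, 19, 22, 21)

ages >= 21             # one TRUE/FALSE per element
ages != 22             # TRUE wherever the value is not 22
ages > 20 & ages < 25  # both conditions must hold
ages < 20 | ages > 24  # at least one condition must hold
!(ages >= 21)          # negation flips each value
ages[ages >= 21]       # a logical vector can be used to subset
```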
### Introduction to Functions
```{r}
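# Returns the sum of the squares of x and y
sum.of.squares <- function(x,y) {
  x^2 + y^2  # the last evaluated expression is the return value
}
sum.of.squares(2,3)

# Converts a temperature from Fahrenheit to Kelvin
fahrenheit_to_kelvin <- function(temp_F) {
  temp_K <- ((temp_F - 32) * (5 / 9)) + 273.15
  return(temp_K)
}
fahrenheit_to_kelvin(32)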
```
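Functions can also give an argument a default value, which the caller may override by name. The helper below, ``fahrenheit_to_celsius()``, and its ``digits`` argument are hypothetical and only sketch the idea.

```{r}
# Hypothetical helper: converts Fahrenheit to Celsius, rounded to `digits` decimal places
fahrenheit_to_celsius <- function(temp_F, digits = 1) {
  temp_C <- (temp_F - 32) * 5 / 9
  round(temp_C, digits)  # the last evaluated expression is returned
}

fahrenheit_to_celsius(98.6)              # uses the default digits = 1
fahrenheit_to_celsius(98.6, digits = 0)  # overrides the default by name
```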
## Basic Functions for working with data
We will use the "airquality" dataset to apply some of these ideas. This dataset ships with R and exists pretty much for the purpose of trying things out.
The **head()** function shows you the first six rows by default, while **tail()** shows the last six.
Another useful function is **str()**, which gives a description of the structure of the data.
```{r}
class(airquality)
str(airquality)
head(airquality)
tail(airquality)
```
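``head()`` and ``tail()`` both accept an ``n`` argument when six rows is not what you want. ``dim()`` and ``summary()`` are two other base functions often used for a first look. A brief sketch:

```{r}
head(airquality, n = 3)   # first three rows instead of six
tail(airquality, n = 3)   # last three rows
dim(airquality)           # number of rows, then number of columns
summary(airquality$Temp)  # minimum, quartiles, mean, and maximum of one column
```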
## Dplyr
When you get new data, about 0% of the time it will be in the right format and immediately useful for modeling or analysis.
The dplyr package provides a grammar of data manipulation. Let's check out how it works.
```{r}
#install.packages("dplyr")
library(dplyr)
```
The dplyr package is organized around a series of verbs, each one a function that does a single kind of manipulation. We'll go through a few now.
### Filter
```{r}
head(filter(airquality, Wind > 5))
```
```{r}
head(filter(airquality, Temp > 80 & Month > 5))
```
### Mutate
```{r}
head(mutate(airquality, TempInC = (Temp - 32) * 5 / 9))
```
### Summarise
```{r}
summarise(airquality, mean(Temp, na.rm = TRUE))
```
What if we want to perform multiple 'verbs' on our data at once?
### Piping
Piping with ``%>%`` lets us chain several operations on our data without saving each intermediate result.
```{r}
airquality %>%
  filter(Month != 5) %>%
  mutate(TempInC = (Temp - 32) * 5 / 9)
```
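To see what the pipe saves us, here is a sketch of the same computation written two other ways: with an intermediate object, and as nested calls. The object names are made up for this comparison.

```{r}
library(dplyr)

# Option 1: save each intermediate result
no_may <- filter(airquality, Month != 5)
with_celsius <- mutate(no_may, TempInC = (Temp - 32) * 5 / 9)

# Option 2: nest the calls, which reads inside-out
nested <- mutate(filter(airquality, Month != 5), TempInC = (Temp - 32) * 5 / 9)

# Option 3: pipe, which reads top to bottom
piped <- airquality %>%
  filter(Month != 5) %>%
  mutate(TempInC = (Temp - 32) * 5 / 9)

# All three approaches should produce the same data frame
identical(nested, piped)
```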
### Group By
```{r}
airquality %>%
  filter(Month != 5) %>%
  group_by(Month) %>%
  summarise(mean(Temp, na.rm = TRUE))
```
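``summarise()`` can also compute several named summaries per group. The sketch below adds a row count with ``n()``; the column names ``mean_temp``, ``max_temp``, and ``days`` are chosen here for illustration.

```{r}
library(dplyr)

monthly <- airquality %>%
  group_by(Month) %>%
  summarise(
    mean_temp = mean(Temp, na.rm = TRUE),  # average temperature per month
    max_temp  = max(Temp, na.rm = TRUE),   # hottest reading per month
    days      = n()                        # number of rows in each group
  )
monthly
```

Naming the summaries gives the result readable column headers instead of the full expression text.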