---
title: "Introduction to the R Programming Language"
output: html_document
---
# The R Programming Language
R is a free software environment for statistical computing and graphics. It is widely used for:
- Data analysis
- Statistical modeling
- Simulation and visualization
- Scientific computing
---
# Vector Types in R
R operates on named data structures. The most basic is the **vector**, which is an ordered collection of values.
There are several types of vectors:
- Numeric vectors
- Character vectors
- Logical vectors
- (Others include complex vectors and raw vectors)
### Creating Vectors
```r
# Numeric vector
NumVec <- c(10.4, 5.6, 3.1, 6.4)
# Character vector
CharVec <- c("blue", "green", "yellow")
# Logical vector
LogVec <- c(TRUE, FALSE)
# Using assign()
assign("z", c(1, 2, 3))
```
---
# Set Theory Operations
```r
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5, 6)
union(a, b)
intersect(a, b)
setdiff(a, b)
```
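For the vectors defined above, these calls return the following (note that `setdiff()` depends on argument order):

```r
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5, 6)
union(a, b)      # 1 2 3 4 5 6
intersect(a, b)  # 3 4
setdiff(a, b)    # 1 2  (elements of a not in b)
setdiff(b, a)    # 5 6
```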
---
# Controlling Precision and Integerization
```r
pi
round(pi, 3)
round(pi, 2)
floor(pi)
ceiling(pi)
as.integer(pi)
```
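One point worth noting: `floor()`, `trunc()`, and `as.integer()` disagree for negative inputs. A short sketch:

```r
floor(-pi)       # -4  (rounds toward negative infinity)
ceiling(-pi)     # -3
trunc(-pi)       # -3  (drops the fractional part)
as.integer(-pi)  # -3  (truncates, like trunc)
```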
---
# Important Introductory Functions
```r
head(mtcars) # First few rows
tail(iris) # Last few rows
```
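Both functions take an optional `n` argument, and `str()` and `summary()` are also useful first looks at a data set:

```r
head(mtcars, n = 3)  # first 3 rows only
tail(iris, n = 2)    # last 2 rows only
str(mtcars)          # compact display of the structure
summary(iris)        # per-column summary statistics
```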
---
# Writing Functions in R
### General Syntax
```r
function_name <- function(arg1, arg2 = default, ...) {
# Commands
result <- ...
return(result)
}
```
### Example 1: Square Function
```r
sf1 <- function(x) {
x^2
}
sf1(3) # returns 9
x.sq <- sf1(3) # store result
```
### Example 2: Pythagorean Distance
```r
sf2 <- function(a1, a2, a3) {
x <- sqrt(a1^2 + a2^2 + a3^2)
return(x)
}
res <- sf2(2, 3, 4)
res
```
### Example 3: Power Function with Defaults
```r
mypower <- function(x, pow = 2) {
x^pow
}
mypower(4) # returns 16
mypower(4, 3) # returns 64
mypower(pow = 5, x = 2) # returns 32
```
---
# Statistical Tests and ANOVA Interpretation
R provides tools for linear regression and ANOVA output.
### Test for the Slope
Given:
```r
beta_hat <- 2.4
se_beta <- 0.3
t_stat <- beta_hat / se_beta
t_stat
```
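To turn `t_stat` into a p-value you need the degrees of freedom, which is n - 2 in simple linear regression; the sample size `n <- 20` below is an assumed value for illustration only:

```r
beta_hat <- 2.4
se_beta <- 0.3
t_stat <- beta_hat / se_beta   # 8
n <- 20                        # assumed sample size (illustration only)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
p_value                        # well below 0.05, so reject H0
```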
### Residual Checking
Use diagnostic plots to validate model assumptions:
```r
model <- lm(mpg ~ wt, data = mtcars)
par(mfrow = c(2, 2))
plot(model)
```
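As a numeric complement to the plots, a Shapiro-Wilk test on the residuals gives a formal check of normality (a sketch, using the same model):

```r
model <- lm(mpg ~ wt, data = mtcars)
shapiro.test(residuals(model))  # H0: residuals are normally distributed
```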
---
# Confidence Intervals Example
```r
xbar <- 83
sigma <- 12
n <- 5
sem <- sigma / sqrt(n)
lower <- xbar + sem * qnorm(0.025)
upper <- xbar + sem * qnorm(0.975)
c(lower, upper)
```
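The interval above assumes sigma is known. When the standard deviation is estimated from the sample, the t distribution with n - 1 degrees of freedom replaces the normal (a sketch reusing the same numbers):

```r
xbar <- 83
s <- 12                 # now treated as a sample estimate
n <- 5
sem <- s / sqrt(n)
xbar + sem * qt(c(0.025, 0.975), df = n - 1)  # wider than the z interval
```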
---
# Chi-Square Test
```r
table_data <- matrix(c(12, 5, 7, 16), nrow = 2)
chisq.test(table_data)
```
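As noted later in these notes, `fisher.test()` accepts the same matrix form and gives an exact test, which is preferable when expected counts are small:

```r
table_data <- matrix(c(12, 5, 7, 16), nrow = 2)
fisher.test(table_data)
```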
---
# Creating and Importing Data
### From Excel
```r
library(gdata)
mydata <- read.xls("mydata.xls")
```
### From CSV
```r
mycsv <- read.csv("data.csv")
write.csv(mycsv, "output.csv")
```
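By default `write.csv()` adds a column of row names; `row.names = FALSE` suppresses it. A self-contained sketch (the file name is illustrative):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
write.csv(df, "output.csv", row.names = FALSE)
read.csv("output.csv")  # round-trips with the same two columns
```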
---
# Data Entry with `scan()`
```r
nums <- scan() # type values separated by spaces; press ENTER on a blank line to finish
```
---
# Classes of Data Objects
```r
class(NumVec)
class(CharVec)
class(mtcars)
```
---
---
\newpage
\chapter{Programming}
\section{Writing Functions}
A simple function can be constructed as follows:
\begin{verbatim}
function_name <- function(arg1, arg2, ...){
commands
output
}
\end{verbatim}
You decide on the name of the function. The \texttt{function} command tells R that you are writing a function. Inside the parentheses you list the input arguments and choose their names. The commands go inside the braces \{ \}.
The name of whatever output you want goes at the end of the function. Comment lines (usually a description of what the function does, placed at the beginning) are denoted by \#.
\begin{verbatim}
sf1 <- function(x){
  x^2
}
\end{verbatim}
This function is called sf1. It has one argument, called x.
Whatever value is supplied for x will be squared and the result printed to the screen. This function must first be loaded into R; it can then be called using:
\begin{verbatim}
sf1(x = 3)
# sf1(3)
[1] 9
\end{verbatim}
To store the result in a variable \texttt{x.sq}:
\begin{verbatim}
x.sq <- sf1(x = 3)
x.sq <- sf1(3)
> x.sq
[1] 9
\end{verbatim}
Example
\begin{verbatim}
sf2 <- function(a1, a2, a3){
x <- sqrt(a1^2 + a2^2 + a3^2)
return(x)
}
\end{verbatim}
This function is called sf2 and has 3 arguments. The values supplied for a1, a2 and a3 are squared and summed, and the square root of the sum is stored in x.
The return command specifies what the function returns, here the value of x. When called at the top level the returned value is printed; to use it later, we must store it in a variable.
\begin{verbatim}
sf2(a1=2, a2=3, a3=4)
sf2(2, 3, 4)
[1] 5.385165
res <- sf2(2, 3, 4)   # Store the result for later use.
res
[1] 5.385165
\end{verbatim}
We can also give some/all arguments default values.
\begin{verbatim}
mypower <- function(x, pow=2){
  x^pow
}
\end{verbatim}
If a value for the argument pow is not specified in the function call,
a value of 2 is used.
\begin{verbatim}
mypower(4)
[1] 16
\end{verbatim}
If a value for "pow" is specified, that value is used.
\begin{verbatim}
mypower(4, 3)
[1] 64
mypower(pow=5, x=2)
[1] 32
\end{verbatim}
%----------------------------------------------------%
\subsubsection{slide234}
The test statistics are <equation here>
The p-values for both of these tests are 0, so there is enough evidence to reject $H_0$ and conclude that both $\beta_0$ and $\beta_1$ are non-zero, i.e. there is a significant linear relationship between x and y.
Also given are the $R^2$ and adjusted $R^2$ values. Here $R^2 = SSR/SST = 0.8813$, so $88.13\%$ of the variation in y is explained by x.
The final line gives the result of using the ANOVA table to assess the model fit.
%----------------------------------------------------%
\subsubsection{slide235}
In SLR, the ANOVA table tests <EQN>
The TS is the F value; the critical value and p-value are found
in the F tables with (p - 1) and (n - p) degrees of freedom.
This output gives a p-value of 0, therefore there is enough evidence to reject $H_0$ and conclude that there is a significant linear relationship between y and x. The full ANOVA table can be accessed using:
<TABLE HERE>
\subsubsection{slide236}
Once the model has been fitted, we must then check the residuals.
The residuals should be independent and normally distributed with
mean 0 and constant variance.
A Q-Q plot checks the assumption of normality (a histogram can also be used, as in MINITAB), while a plot of the residuals versus the fitted values indicates whether the assumption of constant variance holds.
<HISTOGRAM>
%----------------------------------------------------%
\subsubsection{Confidence interval for the mean (known sigma)}
\large \begin{verbatim}
> xbar <- 83
> sigma <- 12
> n <- 5
> sem <- sigma/sqrt(n)
> sem
[1] 5.366563
> xbar + sem * qnorm(0.025)
[1] 72.48173
> xbar + sem * qnorm(0.975)
[1] 93.51827
\end{verbatim}\large
\subsubsection{Testing the slope (II)}
You can compute a t test for this hypothesis simply by dividing the estimate by its standard error:
\begin{equation}
t = \frac{\hat{\beta}}{S.E.(\hat{\beta})}
\end{equation}
which follows a t distribution on $n - 2$ degrees of freedom if the true $\beta$ is zero.
%----------------------------------------------------%
\begin{itemize}
\item The standard $\chi^{2}$ test in chisq.test works with data in matrix form, like fisher.test does.
\item For a 2 by 2 table, the test is exactly equivalent to prop.test.
\end{itemize}
\large \begin{verbatim}
> chisq.test(lewitt.machin)
\end{verbatim}\large