<!DOCTYPE html>
<html>
<head>
<title>R Tutorial</title>
<meta charset="utf-8">
<meta name="Description" content="R Language Tutorials for Advanced Statistics">
<meta name="Keywords" content="R, Tutorial, Machine learning, Statistics, Data Mining, Analytics, Data science, Linear Regression, Logistic Regression, Time series, Forecasting">
<meta name="Distribution" content="Global">
<meta name="Author" content="Selva Prabhakaran">
<meta name="Robots" content="index, follow">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="/screenshots/iconb-64.png" type="image/x-icon" />
<link href="www/bootstrap.min.css" rel="stylesheet">
<link href="www/highlight.css" rel="stylesheet">
<link href='http://fonts.googleapis.com/css?family=Inconsolata:400,700'
rel='stylesheet' type='text/css'>
<!-- Color Script -->
<style type="text/css">
a {
color: #3675C5;
color: rgb(25, 145, 248);
color: #4582ec;
color: #3F73D8;
}
li {
line-height: 1.65;
}
/* reduce spacing around math formula*/
.MathJax_Display {
margin: 0em 0em;
}
</style>
<!-- Add Google search -->
<script language="Javascript" type="text/javascript">
function my_search_google()
{
var query = document.getElementById("my-google-search").value;
window.open("http://google.com/search?q=" + query
+ "%20site:" + "http://r-statistics.co");
}
</script>
</head>
<body>
<div class="container">
<div class="masthead">
<!--
<ul class="nav nav-pills pull-right">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">
Table of contents<b class="caret"></b>
</a>
<ul class="dropdown-menu pull-right" role="menu">
<li class="dropdown-header"></li>
<li class="dropdown-header">Tutorial</li>
<li><a href="R-Tutorial.html">R Tutorial</a></li>
<li class="dropdown-header">ggplot2</li>
<li><a href="ggplot2-Tutorial-With-R.html">ggplot2 Short Tutorial</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">ggplot2 Tutorial 1 - Intro</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">ggplot2 Tutorial 2 - Theme</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">ggplot2 Tutorial 3 - Masterlist</a></li>
<li><a href="ggplot2-cheatsheet.html">ggplot2 Quickref</a></li>
<li class="dropdown-header">Foundations</li>
<li><a href="Linear-Regression.html">Linear Regression</a></li>
<li><a href="Statistical-Tests-in-R.html">Statistical Tests</a></li>
<li><a href="Missing-Value-Treatment-With-R.html">Missing Value Treatment</a></li>
<li><a href="Outlier-Treatment-With-R.html">Outlier Analysis</a></li>
<li><a href="Variable-Selection-and-Importance-With-R.html">Feature Selection</a></li>
<li><a href="Model-Selection-in-R.html">Model Selection</a></li>
<li><a href="Logistic-Regression-With-R.html">Logistic Regression</a></li>
<li><a href="Environments.html">Advanced Linear Regression</a></li>
<li class="dropdown-header">Advanced Regression Models</li>
<li><a href="adv-regression-models.html">Advanced Regression Models</a></li>
<li class="dropdown-header">Time Series</li>
<li><a href="Time-Series-Analysis-With-R.html">Time Series Analysis</a></li>
<li><a href="Time-Series-Forecasting-With-R.html">Time Series Forecasting </a></li>
<li><a href="Time-Series-Forecasting-With-R-part2.html">More Time Series Forecasting</a></li>
<li class="dropdown-header">High Performance Computing</li>
<li><a href="Parallel-Computing-With-R.html">Parallel computing</a></li>
<li><a href="Strategies-To-Improve-And-Speedup-R-Code.html">Strategies to Speedup R code</a></li>
<li class="dropdown-header">Useful Techniques</li>
<li><a href="Association-Mining-With-R.html">Association Mining</a></li>
<li><a href="Multi-Dimensional-Scaling-With-R.html">Multi Dimensional Scaling</a></li>
<li><a href="Profiling.html">Optimization</a></li>
<li><a href="Information-Value-With-R.html">InformationValue package</a></li>
</ul>
</li>
</ul>
-->
<ul class="nav nav-pills pull-right">
<div class="input-group">
<form onsubmit="my_search_google()">
<input type="text" class="form-control" id="my-google-search" placeholder="Search..">
</form>
</div><!-- /input-group -->
</ul><!-- /.col-lg-6 -->
<h3 class="muted"><a href="/">r-statistics.co</a><small> by Selva Prabhakaran</small></h3>
<hr>
</div>
<div class="row">
<div class="col-xs-12 col-sm-3" id="nav">
<div class="well">
<li>
<ul class="list-unstyled">
<li class="dropdown-header"></li>
<li class="dropdown-header">Tutorial</li>
<li><a href="R-Tutorial.html">R Tutorial</a></li>
<li class="dropdown-header">ggplot2</li>
<li><a href="ggplot2-Tutorial-With-R.html">ggplot2 Short Tutorial</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">ggplot2 Tutorial 1 - Intro</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">ggplot2 Tutorial 2 - Theme</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">ggplot2 Tutorial 3 - Masterlist</a></li>
<li><a href="ggplot2-cheatsheet.html">ggplot2 Quickref</a></li>
<li class="dropdown-header">Foundations</li>
<li><a href="Linear-Regression.html">Linear Regression</a></li>
<li><a href="Statistical-Tests-in-R.html">Statistical Tests</a></li>
<li><a href="Missing-Value-Treatment-With-R.html">Missing Value Treatment</a></li>
<li><a href="Outlier-Treatment-With-R.html">Outlier Analysis</a></li>
<li><a href="Variable-Selection-and-Importance-With-R.html">Feature Selection</a></li>
<li><a href="Model-Selection-in-R.html">Model Selection</a></li>
<li><a href="Logistic-Regression-With-R.html">Logistic Regression</a></li>
<li><a href="Environments.html">Advanced Linear Regression</a></li>
<li class="dropdown-header">Advanced Regression Models</li>
<li><a href="adv-regression-models.html">Advanced Regression Models</a></li>
<li class="dropdown-header">Time Series</li>
<li><a href="Time-Series-Analysis-With-R.html">Time Series Analysis</a></li>
<li><a href="Time-Series-Forecasting-With-R.html">Time Series Forecasting </a></li>
<li><a href="Time-Series-Forecasting-With-R-part2.html">More Time Series Forecasting</a></li>
<li class="dropdown-header">High Performance Computing</li>
<li><a href="Parallel-Computing-With-R.html">Parallel computing</a></li>
<li><a href="Strategies-To-Improve-And-Speedup-R-Code.html">Strategies to Speedup R code</a></li>
<li class="dropdown-header">Useful Techniques</li>
<li><a href="Association-Mining-With-R.html">Association Mining</a></li>
<li><a href="Multi-Dimensional-Scaling-With-R.html">Multi Dimensional Scaling</a></li>
<li><a href="Profiling.html">Optimization</a></li>
<li><a href="Information-Value-With-R.html">InformationValue package</a></li>
</ul>
</li>
</div>
<div class="well">
<p>Stay up-to-date. <a href="https://docs.google.com/forms/d/1xkMYkLNFU9U39Dd8S_2JC0p8B5t6_Yq6zUQjanQQJpY/viewform">Subscribe!</a></p>
<p><a href="https://docs.google.com/forms/d/13GrkCFcNa-TOIllQghsz2SIEbc-YqY9eJX02B19l5Ow/viewform">Chat!</a></p>
</div>
<h4>Contents</h4>
<ul class="list-unstyled" id="toc"></ul>
<!--
<hr>
<p><a href="/contribute.html">How to contribute</a></p>
<p><a class="btn btn-primary" href="">Edit this page</a></p>
-->
</div>
<div id="content" class="col-xs-12 col-sm-8 pull-right">
<blockquote>
<p>This is a beginner-friendly guide, written for non-programmers, to learning and understanding the R language from scratch. It gives a brief walk-through of the most important parts of the language in plain English and is intended to get you on board fairly quickly.</p>
</blockquote>
<p>I assume that you have <a href="https://cran.r-project.org/">R</a> or <a href="https://rstudio.com/">RStudio</a> installed and are ready to follow along, typing the code into your R console as you learn.</p>
<h2>Section 1</h2>
<h4>First Touch: R As A Calculator</h4>
<p>Illustration of The Components of R Window</p>
<p><img src='screenshots/RStudio-window.png' width='528' height='310' /></p>
<p>If nothing else, the R console can be used as a built-in calculator. Open your R console and type in the following. You need not type anything after the <code>#</code> symbol, because the hash <code>#</code> is the comment character: R ignores everything that comes after it on that line.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="dv">2</span> +<span class="st"> </span><span class="dv">3</span> <span class="co"># the space around '+' is optional</span></code></pre></div>
<p>That prints out the number 5 as the answer in your console. Let’s try some more.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="dv">2</span> *<span class="st"> </span><span class="dv">3</span> <span class="co">#=> 6</span>
<span class="kw">sqrt</span>(<span class="dv">36</span>) <span class="co">#=> 6, square root</span>
<span class="kw">log10</span>(<span class="dv">100</span>) <span class="co">#=> 2, log base 10</span>
<span class="dv">10</span> /<span class="st"> </span><span class="dv">3</span> <span class="co">#=> 3.333333, 10 divided by 3</span>
<span class="dv">10</span> %/%<span class="st"> </span><span class="dv">3</span> <span class="co">#=> 3, integer quotient of 10 divided by 3</span>
<span class="dv">10</span> %%<span class="st"> </span><span class="dv">3</span> <span class="co">#=> 1, remainder of 10 divided by 3</span></code></pre></div>
<h4>The Assignment Operator</h4>
<p>The next thing you need to know about is R’s assignment operator. Unlike most other languages, R uses the <code><-</code> operator in addition to the usual <code>=</code> operator for assigning values. So whenever you see <code><-</code> in R code, know that it works just like <code>=</code>, except that it can also be written in the opposite direction as <code>-></code>. Here is an example for you to try out in your R console. Alternatively, you can use the R Editor to type in all 4 lines at once and press <code>Cmd+R</code> (on Mac) or <code>Ctrl+R</code> (on Windows) to run the selection or the current line. In RStudio, use <code>Cmd+Enter</code> or <code>Ctrl+Enter</code> instead.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span><span class="dv">10</span> <span class="co"># assign 10 to 'a'</span>
a =<span class="st"> </span><span class="dv">10</span> <span class="co"># same as above</span>
<span class="dv">10</span> -><span class="st"> </span>a <span class="co"># assign 10 to 'a'</span>
<span class="dv">10</span> =<span class="st"> </span>a <span class="co"># Wrong! This tries to assign the value of 'a' to the literal 10 and throws an error.</span></code></pre></div>
<h4>Classes or Data types</h4>
<p>In the previous code, you may have noticed that there is no dedicated step to define the type of a variable. R infers it in the background and assigns a <code>class</code> to the variable.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">class</span>(a) <span class="co"># numeric</span></code></pre></div>
<p>Based on the value assigned to the variable <code>a</code>, R set its class to <code>numeric</code>. If you want to change it to the character ’10’ instead of the number 10, that can be done as follows:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span><span class="kw">as.character</span>(a)
<span class="kw">print</span>(a) <span class="co"># prints out the value of a</span>
<span class="kw">class</span>(a) <span class="co"># character</span></code></pre></div>
<p>Find out what happens when you try to convert a character to a numeric using <code>as.numeric()</code>. The next question naturally is: what are the different classes available in R? The list is open-ended, since users are free to define new classes, but here are some of the most commonly used ones:</p>
<h4>Variable Types</h4>
<ul>
<li>character – Strings</li>
<li>integer – Integers</li>
<li>numeric – Real numbers (integers and fractions)</li>
<li>factor – Categorical variable where each level is a category</li>
<li>logical – Boolean</li>
<li>complex – Complex numbers</li>
</ul>
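<p>As a quick check, the sketch below prints the class of one value of each type listed above, and shows what <code>as.numeric()</code> does with a character that is and is not a number. The values are made up purely for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># One illustrative value per variable type
class("hello")                   # "character"
class(5L)                        # "integer" (the L suffix marks an integer literal)
class(5.5)                       # "numeric"
class(factor(c("low", "high")))  # "factor"
class(TRUE)                      # "logical"
class(2+3i)                      # "complex"

# Converting characters to numbers
as.numeric("10")                 # 10
as.numeric("abc")                # NA, with a coercion warning</code></pre></div>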
<h4>Data Types</h4>
<ul>
<li>vector – A collection of elements of same class</li>
<li>matrix – A two-dimensional collection in which all elements must be of the same class.</li>
<li>data.frame – The columns can contain different classes.</li>
<li>list – Can hold objects of different classes and lengths</li>
</ul>
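<p>To make the four containers concrete, here is a minimal sketch that builds one of each. All object names (<code>v</code>, <code>m</code>, <code>dat</code>, <code>lst</code>) are hypothetical and chosen only for this illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">v <- c(1, 2, 3)                           # vector: all elements share one class
m <- matrix(1:6, nrow = 2)                # matrix: 2 rows x 3 columns of integers
dat <- data.frame(id = 1:3,               # data.frame: columns may differ in class
                  name = c("a", "b", "c"))
lst <- list(v, m, dat, "text")            # list: objects of any class and length

dim(m)              # 2 3
sapply(dat, class)  # class of each column of the data frame
length(lst)         # 4</code></pre></div>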
<hr />
<h3>Some Miscellaneous but Important Items Before We Proceed . .</h3>
<h4>What is an R package and how do you install one?</h4>
<p>Upon your first install, R comes with a built-in set of packages which can be invoked directly from your R console. However, since R is an open-source language, anyone can contribute to its capabilities by writing packages. Over the years, these contributions have resulted in a growing list of over 5K packages. Here is how you can install a package from within the R console:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">install.packages</span>(<span class="st">"car"</span>) <span class="co"># install car package </span></code></pre></div>
<p>The above code will prompt you to select a CRAN mirror; pick the one that is closest to your location. The dot (.) in <code>install.packages</code> is part of the function name and does not separate two commands.</p>
<p>Now that the package is installed, you need to load it before you can call the functions and datasets that come with it.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(car) <span class="co"># load the pkg 'car'</span>
<span class="kw">require</span>(car) <span class="co"># another way to load it</span>
<span class="kw">library</span>() <span class="co"># see list of all installed packages</span>
<span class="kw">library</span>(<span class="dt">help=</span>car) <span class="co"># see info about 'car' pkg</span></code></pre></div>
<h4>Getting Help</h4>
<p>The easiest way to get help in R is with the <code>?</code> operator. Just put a <code>?</code> before the name of a function and R will open its help page, looking it up in the packages that are currently loaded. If you do not know the exact name, use <code>??</code> before a search term: it searches the help pages of all installed packages and also matches partial and incomplete terms.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">help</span>(merge) <span class="co"># get help page for 'merge'</span>
?merge <span class="co"># look up 'merge' in loaded pkgs</span>
??merge <span class="co"># keyword search across installed pkgs</span>
<span class="kw">example</span>(merge) <span class="co"># show code examples</span></code></pre></div>
<h4>What is a working directory and how to set up one?</h4>
<p>The working directory is the reference directory from which R reads files and to which it writes files by default. You can read and write files in the working directory without using the full file path. Directory names in a path should be separated by a forward slash <code>/</code> or a double backslash <code>\\</code> rather than a single <code>\</code>, even on a Windows PC.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">getwd</span>() <span class="co"># gets the working directory</span>
<span class="kw">setwd</span>(<span class="st">"path/to/dir"</span>) <span class="co"># set the working directory (placeholder path, use your own)</span></code></pre></div>
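<p>The following is a small sketch of working with relative paths. It assumes you are happy to use the session’s temporary directory (<code>tempdir()</code>) as a stand-in working directory; the file name <code>wd-demo.txt</code> is made up for the example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">oldDir <- getwd()                    # remember the current working directory
setwd(tempdir())                     # switch to the session's temporary directory
writeLines("hello", "wd-demo.txt")   # relative path: the file lands in the working directory
file.exists("wd-demo.txt")           # TRUE
file.remove("wd-demo.txt")           # clean up the demo file
setwd(oldDir)                        # restore the original working directory</code></pre></div>
<p>Saving the old directory and restoring it at the end keeps the rest of your session unaffected.</p>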
<h4>How to import and export data?</h4>
<p>The most common and convenient way to bring data into R is through .csv files. There are packages to import data from Excel files (.xlsx) and databases, but that will not be covered here.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myData <-<span class="st"> </span><span class="kw">read.table</span>(<span class="st">"c:/myInputData.txt"</span>, <span class="dt">header =</span> <span class="ot">FALSE</span>, <span class="dt">sep=</span><span class="st">"|"</span>, <span class="dt">colClasses=</span><span class="kw">c</span>(<span class="st">"integer"</span>,<span class="st">"character"</span>,<span class="st">"numeric"</span>)) <span class="co"># import "|" separated .txt file</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myData <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"c:/myInputData.csv"</span>, <span class="dt">header=</span><span class="ot">FALSE</span>) <span class="co"># import csv file</span>
<span class="kw">write.csv</span>(rDataFrame, <span class="st">"c:/output.csv"</span>) <span class="co"># export </span></code></pre></div>
<p>R will guess what data type each column of the data frame should have. If you want to set the types manually, use the <code>colClasses</code> argument of <code>read.csv()</code>, which is in fact recommended as it improves the efficiency of the import process.</p>
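<p>Here is a self-contained sketch of an export followed by an import with explicit column classes. The data frame <code>sampleDf</code> and its columns are invented for illustration, and the file is written to a temporary path so nothing on your disk is overwritten.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A small, made-up data frame to export
sampleDf <- data.frame(id = 1:3,
                       name = c("a", "b", "c"),
                       score = c(1.5, 2.5, 3.5))

csvPath <- tempfile(fileext = ".csv")             # temporary file path
write.csv(sampleDf, csvPath, row.names = FALSE)   # export without the row-name column

# Import again, stating the class of each column up front
imported <- read.csv(csvPath, header = TRUE,
                     colClasses = c("integer", "character", "numeric"))
sapply(imported, class)                           # integer, character, numeric
unlink(csvPath)                                   # delete the temporary file</code></pre></div>
<p>Note that <code>header = TRUE</code> is used here because <code>write.csv()</code> writes the column names as the first line.</p>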
<h4>How to view and delete objects in your console?</h4>
<p>As you create new variables, by default they get stored in what is called the global environment.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span><span class="dv">10</span>
b <-<span class="st"> </span><span class="dv">20</span>
<span class="kw">ls</span>() <span class="co"># list objects in global env</span>
<span class="kw">rm</span>(a) <span class="co"># delete the object 'a'</span>
<span class="kw">rm</span>(<span class="dt">list =</span> <span class="kw">ls</span>()) <span class="co"># caution: delete all objects in .GlobalEnv</span>
<span class="kw">gc</span>() <span class="co"># free system memory</span></code></pre></div>
<p>However, if you choose, you can create a new environment and store objects there.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rm</span>(<span class="dt">list=</span><span class="kw">ls</span>()) <span class="co"># remove all objects in work space</span>
env1 <-<span class="st"> </span><span class="kw">new.env</span>() <span class="co"># create a new environment</span>
<span class="kw">assign</span>(<span class="st">"a"</span>, <span class="dv">3</span>, <span class="dt">envir =</span> env1) <span class="co"># store a=3 inside env1</span>
<span class="kw">ls</span>() <span class="co"># returns objects in .GlobalEnv</span>
<span class="kw">ls</span>(env1) <span class="co"># returns objects in env1</span>
<span class="kw">get</span>(<span class="st">'a'</span>, <span class="dt">envir=</span>env1) <span class="co"># retrieve value from env1</span></code></pre></div>
<p>Let’s talk about what happened in the above code. Think of an environment as a container that holds objects (variables). The outermost main container is called the global environment (<code>globalenv()</code>). This is the default place where R will store all the objects that you create. You can place as many objects in it as your computer’s memory will allow. The point to note is that, since containers are also objects, you can put any number of containers (environments created by <code>new.env()</code>) inside the main container (<code>globalenv()</code>). But you can look into and access the objects within these inner containers only by explicitly telling R where you want to look. This is what you did in the last two lines of code above.</p>
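<p>The container idea can be checked directly. The sketch below uses hypothetical names (<code>demoEnv</code>, <code>demoValue</code>) and the <code>inherits = FALSE</code> argument of <code>exists()</code>, which restricts the lookup to the one environment you name.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">demoEnv <- new.env()                        # a new, empty container
assign("demoValue", 42, envir = demoEnv)    # store demoValue = 42 inside demoEnv

# The object lives in demoEnv ...
exists("demoValue", envir = demoEnv, inherits = FALSE)      # TRUE
# ... and not in the global environment (unless you defined it there yourself)
exists("demoValue", envir = globalenv(), inherits = FALSE)

get("demoValue", envir = demoEnv)           # 42
demoEnv$demoValue                           # 42, a shorthand for the line above</code></pre></div>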
<hr />
<h2>Section 2</h2>
<h3>Vectors</h3>
<h4>How to create a vector?</h4>
<p>Vectors can be created using the combine function <code>c()</code>. To create a vector, pass to <code>c()</code> all the elements that you want the vector to hold. Vectors can hold data of one type only, such as character, numeric or logical. If you try to mix data types within a vector, say characters and numerics, the elements of one type will be converted to the other. Now let’s create some.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vec1 <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">10</span>, <span class="dv">20</span>, <span class="dv">15</span>, <span class="dv">40</span>) <span class="co"># numeric vector</span>
vec2 <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>, <span class="ot">NA</span>) <span class="co"># character vector</span>
vec3 <-<span class="st"> </span><span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="ot">FALSE</span>, <span class="ot">TRUE</span>, <span class="ot">TRUE</span>) <span class="co"># logical vector</span>
vec4 <-<span class="st"> </span><span class="kw">gl</span>(<span class="dv">4</span>, <span class="dv">1</span>, <span class="dv">4</span>, <span class="dt">labels =</span> <span class="kw">c</span>(<span class="st">"l1"</span>, <span class="st">"l2"</span>, <span class="st">"l3"</span>, <span class="st">"l4"</span>)) <span class="co"># factor with 4 levels</span></code></pre></div>
<p>There are two things worth noting here. First, the class of a numeric vector such as <code>vec1</code> is still <code>numeric</code>, not a new ‘vector’ class. Second, if you try to create a mixed vector of numbers and characters, all the numbers are automatically converted to characters to give you a ‘character’ class. This is expected: it makes sense to convert the number 1 to the character “1”, but there is no sensible way to convert the character “a” to a number.</p>
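<p>Both points can be verified in a few lines. The vector names below (<code>numVec</code>, <code>mixedVec</code>) are made up for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">numVec <- c(1, 2, 3)
class(numVec)         # "numeric", there is no separate 'vector' class
is.vector(numVec)     # TRUE

mixedVec <- c(1, "a", TRUE)
class(mixedVec)       # "character"
mixedVec              # "1" "a" "TRUE", everything became a character

c(1, TRUE, FALSE)     # 1 1 0, logicals are converted to numbers</code></pre></div>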
<h4>How to reference elements of a vector?</h4>
<p>Elements of a vector can be accessed with its index. The first element of a vector has the index 1 and the last element has an index of value length(vectorName).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">length</span>(vec1) <span class="co"># 4</span>
<span class="kw">print</span>(vec1[<span class="dv">1</span>]) <span class="co"># 10</span>
<span class="kw">print</span>(vec1[<span class="dv">1</span>:<span class="dv">3</span>]) <span class="co"># 10, 20, 15</span></code></pre></div>
<p>At this point, I would like you to learn how to initialize a vector to a certain length. But why initialize a vector when you can iteratively add (append) elements to it, especially in a language where you don’t even need to declare variables?</p>
<p>The reason is that it saves processing time. When you initialize a vector to hold, say, 100 elements, that much space is reserved for the vector in your computer’s memory up front. You can later fill in those slots by indexing the vector, as we just saw. Iteratively appending elements takes more processing time, because R has to copy the vector each time it grows, which becomes noticeable when the vector gets really big.</p>
<p>Here is how to initialize a numeric vector:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">numericVector <-<span class="st"> </span><span class="kw">numeric</span>(<span class="dv">100</span>) <span class="co"># length 100 elements</span></code></pre></div>
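<p>The sketch below contrasts the two approaches on a small, made-up task (storing the squares of 1 to 1000). Both produce the same result; the difference lies only in how the memory is obtained. The names <code>squares</code> and <code>grown</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <- 1000

# Approach 1: initialize once, then fill by index
squares <- numeric(n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# Approach 2: start empty and append, so the vector is re-created on every step
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

identical(squares, grown)   # TRUE, same values either way</code></pre></div>
<p>If you want to see the time difference yourself, wrap each loop in <code>system.time()</code> and increase <code>n</code>.</p>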
<h3>How To Manipulate Vectors</h3>
<h4>Subsetting</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">logic1 <-<span class="st"> </span>vec1 <<span class="st"> </span><span class="dv">15</span> <span class="co"># create a logical vector, TRUE if value < 15</span>
vec1[logic1] <span class="co"># elements in TRUE positions will be included in subset</span>
vec1[<span class="dv">1</span>:<span class="dv">2</span>] <span class="co"># returns elements in 1 & 2 positions.</span>
vec1[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">3</span>)] <span class="co"># returns elements in 1 & 3 positions</span>
vec1[-<span class="dv">1</span>] <span class="co"># returns all elements except in position 1.</span></code></pre></div>
<h4>Sorting</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sort</span>(vec1) <span class="co"># ascending sort</span>
<span class="kw">sort</span>(vec1, <span class="dt">decreasing =</span> <span class="ot">TRUE</span>) <span class="co"># Descending sort </span></code></pre></div>
<p>Sorting can also be achieved using the order() function which returns the indices of elements in ascending order.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vec1[<span class="kw">order</span>(vec1)] <span class="co"># ascending sort</span>
vec1[<span class="kw">rev</span>(<span class="kw">order</span>(vec1))] <span class="co"># descending sort</span></code></pre></div>
<h4>Creating vector sequences and repetitions</h4>
<p>The <code>seq()</code> and <code>rep()</code> functions are used to create custom vector sequences. The <code>rep()</code> function can also be used to repeat letters and other characters.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">seq</span>(<span class="dv">1</span>, <span class="dv">10</span>, <span class="dt">by =</span> <span class="dv">2</span>) <span class="co"># diff between adj elements is 2</span>
<span class="kw">seq</span>(<span class="dv">1</span>, <span class="dv">10</span>, <span class="dt">length.out =</span> <span class="dv">25</span>) <span class="co"># length of the vector is 25</span>
<span class="kw">rep</span>(<span class="dv">1</span>, <span class="dv">5</span>) <span class="co"># repeat 1, five times.</span>
<span class="kw">rep</span>(<span class="dv">1</span>:<span class="dv">3</span>, <span class="dv">5</span>) <span class="co"># repeat 1:3, 5 times</span>
<span class="kw">rep</span>(<span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">each=</span><span class="dv">5</span>) <span class="co"># repeat 1 to 3, each 5 times.</span></code></pre></div>
<h4>How To Remove Missing values</h4>
<p>Missing values can be handled using the <code>is.na()</code> function, which returns a logical vector with TRUE in the positions where there is a missing value (NA).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vec2 <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>, <span class="ot">NA</span>) <span class="co"># character vector</span>
<span class="kw">is.na</span>(vec2) <span class="co"># TRUE where the value is missing</span>
!<span class="kw">is.na</span>(vec2) <span class="co"># TRUE where the value is not missing</span>
vec2[!<span class="kw">is.na</span>(vec2)] <span class="co"># return non missing values from vec2</span></code></pre></div>
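<p>The same idea applies to numeric vectors, where a single NA makes summaries such as <code>mean()</code> return NA. A brief sketch with a made-up vector <code>numWithNA</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">numWithNA <- c(10, NA, 30)

sum(is.na(numWithNA))            # 1, the number of missing values
mean(numWithNA)                  # NA, because one element is missing
mean(numWithNA, na.rm = TRUE)    # 20, NA removed before averaging
numWithNA[!is.na(numWithNA)]     # 10 30</code></pre></div>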
<h4>Sampling</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">100</span>) <span class="co"># optional. set it to get same random samples.</span>
<span class="kw">sample</span>(vec1) <span class="co"># sample all elements randomly</span>
<span class="kw">sample</span>(vec1, <span class="dv">3</span>) <span class="co"># sample 3 elem without replacement</span>
<span class="kw">sample</span>(vec1, <span class="dv">10</span>, <span class="dt">replace =</span> <span class="ot">TRUE</span>) <span class="co"># sample 10 elements with replacement</span></code></pre></div>
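<p>To see what <code>set.seed()</code> buys you, the sketch below draws the same sample twice with the same seed. The seed value and the names <code>firstDraw</code> and <code>secondDraw</code> are arbitrary choices for the illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(100)
firstDraw <- sample(1:10, 3)     # 3 elements without replacement

set.seed(100)                    # reset to the same seed
secondDraw <- sample(1:10, 3)

identical(firstDraw, secondDraw) # TRUE, same seed gives the same sample</code></pre></div>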
<hr />
<h2>Section 3</h2>
<h3>Data Frames</h3>
<h4>Creating Data frame and accessing rows and columns</h4>
<p>The data frame is a convenient and popular data object for performing various analyses. Import functions such as <code>read.csv()</code> bring data into R as a data frame, so it is convenient to keep it that way. Now let’s create a data frame with the vectors we’d created earlier.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myDf1 <-<span class="st"> </span><span class="kw">data.frame</span>(vec1, vec2) <span class="co"># make data frame with 2 columns</span>
myDf2 <-<span class="st"> </span><span class="kw">data.frame</span>(vec1, vec3, vec4)
myDf3 <-<span class="st"> </span><span class="kw">data.frame</span>(vec1, vec2, vec3)</code></pre></div>
<h4>Built-in Datasets and Basic Operations</h4>
<p>R comes with a set of built-in data frames. For further illustrations we will use the <code>airquality</code> data frame.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(datasets) <span class="co"># initialize</span>
<span class="kw">library</span>(<span class="dt">help=</span>datasets) <span class="co"># display the datasets</span></code></pre></div>
<p>The functions below are used frequently when you work with data, so I highly recommend practicing them repeatedly until you have a good handle on them.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">class</span>(airquality) <span class="co"># get class</span>
<span class="kw">sapply</span>(airquality, class) <span class="co"># get class of all columns</span>
<span class="kw">str</span>(airquality) <span class="co"># structure</span>
<span class="kw">summary</span>(airquality) <span class="co"># summary of airquality</span>
<span class="kw">head</span>(airquality) <span class="co"># view the first 6 obs</span>
<span class="kw">fix</span>(airquality) <span class="co"># view and edit in a spreadsheet-like grid</span>
<span class="kw">rownames</span>(airquality) <span class="co"># row names</span>
<span class="kw">colnames</span>(airquality) <span class="co"># columns names</span>
<span class="kw">nrow</span>(airquality) <span class="co"># number of rows</span>
<span class="kw">ncol</span>(airquality) <span class="co"># number of columns</span></code></pre></div>
<h4>Append data frames with cbind and rbind</h4>
<p>Let’s append data frames column-wise with <code>cbind</code> and row-wise with <code>rbind</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">cbind</span>(myDf1, myDf2) <span class="co"># columns append DFs with same no. rows</span>
<span class="kw">rbind</span>(myDf1, myDf1) <span class="co"># row append DFs with same no. columns</span></code></pre></div>
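<p>As a minimal sketch of the row and column requirements, the example below uses two small made-up data frames (<code>dfA</code> and <code>dfB</code> are hypothetical names, not the data frames defined earlier) and checks the dimensions of the results.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data frames used only for this illustration
dfA <- data.frame(id = 1:3, score = c(10, 20, 30))
dfB <- data.frame(grade = c("a", "b", "c"))

wide <- cbind(dfA, dfB)  # column-wise: both inputs have 3 rows
tall <- rbind(dfA, dfA)  # row-wise: both inputs have the same column names

dim(wide)  # 3 rows, 3 columns
dim(tall)  # 6 rows, 2 columns</code></pre></div>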
<h4>Subset a data frame with numeric indices, subset() and which()</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myDf1$vec1 <span class="co"># vec1 column</span>
myDf1[, <span class="dv">1</span>] <span class="co"># df[row.num, col.num]</span>
myDf1[, <span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>)] <span class="co"># columns 1 and 2</span>
myDf1[<span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">5</span>), <span class="kw">c</span>(<span class="dv">2</span>)] <span class="co"># first 5 rows in column 2</span></code></pre></div>
<p>Rows and columns can also be subset with the <code>subset()</code> and <code>which()</code> functions. <code>which()</code> returns a vector of the row or column indices that satisfy a condition. Let's check this out with an example.</p>
<p>The code below drops the <code>Temp</code> column from the <code>airquality</code> data frame and returns only the observations with Day=1. Note that <code>which()</code> is an independent function, so the column must be referenced through the full object name. Just <code>which(Day==1)</code> will not work, since there is no variable called <code>Day</code> defined.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">subset</span>(airquality, Day ==<span class="st"> </span><span class="dv">1</span>, <span class="dt">select =</span> -Temp) <span class="co"># select Day=1 and exclude 'Temp'</span>
airquality[<span class="kw">which</span>(airquality$Day==<span class="dv">1</span>), -<span class="kw">c</span>(<span class="dv">4</span>)] <span class="co"># same as above</span></code></pre></div>
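<p>One practical difference between plain logical indexing and <code>which()</code> shows up when the tested column contains missing values. The sketch below uses a small made-up data frame (<code>df</code> is a hypothetical name) to compare the three approaches.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data frame with a missing value in the tested column
df <- data.frame(x = c(5, NA, 12), y = c("a", "b", "c"))

byLogical <- df[df$x > 10, ]         # the NA in the condition yields an all-NA row
byWhich   <- df[which(df$x > 10), ]  # which() keeps only the TRUE positions
bySubset  <- subset(df, x > 10)      # subset() also drops rows where the condition is NA

nrow(byLogical)  # 2
nrow(byWhich)    # 1
nrow(bySubset)   # 1</code></pre></div>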
<h4>Sampling</h4>
<p>Splitting your data into training (data on which models are built) and test (known data on which models are tested) samples is a common activity. Let's see how this can be done by creating a randomized 70:30 training and test sample from <code>airquality</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">100</span>)
trainIndex <-<span class="st"> </span><span class="kw">sample</span>(<span class="kw">c</span>(<span class="dv">1</span>:<span class="kw">nrow</span>(airquality)), <span class="dt">size=</span><span class="kw">nrow</span>(airquality)*<span class="fl">0.7</span>, <span class="dt">replace=</span>F) <span class="co"># get training sample indices</span>
airquality[trainIndex, ] <span class="co"># training data</span>
airquality[-trainIndex, ] <span class="co"># test data</span></code></pre></div>
<p>What just happened? We drew a random 70% sample of the row indices of the <code>airquality</code> data frame and used it to build the training and test samples. Notice that the arguments passed to <code>sample()</code> are computed inline. For example, <code>size = nrow(airquality) * 0.7</code> computes 70% of the number of rows in <code>airquality</code> for the size argument. Likewise, the vector to sample from, <code>1:nrow(airquality)</code>, is defined inside the call itself. Though nesting computations like this is not the cleanest way to write code, it gives you a taste of the flexibility and control the language has to offer.</p>
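<p>For comparison, here is a sketch of the same 70:30 split with each intermediate value given its own name. The variable names <code>n</code>, <code>trainSize</code>, <code>trainData</code> and <code>testData</code> are introduced here for illustration, and <code>floor()</code> is used to make the sample size an explicit whole number.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(100)                          # make the split reproducible
n <- nrow(airquality)                  # number of rows in the built-in data
trainSize <- floor(0.7 * n)            # whole number of training rows
trainIndex <- sample(seq_len(n), size = trainSize, replace = FALSE)

trainData <- airquality[trainIndex, ]  # 70% training sample
testData  <- airquality[-trainIndex, ] # remaining rows form the test sample

nrow(trainData) + nrow(testData) == n  # TRUE: every row is used exactly once</code></pre></div>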
<h4>Merging Dataframes</h4>
<p>Data frames can be merged on a common column. They need not be sorted before the merge. If the ‘by’ column has different names in the two data frames, specify them with <code>by.x</code> and <code>by.y</code>. By default <code>merge()</code> performs an inner join; outer, left and right joins are obtained by setting the <code>all</code>, <code>all.x</code> and <code>all.y</code> arguments to <code>TRUE</code>, respectively. Run <code>example(merge)</code> in your R console for more.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">merge</span>(myDf1, myDf2, <span class="dt">by=</span><span class="st">"vec1"</span>) <span class="co"># merge by 'vec1'</span></code></pre></div>
<p>With the data frames created by the code below, try out the various merge operations.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">100</span>)
df1 =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">StudentId =</span> <span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">10</span>), <span class="dt">Subject =</span> <span class="kw">sample</span>(<span class="kw">c</span>(<span class="st">"Math"</span>, <span class="st">"Science"</span>, <span class="st">"Arts"</span>), <span class="dv">10</span>, <span class="dt">replace=</span>T))
df2 =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">StudentNum =</span> <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">4</span>, <span class="dv">6</span>, <span class="dv">12</span>), <span class="dt">Sport =</span> <span class="kw">sample</span>(<span class="kw">c</span>(<span class="st">"Football"</span>, <span class="st">"Tennis"</span>, <span class="st">"Chess"</span>), <span class="dv">4</span>, <span class="dt">replace=</span>T))</code></pre></div>
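<p>One way to approach the exercise is sketched below. To keep the row counts predictable, it uses two made-up data frames (<code>students</code> and <code>sports</code>, hypothetical names) with the same key columns as <code>df1</code> and <code>df2</code> but fixed values instead of random ones.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Same key columns as df1 and df2 above, but with fixed (made-up) values
students <- data.frame(StudentId = 1:10,
                       Subject = rep(c("Math", "Science"), 5))
sports <- data.frame(StudentNum = c(2, 4, 6, 12),
                     Sport = c("Football", "Tennis", "Chess", "Tennis"))

# The key columns have different names, so use by.x and by.y
inner <- merge(students, sports, by.x = "StudentId", by.y = "StudentNum")
left  <- merge(students, sports, by.x = "StudentId", by.y = "StudentNum", all.x = TRUE)
right <- merge(students, sports, by.x = "StudentId", by.y = "StudentNum", all.y = TRUE)
outer <- merge(students, sports, by.x = "StudentId", by.y = "StudentNum", all = TRUE)

c(nrow(inner), nrow(left), nrow(right), nrow(outer))  # 3 10 4 11</code></pre></div>
<p>Students without a sport get <code>NA</code> in the <code>Sport</code> column of the left and outer joins, and student number 12 gets <code>NA</code> in <code>Subject</code> in the right and outer joins.</p>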
<hr />
<h2>Section 4</h2>
<h4>The <code>paste</code> function</h4>
<p><code>paste()</code> concatenates strings, with an optional delimiter between them. Once you understand it well, it comes in handy for building long and complicated string patterns that can be modified dynamically. Try out these examples in your R console.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">paste</span>(<span class="st">"a"</span>, <span class="st">"b"</span>) <span class="co"># "a b"</span>
<span class="kw">paste0</span>(<span class="st">"a"</span>, <span class="st">"b"</span>) <span class="co"># concatenate without space, "ab"</span>
<span class="kw">paste</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="dt">sep=</span><span class="st">""</span>) <span class="co"># same as paste0</span>
<span class="kw">paste</span>(<span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">4</span>), <span class="kw">c</span>(<span class="dv">5</span>:<span class="dv">8</span>), <span class="dt">sep=</span><span class="st">""</span>) <span class="co"># "15" "26" "37" "48"</span>
<span class="kw">paste</span>(<span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">4</span>), <span class="kw">c</span>(<span class="dv">5</span>:<span class="dv">8</span>), <span class="dt">sep=</span><span class="st">""</span>, <span class="dt">collapse=</span><span class="st">""</span>) <span class="co"># "15263748"</span>
<span class="kw">paste0</span>(<span class="kw">c</span>(<span class="st">"var"</span>), <span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">5</span>)) <span class="co"># "var1" "var2" "var3" "var4" "var5"</span>
<span class="kw">paste0</span>(<span class="kw">c</span>(<span class="st">"var"</span>, <span class="st">"pred"</span>), <span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">3</span>)) <span class="co"># "var1" "pred2" "var3"</span>
<span class="kw">paste0</span>(<span class="kw">c</span>(<span class="st">"var"</span>, <span class="st">"pred"</span>), <span class="kw">rep</span>(<span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">each=</span><span class="dv">2</span>)) <span class="co"># "var1" "pred1" "var2" "pred2" "var3" "pred3"</span></code></pre></div>
<h4>Dealing with dates</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">dateString <-<span class="st"> "15/06/2014"</span>
myDate <-<span class="st"> </span><span class="kw">as.Date</span>(dateString, <span class="dt">format=</span><span class="st">"%d/%m/%Y"</span>)
<span class="kw">class</span>(myDate) <span class="co"># "Date"</span>
myPOSIXltDate <-<span class="st"> </span><span class="kw">as.POSIXlt</span>(myDate)
<span class="kw">class</span>(myPOSIXltDate) <span class="co"># POSIXlt</span>
myPOSIXctDate <-<span class="st"> </span><span class="kw">as.POSIXct</span>(myPOSIXltDate) <span class="co"># convert to POSIXct</span></code></pre></div>
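<p>Once a string has been parsed into a date, it can be reformatted, used in arithmetic, or broken into components. The short sketch below repeats the conversion from above so that it runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myDate <- as.Date("15/06/2014", format = "%d/%m/%Y")
myPOSIXltDate <- as.POSIXlt(myDate)

format(myDate, "%Y-%m-%d")  # "2014-06-15": reformat as ISO text
myDate + 30                 # Date arithmetic is in days: "2014-07-15"
myPOSIXltDate$year + 1900   # 2014: years are stored as an offset from 1900
myPOSIXltDate$mon + 1       # 6: months are stored as 0-11
myPOSIXltDate$mday          # 15: day of the month</code></pre></div>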
<h4>How to view contents of an R object?</h4>
<p>If you come across a new type of R object that you are unfamiliar with and want to see and access its contents, one or more of these methods will typically work. Let's take the example of the <code>POSIXlt</code> date object just created.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">attributes</span>(myPOSIXltDate) <span class="co"># best</span>
<span class="kw">unclass</span>(myPOSIXltDate) <span class="co"># works!</span>
<span class="kw">names</span>(myPOSIXltDate) <span class="co"># doesn't work on a POSIXlt object</span>
<span class="kw">unlist</span>(myPOSIXltDate) <span class="co"># works!</span></code></pre></div>
<p>As you can see, the <code>POSIXlt</code> object we just dissected contains more than the information displayed on the console when you type its name. It is a good idea to check the object size to know if it holds more information than meets the eye.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">object.size</span>(myDate) <span class="co"># 216 bytes</span>
<span class="kw">object.size</span>(myPOSIXltDate) <span class="co"># 1816 bytes</span>
<span class="kw">object.size</span>(myPOSIXctDate) <span class="co"># 520 bytes</span></code></pre></div>
<p>Now you know what level of information each of the classes provides, and how much memory it uses. It is up to you to decide which to use based on your calculation requirements and data size.</p>
<h4>How To Make Contingency Tables</h4>
<p>Contingency tables give you a count summary of a vector or of 2-dimensional data. Let's see how to get the count table for a vector.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">table</span>(myData)
<span class="co"># table output</span>
<span class="co">#=> 0 1 2 3 4 5 6 7 8 9 </span>
<span class="co">#=> 1 3 10 17 18 12 22 7 8 2</span></code></pre></div>
<p>Similarly, for a data frame, the variable that you want to appear in rows goes as the first argument of table() and the column variable goes as the second argument.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">table</span>(airquality$Month[<span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">60</span>)], airquality$Temp[<span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">60</span>)]) <span class="co"># first 60 rows</span></code></pre></div>
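<p>The row and column roles are easier to see on a tiny example. The sketch below uses two made-up character vectors (<code>colour</code> and <code>size</code> are hypothetical) and reads individual counts back out of the tables.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Made-up categorical data for illustration
colour <- c("red", "blue", "red", "green", "blue", "red")
size   <- c("S", "S", "L", "L", "S", "L")

oneWay <- table(colour)        # counts per colour
twoWay <- table(colour, size)  # colour in rows, size in columns

oneWay[["red"]]      # 3
twoWay["red", "L"]   # 2
twoWay["blue", "S"]  # 2</code></pre></div>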
<h4>List</h4>
<p>Lists are very important. If you need to bundle up objects of different lengths and classes, it can be achieved with lists.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myList <-<span class="st"> </span><span class="kw">list</span>(vec1, vec2, vec3, vec4)
<span class="co">#=> Output</span>
<span class="co">#=> [[1]]</span>
<span class="co">#=> [1] 10 20 15 40</span>
<span class="co">#=> [[2]]</span>
<span class="co">#=> [1] "a" "b" "c" NA</span>
<span class="co">#=> [[3]]</span>
<span class="co">#=> [1] TRUE FALSE TRUE TRUE</span>
<span class="co">#=> [[4]]</span>
<span class="co">#=> [1] l1 l2 l3 l4</span>
<span class="co">#=> Levels: l1 l2 l3 l4</span></code></pre></div>
<h4>Referencing lists</h4>
<p>Lists can have multiple levels within.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myList[<span class="dv">3</span>] <span class="co"># level 1</span>
<span class="co"># [[3]]</span>
<span class="co"># [1] TRUE FALSE TRUE TRUE</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myList[[<span class="dv">3</span>]] <span class="co"># level 2: access the vec3 directly</span>
<span class="co">#=> [1] TRUE FALSE TRUE TRUE</span>
myList[[<span class="dv">3</span>]][<span class="dv">3</span>] <span class="co"># 3rd elem of vec3</span>
<span class="co">#=> [1] TRUE</span>
<span class="kw">lapply</span>(myList, length) <span class="co"># length of each element as a list</span></code></pre></div>
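<p>List elements can also be named, which allows access by name as well as by position. A small sketch with a made-up list (<code>person</code> is a hypothetical name):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical named list mixing lengths and classes
person <- list(name = "Ann", scores = c(90, 85, 77), passed = TRUE)

person["scores"]        # single brackets: a list of length 1
person[["scores"]]      # double brackets: the numeric vector itself
person$scores[2]        # $ is shorthand for [[ ]] with a name: 85
sapply(person, length)  # lengths as a named vector: name 1, scores 3, passed 1</code></pre></div>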
<h4>Unlisting</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># unlist() flattens a list into a single vector</span>
<span class="kw">unlist</span>(myList) <span class="co"># flattens out</span></code></pre></div>
<h4>If-Else</h4>
<p>One caveat about If-Else statements: make sure the ‘else’ begins on the same line where the <code>}</code> closes. The structure of an If-Else statement is as follows:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">if(checkConditionIfTrue) {
....statements..
....statements..
} else { <span class="co"># place the 'else' in same line as '}'</span>
....statements..
....statements..
} </code></pre></div>
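<p>A concrete instance of this structure, as a minimal sketch (the variable <code>temp</code> and the labels are made up for illustration). It also shows <code>ifelse()</code>, the vectorised counterpart that works on whole vectors.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">temp <- 72  # hypothetical input value

if (temp > 80) {
  label <- "hot"
} else if (temp > 60) {  # 'else' sits on the same line as the closing '}'
  label <- "mild"
} else {
  label <- "cold"
}
label  # "mild"

# ifelse() applies the same kind of test to every element of a vector
ifelse(c(55, 72, 90) > 60, "warm", "cold")  # "cold" "warm" "warm"</code></pre></div>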
<h4>For-Loop</h4>
<p>Wherever possible, it is recommended to use one of the apply family of functions instead of a for-loop. However, knowing how to write a loop is still essential. Here is the format:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for(counterVar in <span class="kw">c</span>(<span class="dv">1</span>:n)){
.... statements..
}</code></pre></div>
<p>Problem statement: Create a character vector with length of number-of-rows-of-iris-dataset, such that, each element gets a value “greater than 5” if corresponding ‘Sepal.Length’ > 5, else it gets “lesser than 5”.</p>
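<p>One possible solution is sketched below, first with a for-loop and an If-Else statement, then with the vectorised <code>ifelse()</code> for comparison. The names <code>sepalLabel</code> and <code>sepalLabel2</code> are chosen here for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <- nrow(iris)
sepalLabel <- character(n)  # pre-allocate the result vector

for (i in seq_len(n)) {
  if (iris$Sepal.Length[i] > 5) {
    sepalLabel[i] <- "greater than 5"
  } else {
    sepalLabel[i] <- "lesser than 5"
  }
}

# Vectorised equivalent, without an explicit loop
sepalLabel2 <- ifelse(iris$Sepal.Length > 5, "greater than 5", "lesser than 5")
identical(sepalLabel, sepalLabel2)  # TRUE</code></pre></div>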
<h4>The apply family</h4>
<p><strong>apply():</strong> Apply FUN through a data frame or matrix by rows or columns.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">myData <-<span class="st"> </span><span class="kw">matrix</span>(<span class="kw">seq</span>(<span class="dv">1</span>,<span class="dv">16</span>), <span class="dv">4</span>, <span class="dv">4</span>) <span class="co"># make a matrix</span>
<span class="kw">apply</span>(myData, <span class="dv">1</span>, <span class="dt">FUN=</span>min) <span class="co"># apply 'min' by rows</span>
<span class="co">#=> [1] 1 2 3 4</span>
<span class="kw">apply</span>(myData, <span class="dv">2</span>, <span class="dt">FUN=</span>min) <span class="co"># apply 'min' by columns</span>
<span class="co">#=> [1] 1 5 9 13</span>
<span class="kw">apply</span>(<span class="kw">data.frame</span>(<span class="dv">1</span>:<span class="dv">5</span>), <span class="dv">1</span>, <span class="dt">FUN=</span>function(x) {x^<span class="dv">2</span>}) <span class="co"># square of 1,2,3,4,5</span>
<span class="co">#=> [1] 1 4 9 16 25</span></code></pre></div>
<p><strong>lapply():</strong> Apply FUN to each element of a list, or to each column of a data frame, and return the result as a list.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">lapply</span>(airquality, class) <span class="co"># return classes of each column in 'airquality' in a list</span></code></pre></div>
<p><strong>sapply():</strong> Apply FUN to each element of a list, or to each column of a data frame, and return the result as a vector.</p>
<p>Let's look at an example that gets the class of each column in a data frame.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sapply</span>(airquality, class) <span class="co"># return classes of each column in 'airquality'</span>
<span class="co">#=> Ozone Solar.R Wind Temp Month Day</span>
<span class="co">#=> "integer" "integer" "numeric" "integer" "integer" "integer"</span></code></pre></div>
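<p><strong>vapply():</strong> Similar to <code>sapply()</code>, but the type and length of each result are specified up front, which makes it safer and sometimes faster. You need to supply an additional <code>FUN.VALUE</code> argument that serves as a template for the value returned by each call. For a function that returns a single value, the template could be <code>character(1)</code> for a string, <code>numeric(1)</code> for a number, <code>0L</code> or <code>integer(1)</code> for an integer, <code>logical(1)</code> for a boolean, and so on.</p>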
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">list</span>(<span class="dt">a =</span> <span class="dv">1</span>, <span class="dt">b =</span> <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">c =</span> <span class="dv">10</span>:<span class="dv">100</span>) <span class="co"># make a list</span>
<span class="kw">vapply</span>(x, <span class="dt">FUN =</span> length, <span class="dt">FUN.VALUE =</span> 0L) <span class="co"># FUN.VALUE defines a sample format of output</span></code></pre></div>
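<p>The sketch below shows what the template buys you: when the result of FUN does not match <code>FUN.VALUE</code>, <code>vapply()</code> raises an error instead of silently returning something unexpected. The call is wrapped in <code>try()</code> here only so that the script keeps running.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- list(a = 1, b = 1:3, c = 10:100)

vapply(x, FUN = length, FUN.VALUE = integer(1))  # a = 1, b = 3, c = 91
vapply(x, FUN = class, FUN.VALUE = character(1)) # "numeric" "integer" "integer"

# length() returns an integer, so a character template is a mismatch
res <- try(vapply(x, FUN = length, FUN.VALUE = character(1)), silent = TRUE)
inherits(res, "try-error")                       # TRUE</code></pre></div>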
<h4>Error Handling</h4>
<p>There are ways to handle errors gracefully in R. The first and simplest is to tell R not to display any error messages, however severe. Try the following code in your R console: you will notice that R does not display error messages once you turn them OFF. You can turn them back ON by setting the option to <code>TRUE</code> again.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">options</span>(<span class="dt">show.error.messages=</span>F) <span class="co"># turn off</span>
<span class="dv">1</span> <-<span class="dv">1</span> <span class="co">#=> No error message is displayed.</span>
<span class="kw">options</span>(<span class="dt">show.error.messages=</span>T) <span class="co"># turn it back on</span>
<span class="dv">1</span> <-<span class="st"> </span><span class="dv">1</span>
<span class="co">#=> Error in 1 <- 1 : invalid (do_set) left-hand side to assignment</span></code></pre></div>
<p>Though you have turned off the display of error messages above, you have not actually ‘handled’ the errors. An error is ‘handled’ when you are able to take some alternative measure in the event that it happens. In the code below, we have a simple for-loop iterating 10 times, where the counter ‘i’ takes the values 1 to 10. You are going to intentionally trigger an error and see what value the counter i holds at the end of the loop. If the loop had run in full successfully, i should hold the value 10.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for(i in <span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">10</span>)) {
<span class="dv">1</span> <-<span class="st"> </span><span class="dv">1</span> <span class="co"># trigger the error</span>
}
<span class="kw">print</span>(i) <span class="co"># i equals 1. Never ran through full loop</span></code></pre></div>
<p>Without the error handling feature, the loop is broken as soon as an error is encountered and the rest of the iterations are abruptly stopped. However, there are scenarios where you will want the loop to continue even if an error is encountered. This can be easily done by passing the error-prone function into a <code>try()</code> function. In this case, the loop continues to iterate even after it encounters an error.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for(i in <span class="kw">c</span>(<span class="dv">1</span>:<span class="dv">10</span>)) {
triedOut <-<span class="st"> </span><span class="kw">try</span>(<span class="dv">1</span> <-<span class="st"> </span><span class="dv">1</span>) <span class="co"># try an error prone statement.</span>
}
<span class="kw">print</span>(i) <span class="co"># i equals 10. Runs through full loop</span></code></pre></div>
<p>Furthermore, you can find out whether an error really occurred by checking the class of the stored <code>triedOut</code> variable. If an error did occur, it will have the class <code>try-error</code>. You can get creative by writing a condition that checks the class of this variable and takes alternative measures.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">class</span>(triedOut) <span class="co"># "try-error"</span></code></pre></div>
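<p>Putting these pieces together, the sketch below checks the result of <code>try()</code> inside a loop and substitutes <code>NA</code> when a step fails. The <code>inputs</code> list and the use of <code>sqrt()</code> are made up for illustration; <code>inherits()</code> is used as a convenient way to test for the <code>try-error</code> class.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">inputs <- list(4, "a", 9)  # sqrt() fails on the second element
results <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  triedOut <- try(sqrt(inputs[[i]]), silent = TRUE)  # silent = TRUE hides the message
  if (inherits(triedOut, "try-error")) {
    results[i] <- NA         # alternative measure when an error occurred
  } else {
    results[i] <- triedOut
  }
}
results  # 2 NA 3</code></pre></div>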
<p>You can even pass multiple lines of code to <code>try()</code> by enclosing them in a pair of curly braces <code>{}</code>. We are almost set with error handling, but your knowledge is not complete without <code>tryCatch()</code>, which lets you handle errors in a more structured fashion by supplying the actual error-handling code (as an ‘error’ function) in one of its arguments.</p>
<h4>Error handling with tryCatch()</h4>
<p>The <code>tryCatch()</code> function takes three parts, written within curly braces as seen in the code below. The first block holds the statements to evaluate, just like the <code>try()</code> function we saw earlier, and like <code>try()</code> it can contain multiple lines of code.</p>
<p>If an error is encountered in ANY of the statements in the first block, the error condition is passed as the <code>err</code> argument (see code below) to the error-handling function (called ‘error’). You can choose to print the error message, do some alternative calculation, or run a completely different set of logic that does not involve the error message at all. It's really up to you. The last block, called <code>finally</code>, is executed regardless of whether an error occurred. You may leave this part out altogether.</p>
<p>Here is an example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">tryCatch</span>({<span class="kw">print</span>(<span class="st">"Lets create an error"</span>); <span class="dv">1</span> <-<span class="st"> </span><span class="dv">1</span>}, <span class="co"># First block</span>
<span class="dt">error=</span>function(err){<span class="kw">print</span>(err); <span class="kw">print</span>(<span class="st">"Error Line"</span>)}, <span class="co"># Second Block(optional)</span>
<span class="dt">finally =</span> {<span class="kw">print</span>(<span class="st">"finally print this"</span>)})<span class="co"># Third Block(optional)</span>
<span class="co">#=> [1] "Lets create an error"</span>
<span class="co">#=> <simpleError in 1 <- 1: invalid (do_set) left-hand side to assignment></span>
<span class="co">#=> [1] "Error Line"</span>
<span class="co">#=> [1] "finally print this"</span></code></pre></div>
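<p>A handler can also return a fallback value, which becomes the value of the whole <code>tryCatch()</code> call. The sketch below wraps <code>log()</code> in a made-up helper (<code>safeLog</code> is a hypothetical name) that returns <code>NA</code> on either a warning or an error.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: returns NA instead of stopping on an error or warning
safeLog <- function(x) {
  tryCatch({
    log(x)                      # may warn (negative input) or fail (text input)
  }, warning = function(w) {
    NA_real_                    # e.g. log(-1) raises a "NaNs produced" warning
  }, error = function(err) {
    message("Caught: ", conditionMessage(err))
    NA_real_                    # e.g. log("a") is an error
  })
}

safeLog(1)    # 0
safeLog(-1)   # NA
safeLog("a")  # NA, after printing the caught message</code></pre></div>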
<h4>Final Note</h4>
<p>You will come across many more R functions in the future that can do really cool stuff, and the number of functions and facilities that R has to offer will keep growing. From here on, you can learn them on a need-to-know basis. In that sense, the learning will never be complete. However, in this exercise we have covered the important ones you need to worry about at this stage. So begin writing code with renewed confidence. Happy Learning!</p>
<hr />
</div>
</div>
<div class="footer">
<hr>
<p>© 2016-17 Selva Prabhakaran. Powered by <a href="http://jekyllrb.com/">jekyll</a>,
<a href="http://yihui.name/knitr/">knitr</a>, and
<a href="http://johnmacfarlane.net/pandoc/">pandoc</a>.
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">Creative Commons License.</a>
</p>
</div>
</div> <!-- /container -->
<script src="//code.jquery.com/jquery.js"></script>
<script src="www/bootstrap.min.js"></script>
<script src="www/toc.js"></script>
<!-- MathJax Script -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript"
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<!-- Google Analytics Code -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-69351797-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
/* reduce spacing around math formula*/
.MathJax_Display {
margin: 0em 0em;
}
body {
font-family: 'Helvetica Neue', Roboto, Arial, sans-serif;
font-size: 16px;
line-height: 27px;
font-weight: 400;
}
blockquote p {
line-height: 1.75;
color: #717171;
}
.well li{
line-height: 28px;
}
li.dropdown-header {
display: block;
padding: 0px;
font-size: 14px;
}
</style>
</body>
</html>