@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
37
37
38
38
::::::::::::::::::::::::::::::::::::::::: instructor
39
39
40
- Pay attention to and explain the errors and warnings generated from the
40
+ Pay attention to and explain the errors and warnings generated from the
41
41
examples in this episode.
42
42
43
- :::::::::::::::::::::::::::::::::::::::::
43
+ :::::::::::::::::::::::::::::::::::::::::
44
44
45
45
``` {r, echo=TRUE}
46
46
gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
75
75
76
76
- You can read directly from Excel spreadsheets without
77
77
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
78
-
78
+
79
79
80
80
::::::::::::::::::::::::::::::::::::::::::::::::::
81
81
@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
86
86
str(gapminder)
87
87
```
88
88
89
- We can also examine individual columns of the data frame with our ` class ` function:
89
+ We can also examine individual columns of the data frame with the `class` or
90
+ `typeof` functions:
90
91
91
92
``` {r}
92
93
class(gapminder$year)
94
+ typeof(gapminder$year)
93
95
class(gapminder$country)
94
96
str(gapminder$country)
95
97
```
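The difference between `class` and `typeof` is easiest to see on a small example. The sketch below assumes a hypothetical toy data frame (`toy`, with made-up values, not the lesson's gapminder file):

```r
# Hypothetical toy data (not the gapminder file), for illustration only
toy <- data.frame(
  country = factor(c("Norway", "Sweden", "Denmark")),
  year    = c(2002L, 2002L, 2002L),
  lifeExp = c(78.9, 80.0, 77.2)
)

# class() reports the high-level type of an object;
# typeof() reports how R stores its values internally.
class(toy$year)      # "integer"
typeof(toy$year)     # "integer"
class(toy$lifeExp)   # "numeric"
typeof(toy$lifeExp)  # "double"
class(toy$country)   # "factor"
typeof(toy$country)  # "integer"
```

The two functions agree for the integer column but differ for the factor, which is stored as integer codes.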
@@ -281,6 +283,59 @@ tail(gapminder_norway)
281
283
282
284
To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
283
285
286
+
287
+ ## Removing columns and rows in data frames
288
+
289
+ To remove columns from a data frame, we can use the `subset` function.
290
+ This function allows us to remove columns using their names:
291
+
292
+ ``` {r}
293
+ life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
294
+ head(life_expectancy)
295
+ ```
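The `select` argument of `subset` can also list the columns to keep rather than the ones to drop. The sketch below assumes a hypothetical two-row data frame (`toy`, with made-up values) that uses the same column names as gapminder:

```r
# Hypothetical toy data with gapminder-style column names (values are made up)
toy <- data.frame(
  country   = c("Norway", "Sweden"),
  year      = c(2002L, 2002L),
  pop       = c(4.5e6, 8.9e6),
  continent = c("Europe", "Europe"),
  lifeExp   = c(78.9, 80.0),
  gdpPercap = c(44000, 29000)
)

# Drop columns by name (negative selection)
dropped <- subset(toy, select = -c(continent, pop, gdpPercap))

# Keep columns by name (positive selection)
kept <- subset(toy, select = c(country, year, lifeExp))

names(dropped)  # "country" "year" "lifeExp"
names(kept)     # "country" "year" "lifeExp"
```

Listing the columns to keep is often easier to read when only a few columns are needed.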
296
+
297
+ We can also use a logical vector to achieve the same result. Make sure the
298
+ vector's length matches the number of columns in the data frame (to avoid vector
299
+ recycling):
300
+
301
+ ``` {r}
302
+ life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
303
+ head(life_expectancy)
304
+ ```
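One way to guarantee that the logical vector has the right length is to build it from the column names. The sketch below assumes a hypothetical four-column data frame (`toy`, with made-up values), and also shows what recycling does when the vector is too short:

```r
# Hypothetical toy data with four columns (values are made up)
toy <- data.frame(
  country   = c("Norway", "Sweden"),
  year      = c(2002L, 2002L),
  pop       = c(4.5e6, 8.9e6),
  continent = c("Europe", "Europe")
)

# Built from names(toy), so its length always equals ncol(toy)
keep <- !(names(toy) %in% c("pop", "continent"))
keep                        # TRUE TRUE FALSE FALSE
by_name <- toy[keep]
names(by_name)              # "country" "year"

# A too-short logical vector is silently recycled:
# c(TRUE, FALSE) is treated as c(TRUE, FALSE, TRUE, FALSE)
recycled <- toy[c(TRUE, FALSE)]
names(recycled)             # "country" "pop"
```

R raises no error or warning in the recycled case, which is why the length check matters.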
305
+
306
+ Alternatively, we can use the columns' positions:
307
+
308
+ ``` {r}
309
+ life_expectancy <- gapminder[-c(3, 4, 6)]
310
+ head(life_expectancy)
311
+ ```
312
+
313
+ Note that the easiest way to remove rows from a data frame is often to select the rows
314
+ we want to keep instead.
315
+ Still, we can also remove rows by using their positions:
316
+
317
+ ``` {r}
318
+ # Filter data for Afghanistan during the 21st century:
319
+ afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
320
+                              gapminder$year > 2000, ]
321
+
322
+ # Now remove data for 2002, that is, the first row:
323
+ afghanistan_21c[-1, ]
324
+ ```
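Selecting the rows to keep with a logical condition avoids a pitfall of negative positions. The sketch below assumes a hypothetical three-row data frame (`toy`, illustrative values only):

```r
# Hypothetical toy data (illustrative values only)
toy <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Norway"),
  year    = c(2002L, 2007L, 2007L),
  lifeExp = c(42.1, 43.8, 80.2)
)

# Keep every row except those from 2002
without_2002 <- toy[toy$year != 2002, ]
nrow(without_2002)   # 2

# Pitfall: when no row matches, which() returns integer(0), and
# negating it still gives integer(0), so ALL rows are dropped
by_position <- toy[-which(toy$year == 1900), ]
nrow(by_position)    # 0

# The logical form keeps all rows, as intended
by_condition <- toy[toy$year != 1900, ]
nrow(by_condition)   # 3
```

The logical condition behaves the same whether or not any row matches, so it is usually the safer choice.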
325
+
326
+
327
+ A common special case is removing rows that contain `NA` values:
328
+
329
+ ``` {r}
330
+ # Turn some values into NAs:
331
+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
332
+ afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
333
+
334
+ # Remove NAs
335
+ na.omit(afghanistan_20c)
336
+ ```
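`na.omit` drops a row if any of its columns contains an `NA`. The sketch below assumes a hypothetical data frame (`toy`, with made-up values), and compares `na.omit` with `complete.cases` and with a check on a single column:

```r
# Hypothetical toy data with NAs in different columns (values are made up)
toy <- data.frame(
  year    = c(NA, 2002L, 2007L),
  lifeExp = c(41.8, NA, 43.8)
)

# Drops any row with an NA in ANY column: only the 2007 row remains
omitted <- na.omit(toy)
nrow(omitted)          # 1

# complete.cases() returns the equivalent logical vector
complete <- toy[complete.cases(toy), ]
nrow(complete)         # 1

# Checking one column only keeps rows where that column is not NA
year_known <- toy[!is.na(toy$year), ]
nrow(year_known)       # 2
```

Testing a single column with `is.na` is useful when missing values in the other columns are acceptable.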
337
+
338
+
284
339
## Factors
285
340
286
341
Here is another thing to look out for: in a `factor`, each different value