Commit cf78c38

1 parent 28c21ed commit cf78c38

1 file changed: +59 -4 lines changed

episodes/04-data-structures-part2.Rmd

Lines changed: 59 additions & 4 deletions
@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
 
 ::::::::::::::::::::::::::::::::::::::::: instructor
 
-Pay attention to and explain the errors and warnings generated from the
+Pay attention to and explain the errors and warnings generated from the
 examples in this episode.
 
-:::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::::::
 
 ```{r, echo=TRUE}
 gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
 
 - You can read directly from excel spreadsheets without
 converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
-
+
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
 str(gapminder)
 ```
 
-We can also examine individual columns of the data frame with our `class` function:
+We can also examine individual columns of the data frame with the `class` or
+`typeof` functions:
 
 ```{r}
 class(gapminder$year)
+typeof(gapminder$year)
 class(gapminder$country)
 str(gapminder$country)
 ```
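
The `class`/`typeof` pair added in this hunk can report different things for the same column. A minimal sketch, assuming a small made-up data frame (`toy` and its values are hypothetical, not the gapminder data) and assuming text columns are read as factors via `stringsAsFactors = TRUE`:

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  country = c("Norway", "Sweden", "Norway"),
  year = c(2002L, 2002L, 2007L),
  stringsAsFactors = TRUE  # assumption: text columns become factors
)

# For a plain integer column, class and storage type agree.
year_class <- class(toy$year)
year_type <- typeof(toy$year)

# For a factor, class() reports "factor" while typeof() reports the
# underlying storage: integer codes that point to the level labels.
country_class <- class(toy$country)
country_type <- typeof(toy$country)

print(c(year_class, year_type, country_class, country_type))
```

`class` describes how R treats the object, while `typeof` describes how it is stored, which is why the two only disagree for the factor column here.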
@@ -281,6 +283,59 @@ tail(gapminder_norway)
 
 To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
 
+
+## Removing columns and rows in data frames
+
+To remove columns from a data frame, we can use the `subset` function.
+This function allows us to remove columns using their names:
+
+```{r}
+life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
+head(life_expectancy)
+```
+
+We can also use a logical vector to achieve the same result. Make sure the
+vector's length matches the number of columns in the data frame (to avoid vector
+recycling):
+
+```{r}
+life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
+head(life_expectancy)
+```
+
+Alternatively, we can use the columns' positions:
+
+```{r}
+life_expectancy <- gapminder[-c(3, 4, 6)]
+head(life_expectancy)
+```
+
+Often the easiest way to remove rows from a data frame is to select the rows
+we want to keep instead.
+Still, we can remove rows from a data frame by using their positions:
+
+```{r}
+# Filter data for Afghanistan during the 21st century:
+afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
+                             gapminder$year > 2000, ]
+
+# Now remove data for 2002, that is, the first row:
+afghanistan_21c[-1, ]
+```
+
+
+An interesting case is removing rows containing NAs:
+
+```{r}
+# Turn the year into NA for every row before 2007:
+afghanistan_na <- gapminder[gapminder$country == "Afghanistan", ]
+afghanistan_na[afghanistan_na$year < 2007, "year"] <- NA
+
+# Remove rows that contain NAs
+na.omit(afghanistan_na)
+```
+
+
 ## Factors
 
 Here is another thing to look out for: in a `factor`, each different value
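
The column- and row-removal approaches added in this commit can be checked against each other on a small example. The following is a minimal sketch, assuming a made-up data frame (`toy` is hypothetical and only mirrors the six gapminder column names; its values are invented):

```r
# Hypothetical stand-in with the same six column names as gapminder;
# the values are invented for illustration only.
toy <- data.frame(
  country   = c("A", "A", "B", "B"),
  year      = c(2002, 2007, 2002, 2007),
  pop       = c(10, 11, 20, 21),
  continent = c("X", "X", "Y", "Y"),
  lifeExp   = c(70, 71, 80, 81),
  gdpPercap = c(1000, 1100, 2000, 2100)
)

# Three ways to drop pop, continent and gdpPercap.
by_name     <- subset(toy, select = -c(continent, pop, gdpPercap))
by_logical  <- toy[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
by_position <- toy[-c(3, 4, 6)]

# Each result should keep only country, year and lifeExp.
print(names(by_name))
print(isTRUE(all.equal(by_name, by_logical)))
print(isTRUE(all.equal(by_logical, by_position)))

# Removing a row by position versus keeping the rows we want.
without_first <- toy[-1, ]
only_2007 <- toy[toy$year == 2007, ]

# Introduce NAs in the year column, then drop incomplete rows.
toy_na <- toy
toy_na[toy_na$year < 2007, "year"] <- NA
complete_rows <- na.omit(toy_na)
print(complete_rows)
```

Selecting by name tends to be the most robust of the three column approaches, because the logical and positional versions silently change meaning if the column order of the data frame changes.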
