albhasan committed Nov 19, 2023
1 parent 28c21ed commit cf78c38
Showing 1 changed file with 59 additions and 4 deletions.
63 changes: 59 additions & 4 deletions episodes/04-data-structures-part2.Rmd
@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat

::::::::::::::::::::::::::::::::::::::::: instructor

Pay attention to and explain the errors and warnings generated from the
examples in this episode.

:::::::::::::::::::::::::::::::::::::::::

```{r, echo=TRUE}
gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind

- You can read directly from Excel spreadsheets without
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.


::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
str(gapminder)
```

We can also examine individual columns of the data frame with the `class` or
`typeof` functions:

```{r}
class(gapminder$year)
typeof(gapminder$year)
class(gapminder$country)
str(gapminder$country)
```
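
The difference between `class` and `typeof` is easiest to see on objects whose class and storage type differ. The sketch below uses small made-up vectors rather than the gapminder data; the names `years`, `countries` and `toy` are illustrative assumptions only.

```{r}
# Made-up values, for illustration only (not taken from gapminder)
years <- c(1952L, 1957L, 1962L)
countries <- factor(c("Norway", "Sweden", "Norway"))
toy <- data.frame(year = years, country = countries)

# class() reports how R treats the object,
# typeof() reports how R stores it internally
class(years)
typeof(years)

# A factor is stored as integer codes plus a set of labels
class(countries)
typeof(countries)

# A data frame is stored as a list of columns
class(toy)
typeof(toy)
```

For plain integer vectors the two functions agree; they diverge for richer objects such as factors and data frames.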
@@ -281,6 +283,59 @@ tail(gapminder_norway)

To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.


## Removing columns and rows in data frames

To remove columns from a data frame, we can use the `subset` function.
Its `select` argument lets us drop columns by name when we prefix them with a minus sign:

```{r}
life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
head(life_expectancy)
```
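
As a hedged sketch of how `select` behaves, the example below builds a tiny data frame with the same column names as gapminder but made-up values (`toy`, `dropped` and `kept` are hypothetical names). It drops columns by name with a minus sign, then keeps columns by listing them, and checks that both approaches leave the same columns.

```{r}
# Made-up values; only the column names mirror gapminder
toy <- data.frame(
  country   = c("A", "B"),
  year      = c(2002L, 2007L),
  pop       = c(100, 200),
  continent = c("X", "Y"),
  lifeExp   = c(70.1, 71.2),
  gdpPercap = c(1000, 1100)
)

# Drop columns by name
dropped <- subset(toy, select = -c(continent, pop, gdpPercap))
names(dropped)

# Keep columns by name instead
kept <- subset(toy, select = c(country, year, lifeExp))
names(kept)

# Both approaches leave the same columns
identical(names(dropped), names(kept))
```

Listing the columns to keep is often easier to read when only a few columns survive.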

We can also use a logical vector to achieve the same result. Make sure the
vector's length matches the number of columns in the data frame; otherwise R
recycles the vector:

```{r}
life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
head(life_expectancy)
```
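
To illustrate why the length matters, the sketch below uses a made-up six-column data frame (`toy` and `keep` are hypothetical names). A logical vector that is too short is recycled without any warning, so an explicit length check can catch the mistake early.

```{r}
# Made-up six-column data frame, for illustration only
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8, e = 9:10, f = 11:12)

# Full-length logical vector: one value per column
names(toy[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)])

# Too-short vector: R silently recycles it to
# TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, keeping every other column
names(toy[c(TRUE, FALSE)])

# A guard that fails early if the lengths differ
keep <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
stopifnot(length(keep) == ncol(toy))
names(toy[keep])
```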

Alternatively, we can use the columns' positions:

```{r}
life_expectancy <- gapminder[-c(3, 4, 6)]
head(life_expectancy)
```
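
Hard-coded positions break if the column order changes. One possible safeguard, sketched below on a made-up one-row data frame (`toy` and `drop_pos` are hypothetical names), is to look the positions up from the column names with `match` before dropping them.

```{r}
# Made-up values; only the column names mirror gapminder
toy <- data.frame(
  country   = "A",
  year      = 2007L,
  pop       = 100,
  continent = "X",
  lifeExp   = 70.1,
  gdpPercap = 1000
)

# Look up the positions by name instead of hard-coding them
drop_pos <- match(c("pop", "continent", "gdpPercap"), names(toy))
drop_pos

# Negative positions drop those columns
names(toy[-drop_pos])
```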

Note that the easiest way to remove rows from a data frame is usually to select
the rows we want to keep.
Still, we can also remove rows by their positions, using negative indices:

```{r}
# Filter data for Afghanistan during the 21st century:
afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
                             gapminder$year > 2000, ]
# Now remove data for 2002, that is, the first row:
afghanistan_21c[-1, ]
```
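
The two approaches to removing rows can be compared on a small made-up data frame (`toy` and `no_match` are hypothetical names). The sketch also shows a known pitfall of combining negative indices with `which`: when no row matches, the empty index selects no rows at all rather than keeping every row.

```{r}
# Made-up values, for illustration only
toy <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2002L, 2007L, 2007L),
  lifeExp = c(70.1, 71.2, 80.3)
)

# Keep the rows we want (usually the clearest option)
toy[toy$year != 2002, ]

# Drop rows by position
toy[-1, ]

# Pitfall: if nothing matches, which() returns integer(0), and a
# negative empty index selects no rows instead of all rows
no_match <- which(toy$year == 1900)
nrow(toy[-no_match, ])

# Negating the logical condition keeps all rows, as intended
nrow(toy[!(toy$year == 1900), ])
```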


A common special case is removing rows that contain missing values (`NA`):

```{r}
# Turn some values into NAs:
afghanistan <- gapminder[gapminder$country == "Afghanistan", ]
afghanistan[afghanistan$year < 2007, "year"] <- NA
# Remove the rows that contain NAs:
na.omit(afghanistan)
```
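
Note that `na.omit` drops a row if any of its columns is `NA`. The sketch below, on a made-up data frame (`toy` is a hypothetical name), contrasts it with `complete.cases` and with testing a single column using `is.na`.

```{r}
# Made-up values, for illustration only
toy <- data.frame(
  year    = c(2002L, NA, 2007L),
  lifeExp = c(70.1, 71.2, NA)
)

# na.omit() drops every row with an NA in any column
na.omit(toy)

# complete.cases() gives the equivalent logical vector
toy[complete.cases(toy), ]

# To consider a single column only, test that column with is.na()
toy[!is.na(toy$year), ]
```

Filtering on one column keeps rows that are incomplete elsewhere, which may or may not be what an analysis needs.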


## Factors

Here is another thing to look out for: in a `factor`, each different value