Some tips and tricks for working with date and time data #33
ernestguevarra started this conversation in Hackathon 2024
Replies: 2 comments
Using dependencies to deal with date data types

This is where package dependencies, in my opinion, offer much more utility for dealing with date data types. Here is example code that handles the date data in the malaria dataset using package dependencies:

library(dplyr)      ## for data wrangling
library(tidyr)      ## for data wrangling adjunct to dplyr
library(lubridate)  ## for working with date data type
library(ggplot2)    ## for plotting

malaria <- read.table("https://raw.githubusercontent.com/OxfordIHTM/teaching_datasets/main/malaria.dat", header = TRUE)

## Parse the month-year values in Time as dates, then reshape to long format
malaria <- malaria %>%
  dplyr::mutate(Time = my(Time)) %>%
  tidyr::pivot_longer(Cases:Rain, names_to = "variable", values_to = "n")

## One x-axis break every two months across the study period
date_breaks <- seq(from = min(malaria$Time), to = max(malaria$Time), by = "2 month")

malaria %>%
  dplyr::filter(variable == "Cases") %>%
  ggplot(mapping = aes(x = Time, y = n, group = variable)) +
  geom_line() +
  scale_x_date(
    breaks = date_breaks,
    ## Label each break as abbreviated month followed by year
    labels = paste(
      lubridate::month(date_breaks, label = TRUE),
      lubridate::year(date_breaks)
    )
  ) +
  scale_y_continuous(
    breaks = seq(
      from = 0, to = max(malaria$n[malaria$variable == "Cases"]), by = 100
    )
  ) +
  labs(
    title = "Malaria cases over time",
    subtitle = "July 1997 to July 1999",
    x = NULL, y = "n"
  ) +
  theme_bw() +
  theme(axis.text.x.bottom = element_text(angle = 90, vjust = 0.5, hjust = 1))

This gives the following plot:
Thanks Ernest.
This guide uses an example of real-world data on malaria: rainfall (in mm) and the number of malaria cases reported from health centres in an administrative district of Ethiopia between July 1997 and July 1999.
The dataset is available from the Oxford IHTM teaching datasets repository and can be read into R with read.table(), pointing it at the raw malaria.dat URL and setting header = TRUE.
The dataset looks like this (all records):
For this guide, we will focus on how we work with the date information found in the Time variable in the dataset.

In this dataset, the date information is recorded in character class. You can check this by calling the class() function on the Time variable, which returns "character".

As a general principle, we would like to keep date data in Date class because this is the format in which R recognises how to handle this data appropriately. There are many implications of this, but in this guide we will first show the implication of not having date information in Date class when it comes to plotting.

The malaria dataset is a time series dataset, and the most basic analysis we can perform on it is to create a time series plot showing the trend of cases and/or rainfall over time (per month). If we create this plot using base R plotting functions without processing the Time variable (keeping it as is), the plot fails with an error.
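A minimal sketch of this situation, assuming a small made-up data frame (toy) in place of the malaria dataset. The "MM-YYYY" text format of Time and the case counts are assumptions for illustration only; the real dataset may store its month and year differently.

```r
## Hypothetical toy data standing in for the malaria dataset.
## The "MM-YYYY" format of Time and all values are assumptions.
toy <- data.frame(
  Time = c("07-1997", "08-1997", "09-1997"),
  Cases = c(10, 25, 18),
  stringsAsFactors = FALSE
)

## Time is stored as text, not as a date
class(toy$Time)

## Base R cannot place these character values on a numeric x axis, so the
## call fails; tryCatch() captures the error message instead of stopping
plot_error <- tryCatch(
  {
    plot(x = toy$Time, y = toy$Cases, type = "l")
    NULL
  },
  error = function(e) conditionMessage(e)
)
plot_error
```

Capturing the message with tryCatch() is only a convenience so the sketch runs to the end; calling plot() directly would stop at the error in the same way.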
This error indicates that R doesn't know how to plot the Time variable as it is, hence the complaint about the x values.

So, we need to process the Time variable so that we can use it for plotting. The Time variable should be recognised by R as data that has a chronological order (i.e., months go from January to December, years go from lowest to highest, and, if the data has days, days go from lowest to highest). The data type that gives the Time variable these characteristics is the Date class. To read more about this data type, issue ?as.Date on your R console to read the help file.

So, we will now transform the Time variable from character class into Date class. Checking the data type of the Time variable afterwards with class() gives "Date". This should address the issue with the plotting, so we try plotting again.
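A minimal sketch of the conversion and the second plotting attempt in base R, again assuming a made-up data frame (toy) whose Time values are "MM-YYYY" strings. That format, the choice of the first day of each month, and the case counts are assumptions; the format string passed to as.Date() would need to match however the real dataset writes its dates.

```r
## Hypothetical toy data; the "MM-YYYY" format and values are assumptions
toy <- data.frame(
  Time = c("07-1997", "08-1997", "09-1997"),
  Cases = c(10, 25, 18),
  stringsAsFactors = FALSE
)

## as.Date() needs a complete date, so prepend a day of month (the 1st)
## before parsing with a matching format string
toy$Time <- as.Date(paste0("01-", toy$Time), format = "%d-%m-%Y")

## Time is now Date class and has a chronological order
class(toy$Time)

## The same base R plot call now works because x is a Date
plot(x = toy$Time, y = toy$Cases, type = "l", xlab = "", ylab = "Cases")
```

Anchoring each month to its first day is one common convention for monthly data; any fixed day would give the same ordering.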
This plot looks a lot more like what we expect.