---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
fig.height=5, fig.width=8,
message=FALSE, warning=FALSE,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sfo <a href='https://ramikrispin.github.io/sfo/'><img src='man/figures/sfo.png' align="right" width="150" height="150" /></a>
```{r, include=FALSE}
library(plotly)
library(dplyr)
library(sfo)
p <- sfo_passengers %>%
group_by(activity_period) %>%
summarise(total = sum(passenger_count), .groups = "drop") %>%
mutate(date = as.Date(paste(substr(activity_period, 1, 4),
substr(activity_period, 5, 6), "01", sep = "/"))) %>%
plot_ly(x = ~ date, y = ~ total,
type = "scatter",
mode = "lines") %>%
layout(title = "Monthly Air Traffic Passengers in SFO",
yaxis = list(title = "Number of Passengers"),
xaxis = list(title = "Source: San Francisco data portal (DataSF)"))
orca(p, "man/figures/total.svg")
```
<!-- badges: start -->
[](https://cran.r-project.org/package=sfo) [](https://lifecycle.r-lib.org/articles/stages.html#stable) [](https://opensource.org/license/mit/) [](https://github.com/RamiKrispin/sfo/commit/main)
<!-- badges: end -->
The **sfo** package summarizes the monthly air passengers and landings at San Francisco International Airport (SFO) between 2005 and 2022.
Data source: San Francisco data portal - [DataSF API](https://datasf.org/opendata/)
<img src="man/figures/total.svg" width="90%"/>
## Installation
Install the stable version from CRAN:
``` r
install.packages("sfo")
```
or install the development version from GitHub:
``` r
# install.packages("devtools")
devtools::install_github("RamiKrispin/sfo", ref = "main")
```
### Datasets
The **sfo** package provides the following two datasets:
* `sfo_passengers` - air traffic passengers statistics
* `sfo_stats` - air traffic landings statistics
More information about the datasets is available in the following [vignette](https://ramikrispin.github.io/sfo/articles/v1_intro.html).
### Examples
The `sfo_passengers` dataset provides a monthly summary of the number of passengers at SFO by category (such as terminal, geo, type, etc.):
```{r }
library(sfo)
data("sfo_passengers")
head(sfo_passengers)
```
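The examples in this README treat `activity_period` as a `YYYYMM` number (for example, `201901`). When a proper date is needed, for instance for a time series plot, one option is to append a day component and parse the result. The snippet below is a minimal base R sketch on a made-up vector rather than the package data; the variable names are illustrative only.

```r
# Made-up vector standing in for sfo_passengers$activity_period (YYYYMM)
activity_period <- c(201812L, 201901L, 201902L)

# Append "01" as the day of the month and parse with an explicit format
period_date <- as.Date(paste0(activity_period, "01"), format = "%Y%m%d")

# Year and month can then be extracted for grouping or filtering
period_year <- as.integer(format(period_date, "%Y"))
period_month <- as.integer(format(period_date, "%m"))

print(period_date)
```

Passing an explicit `format` avoids relying on the default formats that `as.Date` tries.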
The `sfo_stats` dataset provides monthly statistics on the air traffic landing at SFO airport:
```{r }
data("sfo_stats")
head(sfo_stats)
```
#### Total number of passengers
The total number of passengers in the most recent month by `activity_type_code` and `geo_region`:
```{r }
library(dplyr)
sfo_passengers %>%
filter(activity_period == max(activity_period)) %>%
group_by(activity_type_code, geo_region) %>%
summarise(total = sum(passenger_count), .groups = "drop")
```
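To see the filter-then-aggregate logic in isolation, the same pattern can be sketched in base R on a small made-up data frame. Only the column names are taken from the example above; the values are invented for illustration and do not come from `sfo_passengers`.

```r
# Made-up rows; only the column names mirror sfo_passengers
toy <- data.frame(
  activity_period = c(202211, 202212, 202212, 202212),
  activity_type_code = c("Enplaned", "Enplaned", "Enplaned", "Deplaned"),
  geo_region = c("US", "US", "US", "Europe"),
  passenger_count = c(100, 200, 50, 75),
  stringsAsFactors = FALSE
)

# Keep only the rows of the most recent month
latest <- toy[toy$activity_period == max(toy$activity_period), ]

# Sum passengers within each activity type / region combination
totals <- aggregate(passenger_count ~ activity_type_code + geo_region,
                    data = latest, FUN = sum)
names(totals)[names(totals) == "passenger_count"] <- "total"

print(totals)
```

Because `activity_period` is numeric, `max()` picks the latest month directly without any date conversion.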
The `sankey_ly` function plots the distribution of a numeric variable across multiple categorical variables. The following example shows the distribution of United Airlines passengers during 2019 by terminal, travel type (domestic and international), geo region, and travel direction (deplaned, enplaned, and transit):
``` r
sfo_passengers %>%
filter(operating_airline == "United Airlines",
activity_period >= 201901 & activity_period < 202001) %>%
mutate(terminal = ifelse(terminal == "International", "international", terminal)) %>%
group_by(operating_airline,activity_type_code, geo_summary, geo_region, terminal) %>%
summarise(total = sum(passenger_count), .groups = "drop") %>%
sankey_ly(cat_cols = c("operating_airline", "terminal","geo_summary", "geo_region", "activity_type_code"),
num_col = "total",
title = "Dist. of United Airlines Passengers at SFO During 2019")
```
```{r, include=FALSE}
p <- sfo_passengers %>%
filter(operating_airline == "United Airlines",
activity_period >= 201901 & activity_period < 202001) %>%
mutate(terminal = ifelse(terminal == "International", "international", terminal)) %>%
group_by(operating_airline,activity_type_code, geo_summary, geo_region, terminal) %>%
summarise(total = sum(passenger_count), .groups = "drop") %>%
sankey_ly(cat_cols = c("operating_airline", "terminal","geo_summary", "geo_region", "activity_type_code"),
num_col = "total",
title = "Dist. of United Airlines Passengers at SFO During 2019")
orca(p, "man/figures/sankey.svg")
```
<img src="man/figures/sankey.svg" width="100%"/>
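One way to think about a Sankey diagram built from several categorical columns is as a set of links between each adjacent pair of columns, where each link's width is the summed numeric value. The base R sketch below illustrates that idea on made-up data. It is an assumption about the general technique, not a description of how `sankey_ly` is implemented, and the `toy` and `links` objects are hypothetical.

```r
# Made-up rows; column names borrowed from the example above
toy <- data.frame(
  terminal = c("Terminal 3", "Terminal 3", "international"),
  geo_summary = c("Domestic", "Domestic", "International"),
  geo_region = c("US", "US", "Europe"),
  total = c(120, 80, 60),
  stringsAsFactors = FALSE
)
cat_cols <- c("terminal", "geo_summary", "geo_region")

# For each adjacent pair of columns, sum the numeric column
# per source/target combination
links <- do.call(rbind, lapply(seq_len(length(cat_cols) - 1), function(i) {
  pair <- aggregate(toy$total,
                    by = list(source = toy[[cat_cols[i]]],
                              target = toy[[cat_cols[i + 1]]]),
                    FUN = sum)
  names(pair)[3] <- "value"
  pair
}))

print(links)
```

In this framing, labels identify nodes, so two columns that share a label would map to the same node. That is presumably why the example above lowercases the "International" terminal, which otherwise matches a `geo_summary` value.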
#### Total number of landings
The distribution of landings during December 2022 by `aircraft_manufacturer`:
``` r
sfo_stats %>%
filter(activity_period == 202212,
aircraft_manufacturer != "") %>%
group_by(aircraft_manufacturer) %>%
summarise(total_landing = sum(landing_count),
.groups = "drop") %>%
arrange(-total_landing) %>%
plot_ly(labels = ~ aircraft_manufacturer,
values = ~ total_landing) %>%
add_pie(hole = 0.6) %>%
layout(title = "Landing Distribution by Aircraft Manufacturer during Dec 2022")
```
```{r, include=FALSE}
p <- sfo_stats %>%
filter(activity_period == 202212,
aircraft_manufacturer != "") %>%
group_by(aircraft_manufacturer) %>%
summarise(total_landing = sum(landing_count),
.groups = "drop") %>%
arrange(-total_landing) %>%
plot_ly(labels = ~ aircraft_manufacturer,
values = ~ total_landing) %>%
add_pie(hole = 0.6) %>%
layout(title = "Landing Distribution by Aircraft Manufacturer During Dec 2022")
orca(p, "man/figures/manufacturer.svg")
```
<img src="man/figures/manufacturer.svg" width="100%"/>
The following Sankey plot shows the distribution of landings at SFO by region, aircraft type, manufacturer, and body type during December 2022:
``` r
sfo_stats %>%
filter(activity_period == 202212) %>%
group_by(geo_summary, geo_region, landing_aircraft_type, aircraft_manufacturer, aircraft_body_type) %>%
summarise(total_landing = sum(landing_count),
.groups = "drop") %>%
sankey_ly(cat_cols = c("geo_summary", "geo_region",
"landing_aircraft_type",
"aircraft_manufacturer",
"aircraft_body_type"),
num_col = "total_landing",
title = "Landing Summary by Geo Region and Aircraft Type During Dec 2022")
```
```{r, include=FALSE}
p <- sfo_stats %>%
filter(activity_period == 202212) %>%
group_by(geo_region, landing_aircraft_type, aircraft_manufacturer, aircraft_body_type) %>%
summarise(total_landing = sum(landing_count),
.groups = "drop") %>%
sankey_ly(cat_cols = c("geo_region",
"landing_aircraft_type",
"aircraft_manufacturer",
"aircraft_body_type"),
num_col = "total_landing",
title = "Landing Summary by Geo Region and Aircraft Type During Dec 2022")
orca(p, "man/figures/landing_sankey.svg")
```
<img src="man/figures/landing_sankey.svg" width="100%"/>