-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
150 lines (111 loc) · 5.69 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
output: github_document
editor_options:
markdown:
wrap: sentence
canonical: true
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
cache = TRUE
)
library(notaurin)
```
# notaurin
<!-- badges: start -->
[![R-CMD-check](https://github.com/asiripanich/aurin/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/asiripanich/aurin/actions/workflows/R-CMD-check.yaml) ![](https://www.r-pkg.org/badges/version-ago/notaurin) [![](https://cranlogs.r-pkg.org/badges/notaurin)](https://cran.r-project.org/package=notaurin) [![](https://cranlogs.r-pkg.org/badges/grand-total/notaurin)](https://CRAN.R-project.org/package=notaurin)
<!-- badges: end -->
<p align="center">
<img src="https://aurin.org.au/wp-content/uploads/2018/07/aurin-logo-400.png" />
</p>
<p align="center">
🚧 <b>Warning! The `notaurin` package is not affiliated with AURIN in any way.</b> 🚧
<br>
</p>
The official AURIN R tutorial can be found [here](https://aurin.org.au/resources/training/explore-r/).
Want to know a more convenient way to access the AURIN data portal from R which AURIN simply doesn't seem to offer? Maybe give {notaurin} a try. 😬
The goal of {notaurin} is to provide an easy way for R users to access **MORE THAN 5000 OPEN DATASETS** on [AURIN](https://aurin.org.au/) using their [Data Portal](https://data.aurin.org.au/).
You can request an API key from:
> <https://aurin.org.au/resources/einfrastructure/>
**AURIN** is "*Australia’s :kangaroo: single largest resource for accessing clean, integrated, spatially enabled and research-ready data on issues surrounding health and wellbeing, socio-economic metrics, transportation, and land-use.*"
## Installation
Here are ways you can install `notaurin`:
``` r
# from CRAN for the latest version
install.packages("notaurin")
# from GitHub for the latest development version
install.packages("remotes")
remotes::install_github("asiripanich/notaurin")
```
This package requires the [sf](https://github.com/r-spatial/sf) package.
Please see the sf package's [GitHub page](https://github.com/r-spatial/sf) to install its non R dependencies.
## Example
First, you must add your [AURIN API username and password](https://aurin.org.au/resources/aurin-apis/sign-up/) as an R environment variable to your `.Renviron` file.
`notaurin` provides `aur_register()` function to help you with this step.
If you choose to set `add_to_renviron = TRUE` you won't need to run this step again on current machine after you restart your R session.
``` r
library(notaurin)
# add_to_renviron = TRUE, so you won't need to run this step again on current machine.
aur_register(username = "your-username", password = "your-password", add_to_renviron = T)
```
`aur_browse()` opens [the data catalogue of AURIN](https://data.aurin.org.au/dataset) on your default browser.
``` r
aur_browse()
```
Identify the '**AURIN Open API ID**' field on the 'Additional Info' table of the dataset that you want to download.
For example, for this [public toilet 2017 dataset](https://data.aurin.org.au/dataset/au-govt-dss-national-public-toilets-2017-na) its '**AURIN Open API ID**' field is `"aurin:datasource-UQ_ERG-UoM_AURIN_DB_public_toilets"`.
> Note that, some datasets on AURIN may not have '**AURIN Open API ID**', meaning that it cannot be downloaded via their API.
Alternatively, you may use `aur_meta` to search datasets without leaving your R console.
```{r}
meta <- aur_meta()
# print out the first five rows
knitr::kable(head(meta))
```
Use `aur_get()` to download the dataset.
```{r}
# download this public toilet dataset.
open_api_id <- "datasource-AU_Govt_DSS-UoM_AURIN:national_public_toilets_2017"
public_toilets <- aur_get(open_api_id = open_api_id)
state_polygons <- aur_get(open_api_id = "datasource-AU_Govt_ABS-UoM_AURIN_DB_GeoLevel:ste_2016_aust")
state_polygons <- state_polygons[state_polygons$state_code_2016 %in% 1:8, ]
```
Let's visualise the data using the `ggplot2` package.
``` r
# If you don't have the package you can install it with `install.packages("ggplot2")`.
library(ggplot2)
ggplot(public_toilets) +
geom_sf(data = state_polygons, fill = "antiquewhite") +
geom_sf(alpha = 0.05, aes(color = status)) +
labs(title = "Public toilets in Australia, 2017") +
scale_color_brewer(palette = "Dark2") +
theme_bw() +
guides(colour = guide_legend(override.aes = list(alpha = 1))) +
theme(panel.background = element_rect(fill = "aliceblue"))
```
<img src="man/figures/README-example-public-toilet-plot-1.png" width="100%" />
See [here](https://data.aurin.org.au/group/aurin-api) to find available datasets.
## Download multiple datasets in parallel
When there are many datasets that you need to download, you may want to put all of your CPUs to work.
The code chucks below show how you can download multiple datasets in parallel using the {furrr} and {future} packages.
First, setup the workers - this affects how many datasets you can download in parallel at the same time.
The maximum number of workers of your machine can be determined using `future::availableCores()`.
```{r}
library(furrr)
library(future)
future::plan(future::multiprocess, workers = 2)
```
Let's assume you want the *first 10 rows* of all datasets on AURIN with the word "toilet" in their title.
```{r}
knitr::kable(meta[grepl("toilet", meta$title, ignore.case = T), ])
```
Extract their AURIN open API ids and download all of them in parallel.
```{r, warning=FALSE}
toilet_datasets_ids <- meta$aurin_open_api_id[grepl("toilet", meta$title, ignore.case = T)]
data_lst <- furrr::future_map(toilet_datasets_ids, ~ aur_get(.x, params = list(maxFeatures = 10)))
data_lst
```