Commit
1 parent d264fa7, commit 4f32167
Showing 6 changed files with 90 additions and 34 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,3 +10,4 @@ | |
^_pkgdown\.yml$ | ||
^docs$ | ||
^pkgdown$ | ||
^cran-comments\.md$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,15 @@ | ||
Package: adaR | ||
Title: A Fast WHATWG-compliant URL Parser | ||
Title: A Fast WHATWG Compliant URL Parser | ||
Version: 0.1.0.9000 | ||
Authors@R: | ||
c(person("David", "Schoch", , "[email protected]", role = c("aut", "cre"), | ||
comment = c(ORCID = "0000-0003-2952-4812")), | ||
person("Chung-hong", "Chan", role = c("aut"), email = "[email protected]", | ||
comment = c(ORCID = "0000-0002-6232-7530")) | ||
comment = c(ORCID = "0000-0002-6232-7530")), | ||
person("Yagiz", "Nizipli", role = c("ctb", "cph"),comment = "author of ada-url : <https://github.com/ada-url/ada>"), | ||
person("Daniel", "Lemire", role = c("ctb", "cph"),comment = "author of ada-url : <https://github.com/ada-url/ada>") | ||
) | ||
Description: A wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++. Also contains auxiliary functions to extract public suffix. | ||
Description: A wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++. Also contains auxiliary functions such as a public suffix extractor. | ||
URL: https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR | ||
BugReports: https://github.com/schochastics/adaR/issues | ||
License: MIT + file LICENSE | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,17 +6,18 @@ output: github_document | |
|
||
```{r, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
fig.path = "man/figures/README-", | ||
out.width = "100%" | ||
collapse = TRUE, | ||
comment = "#>", | ||
fig.path = "man/figures/README-", | ||
out.width = "100%" | ||
) | ||
``` | ||
|
||
# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" /> | ||
|
||
<!-- badges: start --> | ||
[![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml) | ||
[![CRAN status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
<!-- badges: end --> | ||
|
||
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a | ||
|
@@ -27,6 +28,8 @@ It implements several auxilliary functions to work with urls: | |
- public suffix extraction (top level domain excluding private domains) like [psl](https://github.com/hrbrmstr/psl) | ||
- fast c++ implementation of `utils::URLdecode` (~40x speedup) | ||
|
||
More general information on URL parsing can be found in the introductory vignette via `vignette("adaR")`. | ||
|
||
`adaR` is part of a series of R packages to analyse webtracking data: | ||
|
||
- [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw webtracking data | ||
|
@@ -42,9 +45,14 @@ You can install the development version of adaR from [GitHub](https://github.com | |
devtools::install_github("schochastics/adaR") | ||
``` | ||
|
||
The version on CRAN can be installed with | ||
```r | ||
install.packages("adaR") | ||
``` | ||
|
||
## Example | ||
|
||
This is a basic example which shows all the returned components of a URL | ||
This is a basic example which shows all the returned components of a URL. | ||
|
||
```{r example} | ||
library(adaR) | ||
|
@@ -75,26 +83,36 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984 | |
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519") | ||
``` | ||
|
||
A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but the implemented interface | ||
is not yet optimized. The performance is still very compatible with `urltools::url_parse` with the noted advantage in accuracy in some | ||
A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but for this to carry over to R is tricky. | ||
The performance is still very compatible with `urltools::url_parse` with the noted advantage in accuracy in some | ||
practical circumstances. | ||
|
||
```{r faster} | ||
bench::mark( | ||
ada = replicate(1000, ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE)), | ||
urltools = replicate(1000, urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag")), | ||
iterations = 1, check = FALSE | ||
ada = ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE), | ||
urltools = urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag"), | ||
iterations = 1, check = FALSE | ||
) | ||
``` | ||
|
||
For further benchmark results, see `benchmark.md` in `data_raw`. | ||
|
||
## Public Suffix extraction | ||
|
||
`public_suffix()` extracts their top level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains. | ||
This functionality already exists in the R package [psl](https://github.com/hrbrmstr/psl). | ||
|
||
psl relies on a C library and is very fast. However, the package is not on CRAN and has the C library as | ||
system requirement. If these are no issues for you and you need that speed, please use that package. | ||
```{r public_suffix} | ||
urls <- c( | ||
"https://subsub.sub.domain.co.uk", | ||
"https://domain.api.gov.uk", | ||
"https://thisisnotpart.butthisispartoftheps.kawasaki.jp" | ||
) | ||
public_suffix(urls) | ||
``` | ||
|
||
If you are wondering about the last url. The list also contains wildcard suffixes such as `*.kawasaki.jp` which need to be matched. | ||
|
||
|
||
## Acknowledgement | ||
|
||
The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science. | ||
The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,8 @@ | |
<!-- badges: start --> | ||
|
||
[![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml) | ||
[![CRAN | ||
status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
<!-- badges: end --> | ||
|
||
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a | ||
|
@@ -18,6 +20,9 @@ It implements several auxilliary functions to work with urls: | |
like [psl](https://github.com/hrbrmstr/psl) | ||
- fast c++ implementation of `utils::URLdecode` (~40x speedup) | ||
|
||
More general information on URL parsing can be found in the introductory | ||
vignette via `vignette("adaR")`. | ||
|
||
`adaR` is part of a series of R packages to analyse webtracking data: | ||
|
||
- [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw | ||
|
@@ -36,9 +41,16 @@ You can install the development version of adaR from | |
devtools::install_github("schochastics/adaR") | ||
``` | ||
|
||
The version on CRAN can be installed with | ||
|
||
``` r | ||
install.packages("adaR") | ||
``` | ||
|
||
## Example | ||
|
||
This is a basic example which shows all the returned components of a URL | ||
This is a basic example which shows all the returned components of a | ||
URL. | ||
|
||
``` r | ||
library(adaR) | ||
|
@@ -89,36 +101,44 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984 | |
``` | ||
|
||
A “raw” url parse using ada is extremely fast (see | ||
[ada-url.com](https://www.ada-url.com/)) but the implemented interface | ||
is not yet optimized. The performance is still very compatible with | ||
[ada-url.com](https://www.ada-url.com/)) but for this to carry over to R | ||
is tricky. The performance is still very compatible with | ||
`urltools::url_parse` with the noted advantage in accuracy in some | ||
practical circumstances. | ||
|
||
``` r | ||
bench::mark( | ||
ada = replicate(1000, ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE)), | ||
urltools = replicate(1000, urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag")), | ||
iterations = 1, check = FALSE | ||
ada = ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE), | ||
urltools = urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag"), | ||
iterations = 1, check = FALSE | ||
) | ||
#> Warning: Some expressions had a GC in every iteration; so filtering is | ||
#> disabled. | ||
#> # A tibble: 2 × 6 | ||
#> expression min median `itr/sec` mem_alloc `gc/sec` | ||
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> | ||
#> 1 ada 456ms 456ms 2.19 2.67MB 19.7 | ||
#> 2 urltools 316ms 316ms 3.16 2.59MB 22.1 | ||
#> 1 ada 469µs 469µs 2132. 2.49KB 0 | ||
#> 2 urltools 407µs 407µs 2457. 2.49KB 0 | ||
``` | ||
|
||
For further benchmark results, see `benchmark.md` in `data_raw`. | ||
|
||
## Public Suffix extraction | ||
|
||
`public_suffix()` extracts their top level domain from the [public | ||
suffix list](https://publicsuffix.org/), **excluding** private domains. | ||
This functionality already exists in the R package | ||
[psl](https://github.com/hrbrmstr/psl). | ||
|
||
psl relies on a C library and is very fast. However, the package is not | ||
on CRAN and has the C library as system requirement. If these are no | ||
issues for you and you need that speed, please use that package. | ||
``` r | ||
urls <- c( | ||
"https://subsub.sub.domain.co.uk", | ||
"https://domain.api.gov.uk", | ||
"https://thisisnotpart.butthisispartoftheps.kawasaki.jp" | ||
) | ||
public_suffix(urls) | ||
#> [1] "co.uk" "gov.uk" | ||
#> [3] "butthisispartoftheps.kawasaki.jp" | ||
``` | ||
|
||
If you are wondering about the last url. The list also contains wildcard | ||
suffixes such as `*.kawasaki.jp` which need to be matched. | ||
|
||
## Acknowledgement | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
## Initial Submission | ||
|
||
# Test environments | ||
* ubuntu 22.04, R 4.3.1 | ||
* win-builder (devel and release) | ||
|
||
## R CMD check results | ||
|
||
0 errors | 0 warnings | 1 note | ||
|
||
* This is a new release. |