Commit 4f32167

make submission reeady
1 parent d264fa7 commit 4f32167

6 files changed: +90 -34 lines changed

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@
 ^_pkgdown\.yml$
 ^docs$
 ^pkgdown$
+^cran-comments\.md$

DESCRIPTION

Lines changed: 5 additions & 3 deletions
@@ -1,13 +1,15 @@
 Package: adaR
-Title: A Fast WHATWG-compliant URL Parser
+Title: A Fast WHATWG Compliant URL Parser
 Version: 0.1.0.9000
 Authors@R:
 c(person("David", "Schoch", , "[email protected]", role = c("aut", "cre"),
 comment = c(ORCID = "0000-0003-2952-4812")),
 person("Chung-hong", "Chan", role = c("aut"), email = "[email protected]",
-comment = c(ORCID = "0000-0002-6232-7530"))
+comment = c(ORCID = "0000-0002-6232-7530")),
+person("Yagiz", "Nizipli", role = c("ctb", "cph"),comment = "author of ada-url : <https://github.com/ada-url/ada>"),
+person("Daniel", "Lemire", role = c("ctb", "cph"),comment = "author of ada-url : <https://github.com/ada-url/ada>")
 )
-Description: A wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++. Also contains auxiliary functions to extract public suffix.
+Description: A wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++. Also contains auxiliary functions such as a public suffix extractor.
 URL: https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR
 BugReports: https://github.com/schochastics/adaR/issues
 License: MIT + file LICENSE

NEWS.md

Lines changed: 6 additions & 2 deletions
@@ -1,10 +1,14 @@
 # adaR 0.1.0.9000
 
-* split C++ files h/t Chung-hong Chan (@chainsawriot)
+* split C++ file to isolate original ada-url code h/t Chung-hong Chan (@chainsawriot)
 * add support for public suffix extraction #14
 * add support for punycode #18
 * added `url_decode2` as a fast alternative to `utils::URLdecode`
-* improved vectorization of `ada_get_*` and `ada_has_*` #26 and #30 h/t Chung-hong Chan (@chainsawriot)
+* improved vectorization of `ada_get_*` and `ada_has_*` #26 and #30 h/t
+Chung-hong Chan (@chainsawriot)
+* fixed #47 h/t Chung-hong Chan (@chainsawriot)
+* added `ada_get_domain()` #43
+
 
 # adaR 0.1.0
 

README.Rmd

Lines changed: 32 additions & 14 deletions
@@ -6,17 +6,18 @@ output: github_document
 
 ```{r, include = FALSE}
 knitr::opts_chunk$set(
-collapse = TRUE,
-comment = "#>",
-fig.path = "man/figures/README-",
-out.width = "100%"
+collapse = TRUE,
+comment = "#>",
+fig.path = "man/figures/README-",
+out.width = "100%"
 )
 ```
 
 # adaR <img src="man/figures/logo.png" align="right" height="139" alt="" />
 
 <!-- badges: start -->
 [![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml)
+[![CRAN status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR)
 <!-- badges: end -->
 
 adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
@@ -27,6 +28,8 @@ It implements several auxilliary functions to work with urls:
 - public suffix extraction (top level domain excluding private domains) like [psl](https://github.com/hrbrmstr/psl)
 - fast c++ implementation of `utils::URLdecode` (~40x speedup)
 
+More general information on URL parsing can be found in the introductory vignette via `vignette("adaR")`.
+
 `adaR` is part of a series of R packages to analyse webtracking data:
 
 - [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw webtracking data
@@ -42,9 +45,14 @@ You can install the development version of adaR from [GitHub](https://github.com
 devtools::install_github("schochastics/adaR")
 ```
 
+The version on CRAN can be installed with
+```r
+install.packages("adaR")
+```
+
 ## Example
 
-This is a basic example which shows all the returned components of a URL
+This is a basic example which shows all the returned components of a URL.
 
 ```{r example}
 library(adaR)
@@ -75,26 +83,36 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984
 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
 ```
 
-A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but the implemented interface
-is not yet optimized. The performance is still very compatible with `urltools::url_parse` with the noted advantage in accuracy in some
+A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but for this to carry over to R is tricky.
+The performance is still very compatible with `urltools::url_parse` with the noted advantage in accuracy in some
 practical circumstances.
 
 ```{r faster}
 bench::mark(
-ada = replicate(1000, ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE)),
-urltools = replicate(1000, urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag")),
-iterations = 1, check = FALSE
+ada = ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE),
+urltools = urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag"),
+iterations = 1, check = FALSE
 )
 ```
 
+For further benchmark results, see `benchmark.md` in `data_raw`.
+
 ## Public Suffix extraction
 
 `public_suffix()` extracts their top level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains.
-This functionality already exists in the R package [psl](https://github.com/hrbrmstr/psl).
 
-psl relies on a C library and is very fast. However, the package is not on CRAN and has the C library as
-system requirement. If these are no issues for you and you need that speed, please use that package.
+```{r public_suffix}
+urls <- c(
+"https://subsub.sub.domain.co.uk",
+"https://domain.api.gov.uk",
+"https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
+)
+public_suffix(urls)
+```
+
+If you are wondering about the last url. The list also contains wildcard suffixes such as `*.kawasaki.jp` which need to be matched.
+
 
 ## Acknowledgement
 
-The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science.
+The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science.

README.md

Lines changed: 35 additions & 15 deletions
@@ -6,6 +6,8 @@
 <!-- badges: start -->
 
 [![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml)
+[![CRAN
+status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR)
 <!-- badges: end -->
 
 adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
@@ -18,6 +20,9 @@ It implements several auxilliary functions to work with urls:
 like [psl](https://github.com/hrbrmstr/psl)
 - fast c++ implementation of `utils::URLdecode` (~40x speedup)
 
+More general information on URL parsing can be found in the introductory
+vignette via `vignette("adaR")`.
+
 `adaR` is part of a series of R packages to analyse webtracking data:
 
 - [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw
@@ -36,9 +41,16 @@ You can install the development version of adaR from
 devtools::install_github("schochastics/adaR")
 ```
 
+The version on CRAN can be installed with
+
+``` r
+install.packages("adaR")
+```
+
 ## Example
 
-This is a basic example which shows all the returned components of a URL
+This is a basic example which shows all the returned components of a
+URL.
 
 ``` r
 library(adaR)
@@ -89,36 +101,44 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984
 ```
 
 A “raw” url parse using ada is extremely fast (see
-[ada-url.com](https://www.ada-url.com/)) but the implemented interface
-is not yet optimized. The performance is still very compatible with
+[ada-url.com](https://www.ada-url.com/)) but for this to carry over to R
+is tricky. The performance is still very compatible with
 `urltools::url_parse` with the noted advantage in accuracy in some
 practical circumstances.
 
 ``` r
 bench::mark(
-ada = replicate(1000, ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE)),
-urltools = replicate(1000, urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag")),
-iterations = 1, check = FALSE
+ada = ada_url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag", decode = FALSE),
+urltools = urltools::url_parse("https://user_1:[email protected]:8080/dir/../api?q=1#frag"),
+iterations = 1, check = FALSE
 )
-#> Warning: Some expressions had a GC in every iteration; so filtering is
-#> disabled.
 #> # A tibble: 2 × 6
 #> expression min median `itr/sec` mem_alloc `gc/sec`
 #> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
-#> 1 ada 456ms 456ms 2.19 2.67MB 19.7
-#> 2 urltools 316ms 316ms 3.16 2.59MB 22.1
+#> 1 ada 469µs 469µs 2132. 2.49KB 0
+#> 2 urltools 407µs 407µs 2457. 2.49KB 0
 ```
 
+For further benchmark results, see `benchmark.md` in `data_raw`.
+
 ## Public Suffix extraction
 
 `public_suffix()` extracts their top level domain from the [public
 suffix list](https://publicsuffix.org/), **excluding** private domains.
-This functionality already exists in the R package
-[psl](https://github.com/hrbrmstr/psl).
 
-psl relies on a C library and is very fast. However, the package is not
-on CRAN and has the C library as system requirement. If these are no
-issues for you and you need that speed, please use that package.
+``` r
+urls <- c(
+"https://subsub.sub.domain.co.uk",
+"https://domain.api.gov.uk",
+"https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
+)
+public_suffix(urls)
+#> [1] "co.uk" "gov.uk"
+#> [3] "butthisispartoftheps.kawasaki.jp"
+```
+
+If you are wondering about the last url. The list also contains wildcard
+suffixes such as `*.kawasaki.jp` which need to be matched.
 
 ## Acknowledgement
 
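
The README change above mentions that wildcard suffixes such as `*.kawasaki.jp` need to be matched. As a rough illustration of what that involves, here is a minimal sketch in plain R. It is not the package's implementation: the helper names `matches_rule()` and `toy_public_suffix()` and the five-entry rule set are assumptions made up for this example, it works on bare host names rather than full URLs, and it ignores the exception rules (prefixed with `!`) that the real public suffix list also contains.

```r
# Toy rule set (an assumption for illustration; the real list is far larger).
rules <- c("uk", "co.uk", "gov.uk", "jp", "*.kawasaki.jp")

# Compare a candidate suffix with a rule, label by label.
# A "*" label in the rule matches exactly one arbitrary label.
matches_rule <- function(candidate, rule) {
  cand <- strsplit(candidate, ".", fixed = TRUE)[[1]]
  rl <- strsplit(rule, ".", fixed = TRUE)[[1]]
  length(cand) == length(rl) && all(rl == "*" | rl == cand)
}

# Return the longest suffix of `host` that matches any rule, or NA.
toy_public_suffix <- function(host, rules) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  # Candidates are tried from longest to shortest, so the first hit wins.
  for (i in seq_len(n)) {
    candidate <- paste(labels[i:n], collapse = ".")
    hit <- vapply(rules, function(r) matches_rule(candidate, r), logical(1))
    if (any(hit)) {
      return(candidate)
    }
  }
  NA_character_
}

hosts <- c(
  "subsub.sub.domain.co.uk",
  "domain.api.gov.uk",
  "thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
result <- vapply(hosts, toy_public_suffix, character(1),
                 rules = rules, USE.NAMES = FALSE)
print(result)
```

With this toy rule set the last host resolves to `butthisispartoftheps.kawasaki.jp`, because the wildcard label absorbs one extra label to the left of `kawasaki.jp`.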

cran-comments.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+## Initial Submission
+
+# Test environments
+* ubuntu 22.04, R 4.3.1
+* win-builder (devel and release)
+
+## R CMD check results
+
+0 errors | 0 warnings | 1 note
+
+* This is a new release.
