diff --git a/README.Rmd b/README.Rmd
index c70fe2e..64411ca 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -13,7 +13,7 @@ knitr::opts_chunk$set(
)
```
-# adaR
+# adaR
[](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml)
@@ -22,6 +22,11 @@ knitr::opts_chunk$set(
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++ .
+It also implements several auxiliary functions for working with URLs:
+
+- public suffix extraction (top-level domains, excluding private domains), similar to [psl](https://github.com/hrbrmstr/psl)
+- a fast C++ implementation of `utils::URLdecode` (~40x speedup)
+
## Installation
You can install the development version of adaR from [GitHub](https://github.com/) with:
@@ -33,13 +38,28 @@ devtools::install_github("schochastics/adaR")
## Example
-This is a basic example which shows all the returned components
+This basic example shows all the components returned for a URL:
```{r example}
library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
```
+Internally, ada's default URL representation (`url_aggregator`) stores the whole URL as a single string plus offsets that mark where each component starts or ends:
+
+```c++
+/*
+ * https://user:pass@example.com:1234/foo/bar?baz#quux
+ *       |     |    |          | ^^^^|       |   |
+ *       |     |    |          | |   |       |   `----- hash_start
+ *       |     |    |          | |   |       `--------- search_start
+ *       |     |    |          | |   `----------------- pathname_start
+ *       |     |    |          | `--------------------- port
+ *       |     |    |          `----------------------- host_end
+ *       |     |    `---------------------------------- host_start
+ *       |     `--------------------------------------- username_end
+ *       `--------------------------------------------- protocol_end
+ */
+```
+
It avoids some of the problems urltools has with more complex URLs.
```{r better}
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
@@ -48,13 +68,34 @@ urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
```
-
-```{r faster, echo=FALSE,eval=FALSE}
+
+A "raw" URL parse with ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but the interface implemented here
+is not yet optimized. Its performance is still comparable to `urltools::url_parse`, with the accuracy advantage noted above in some
+practical circumstances.
+
+```{r faster}
bench::mark(
ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
iterations = 1, check = FALSE
)
-```
+```
+
+## Public Suffix extraction
+
+`public_suffix()` takes URLs and returns their top-level domain according to the [public suffix list](https://publicsuffix.org/), **excluding** private domains.
+This functionality already exists in the R packages [psl](https://github.com/hrbrmstr/psl) and [urltools](https://cran.r-project.org/package=urltools).
+
+psl relies on a C library and is lightning fast. However, the package is not on CRAN and requires the C library as a
+system dependency. If neither is an issue for you and you need that speed, please use psl.
+
+The performance of urltools for this task is comparable to psl, but it relies on a different set of
+top-level domains (to the best of our knowledge, it includes private domains).
+
+Overall, both packages offer higher performance for this task. This comes as no surprise, since
+our extractor is written in base R. Public suffix extraction is not the main objective of this package, but
+we wanted to include a function for it without introducing new dependencies.
+
+## Acknowledgement
+
+The logo is based on [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), an early pioneer of computer science.
\ No newline at end of file
diff --git a/README.md b/README.md
index 6dab08f..ad33190 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
-# adaR
+# adaR
@@ -12,6 +12,12 @@ adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast
URL parser written in modern C++ .
+It also implements several auxiliary functions for working with URLs:
+
+- public suffix extraction (top-level domains, excluding private
+  domains), similar to [psl](https://github.com/hrbrmstr/psl)
+- a fast C++ implementation of `utils::URLdecode` (\~40x speedup)
+
## Installation
You can install the development version of adaR from
@@ -24,7 +30,7 @@ devtools::install_github("schochastics/adaR")
## Example
-This is a basic example which shows all the returned components
+This basic example shows all the components returned for a URL:
``` r
library(adaR)
@@ -35,6 +41,21 @@ ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
#> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag
```
+Internally, ada's default URL representation (`url_aggregator`) stores
+the whole URL as a single string plus offsets that mark where each
+component starts or ends:
+
+``` cpp
+/*
+ * https://user:pass@example.com:1234/foo/bar?baz#quux
+ *       |     |    |          | ^^^^|       |   |
+ *       |     |    |          | |   |       |   `----- hash_start
+ *       |     |    |          | |   |       `--------- search_start
+ *       |     |    |          | |   `----------------- pathname_start
+ *       |     |    |          | `--------------------- port
+ *       |     |    |          `----------------------- host_end
+ *       |     |    `---------------------------------- host_start
+ *       |     `--------------------------------------- username_end
+ *       `--------------------------------------------- protocol_end
+ */
+```
+
It avoids some of the problems urltools has with more complex URLs.
``` r
@@ -59,6 +80,52 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984
#> 1
```
-
+A “raw” URL parse with ada is extremely fast (see
+[ada-url.com](https://www.ada-url.com/)), but the interface implemented
+here is not yet optimized. Its performance is still comparable to
+`urltools::url_parse`, with the accuracy advantage noted above in some
+practical circumstances.
+
+``` r
+bench::mark(
+ ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
+ urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
+ iterations = 1, check = FALSE
+)
+#> Warning: Some expressions had a GC in every iteration; so filtering is
+#> disabled.
+#> # A tibble: 2 × 6
+#> expression min median `itr/sec` mem_alloc `gc/sec`
+#>
+#> 1 ada 594ms 594ms 1.68 2.67MB 15.1
+#> 2 urltools 393ms 393ms 2.55 2.59MB 15.3
+```
+
+## Public Suffix extraction
+
+`public_suffix()` takes URLs and returns their top-level domain
+according to the [public suffix list](https://publicsuffix.org/),
+**excluding** private domains. This functionality already exists in
+the R packages [psl](https://github.com/hrbrmstr/psl) and
+[urltools](https://cran.r-project.org/package=urltools).
+
+psl relies on a C library and is lightning fast. However, the package
+is not on CRAN and requires the C library as a system dependency. If
+neither is an issue for you and you need that speed, please use psl.
+
+The performance of urltools for this task is comparable to psl, but it
+relies on a different set of top-level domains (to the best of our
+knowledge, it includes private domains).
+
+Overall, both packages offer higher performance for this task. This
+comes as no surprise, since our extractor is written in base R. Public
+suffix extraction is not the main objective of this package, but we
+wanted to include a function for it without introducing new
+dependencies.
+
+## Acknowledgement
+
+The logo is based on [this
+portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg)
+of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), an early
+pioneer of computer science.
diff --git a/man/figures/logo.png b/man/figures/logo.png
new file mode 100644
index 0000000..a8d65a8
Binary files /dev/null and b/man/figures/logo.png differ