Merge pull request #27 from schochastics/README-and-logo

Readme and logo
gesistsa · Sep 24, 2023 · ea77c8b
2 parents a03a202 + c4d6826
commit ea77c8b

Showing 3 changed files with 120 additions and 12 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -13,7 +13,7 @@ knitr::opts_chunk$set(
 )
 ```
 
-# adaR
+# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" />
 
 <!-- badges: start -->
 [![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml)
@@ -22,6 +22,11 @@ knitr::opts_chunk$set(
 adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
 [WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++ .
 
+It implements several auxiliary functions to work with URLs:
+
+- public suffix extraction (top level domain excluding private domains) like [psl](https://github.com/hrbrmstr/psl)
+- fast C++ implementation of `utils::URLdecode` (~40x speedup)
+
 ## Installation
 
 You can install the development version of adaR from [GitHub](https://github.com/) with:
@@ -33,13 +38,28 @@ devtools::install_github("schochastics/adaR")
 
 ## Example
 
-This is a basic example which shows all the returned components
+This is a basic example which shows all the returned components of a URL
 
 ```{r example}
 library(adaR)
 ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
 ```
 
+```c++
+  /*
+   * https://user:pass@example.com:1234/foo/bar?baz#quux
+   *       |     |    |          | ^^^^|       |   |
+   *       |     |    |          | |   |       |   `----- hash_start
+   *       |     |    |          | |   |       `--------- search_start
+   *       |     |    |          | |   `----------------- pathname_start
+   *       |     |    |          | `--------------------- port
+   *       |     |    |          `----------------------- host_end
+   *       |     |    `---------------------------------- host_start
+   *       |     `--------------------------------------- username_end
+   *       `--------------------------------------------- protocol_end
+   */
+```
+
 It solves some problems of urltools with more complex urls. 
 ```{r better}
 urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
@@ -48,13 +68,34 @@ urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.
 ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m
    5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
 ```
-<!-- 
-and it is fast
--->
-```{r faster, echo=FALSE,eval=FALSE}
+
+A "raw" URL parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but the implemented interface
+is not yet optimized. The performance is still comparable to `urltools::url_parse`, with the noted advantage in accuracy in some
+practical circumstances.
+
+```{r faster}
 bench::mark(
   ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
   urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
   iterations = 1, check = FALSE
 )
-``` 
+```
+
+## Public Suffix extraction
+
+`public_suffix()` takes urls and returns their top level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains. 
+This functionality already exists in the R packages [psl](https://github.com/hrbrmstr/psl) and [urltools](https://cran.r-project.org/package=urltools).
+
+psl relies on a C library and is lightning fast. However, the package is not on CRAN and has the C lib as a 
+system requirement. If these are no issues for you and you need that speed, please use that package. 
+
+The performance of urltools for that task is quite comparable to psl, but it does rely on a different set of 
+top level domains (to the best of our knowledge, it does include private domains). 
+
+Overall, both packages offer higher performance for this task. This comes as no surprise, since
+our extractor is written in base R. Public suffix extraction is not the main objective of this package, yet
+we wanted to include a function for this task without introducing new dependencies.
+
+## Acknowledgement
+
+The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-# adaR
+# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" />
 
 <!-- badges: start -->
 
@@ -12,6 +12,12 @@ adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
 [WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast
 URL parser written in modern C++ .
 
+It implements several auxiliary functions to work with URLs:
+
+-   public suffix extraction (top level domain excluding private
+    domains) like [psl](https://github.com/hrbrmstr/psl)
+-   fast C++ implementation of `utils::URLdecode` (\~40x speedup)
+
 ## Installation
 
 You can install the development version of adaR from
@@ -24,7 +30,7 @@ devtools::install_github("schochastics/adaR")
 
 ## Example
 
-This is a basic example which shows all the returned components
+This is a basic example which shows all the returned components of a URL
 
 ``` r
 library(adaR)
@@ -35,6 +41,21 @@ ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
 #> 1 password_1 example.org:8080 example.org 8080     /api   ?q=1 #frag
 ```
 
+``` cpp
+  /*
+   * https://user:pass@example.com:1234/foo/bar?baz#quux
+   *       |     |    |          | ^^^^|       |   |
+   *       |     |    |          | |   |       |   `----- hash_start
+   *       |     |    |          | |   |       `--------- search_start
+   *       |     |    |          | |   `----------------- pathname_start
+   *       |     |    |          | `--------------------- port
+   *       |     |    |          `----------------------- host_end
+   *       |     |    `---------------------------------- host_start
+   *       |     `--------------------------------------- username_end
+   *       `--------------------------------------------- protocol_end
+   */
+```
+
 It solves some problems of urltools with more complex urls.
 
 ``` r
@@ -59,6 +80,52 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984
 #> 1
 ```
 
-<!-- 
-and it is fast
--->
+A “raw” URL parse using ada is extremely fast (see
+[ada-url.com](https://www.ada-url.com/)), but the implemented interface
+is not yet optimized. The performance is still comparable to
+`urltools::url_parse`, with the noted advantage in accuracy in some
+practical circumstances.
+
+``` r
+bench::mark(
+  ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
+  urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
+  iterations = 1, check = FALSE
+)
+#> Warning: Some expressions had a GC in every iteration; so filtering is
+#> disabled.
+#> # A tibble: 2 × 6
+#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
+#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
+#> 1 ada           594ms    594ms      1.68    2.67MB     15.1
+#> 2 urltools      393ms    393ms      2.55    2.59MB     15.3
+```
+
+## Public Suffix extraction
+
+`public_suffix()` takes urls and returns their top level domain from the
+[public suffix list](https://publicsuffix.org/), **excluding** private
+domains. This functionality already exists in the R packages
+[psl](https://github.com/hrbrmstr/psl) and
+[urltools](https://cran.r-project.org/package=urltools).
+
+psl relies on a C library and is lightning fast. However, the package is
+not on CRAN and has the C lib as a system requirement. If these are no
+issues for you and you need that speed, please use that package.
+
+The performance of urltools for that task is quite comparable to psl,
+but it does rely on a different set of top level domains (to the best of
+our knowledge, it does include private domains).
+
+Overall, both packages offer higher performance for this task. This comes
+as no surprise, since our extractor is written in base R. Public
+suffix extraction is not the main objective of this package, yet we
+wanted to include a function for this task without introducing new
+dependencies.
+
+## Acknowledgement
+
+The logo is created from [this
+portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg)
+of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very
+early pioneer in Computer Science.
diff --git a/man/figures/logo.png b/man/figures/logo.png
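
The README text in this commit describes `public_suffix()` as a base R extractor that looks up the top level domain in the public suffix list. The sketch below illustrates the general idea of longest-suffix matching in base R. It is not adaR's implementation: the function name `longest_suffix()` and the five-entry `suffixes` vector are hypothetical stand-ins, and it takes bare host names rather than full URLs.

```r
# Hypothetical, tiny stand-in for the public suffix list.
# The real list is far larger and is maintained at publicsuffix.org.
suffixes <- c("com", "org", "uk", "co.uk", "de")

# For each host, return the longest entry of `suffixes` that the host ends
# with on a label boundary, or NA if no entry matches.
longest_suffix <- function(hosts, suffixes) {
  vapply(hosts, function(host) {
    labels <- strsplit(host, ".", fixed = TRUE)[[1]]
    n <- length(labels)
    # Candidates run from the whole host down to its last label only,
    # so the first hit is always the longest match.
    candidates <- vapply(
      seq_len(n),
      function(i) paste(labels[i:n], collapse = "."),
      character(1)
    )
    hit <- candidates[candidates %in% suffixes]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1), USE.NAMES = FALSE)
}

longest_suffix(c("www.example.co.uk", "example.org", "localhost"), suffixes)
```

The sketch ignores the wildcard (`*.`) and exception (`!`) rules that the real list contains, so it should be read as an illustration of the matching idea only. Leaving entries out of `suffixes` is how a lookup of this kind would exclude private domains.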