Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #27 from schochastics/README-and-logo
Readme and logo
Showing 3 changed files with 120 additions and 12 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,7 @@ knitr::opts_chunk$set( | |
) | ||
``` | ||
|
||
# adaR | ||
# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" /> | ||
|
||
<!-- badges: start --> | ||
[![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml) | ||
|
@@ -22,6 +22,11 @@ knitr::opts_chunk$set( | |
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a | ||
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++. | ||
|
||
It implements several auxiliary functions to work with URLs: | ||
|
||
- public suffix extraction (top-level domain, excluding private domains), similar to [psl](https://github.com/hrbrmstr/psl) | ||
- fast C++ implementation of `utils::URLdecode` (~40x speedup) | ||
|
||
## Installation | ||
|
||
You can install the development version of adaR from [GitHub](https://github.com/) with: | ||
|
@@ -33,13 +38,28 @@ devtools::install_github("schochastics/adaR") | |
|
||
## Example | ||
|
||
This is a basic example which shows all the returned components | ||
This is a basic example which shows all the returned components of a URL. | ||
|
||
```{r example} | ||
library(adaR) | ||
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag") | ||
``` | ||
|
||
```c++ | ||
/* | ||
* https://user:pass@example.com:1234/foo/bar?baz#quux | ||
*       |     |    |          | ^^^^|       |   | | ||
*       |     |    |          | |   |       |   `----- hash_start | ||
*       |     |    |          | |   |       `--------- search_start | ||
*       |     |    |          | |   `----------------- pathname_start | ||
*       |     |    |          | `--------------------- port | ||
*       |     |    |          `----------------------- host_end | ||
*       |     |    `---------------------------------- host_start | ||
*       |     `--------------------------------------- username_end | ||
*       `--------------------------------------------- protocol_end | ||
*/ | ||
``` | ||
|
||
It solves some problems that urltools has with more complex URLs. | ||
```{r better} | ||
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. | ||
|
@@ -48,13 +68,34 @@ urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40. | |
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519") | ||
``` | ||
<!-- | ||
and it is fast | ||
--> | ||
```{r faster, echo=FALSE,eval=FALSE} | ||
|
||
A "raw" URL parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but the implemented interface | ||
is not yet optimized. The performance is still comparable to `urltools::url_parse`, with the noted advantage in accuracy in some | ||
practical circumstances. | ||
|
||
```{r faster} | ||
bench::mark( | ||
ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)), | ||
urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")), | ||
iterations = 1, check = FALSE | ||
) | ||
``` | ||
``` | ||
|
||
## Public Suffix extraction | ||
|
||
`public_suffix()` takes URLs and returns their top-level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains. | ||
This functionality already exists in the R packages [psl](https://github.com/hrbrmstr/psl) and [urltools](https://cran.r-project.org/package=urltools). | ||
|
||
psl relies on a C library and is lightning fast. However, the package is not on CRAN and has the C library as a | ||
system requirement. If neither is an issue for you and you need that speed, please use that package. | ||
|
||
The performance of urltools for this task is quite comparable to psl, but it relies on a different set of | ||
top-level domains (to the best of our knowledge, it includes private domains). | ||
|
||
Overall, both packages offer higher performance for this task than adaR. This comes as no surprise, since | ||
our extractor is written in base R. Public suffix extraction is not the main objective of this package, yet | ||
we wanted to include a function for this task without introducing new dependencies. | ||
|
||
## Acknowledgement | ||
|
||
The logo is created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer in Computer Science. |
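The offset diagram in the README labels a URL with indices such as `protocol_end`, `host_start` and `search_start`, so that every component can be recovered as a slice of one string. As a rough illustration only, the following standalone C++ sketch computes the same kind of offsets with plain `std::string` searches. `UrlOffsets` and `split_url` are hypothetical names, not ada's or adaR's API, and the splitter is deliberately naive: it assumes an absolute `scheme://` URL, performs no WHATWG normalization (so `/dir/../api` is not resolved), and does not handle IPv6 hosts.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical offsets struct modeled on the README diagram; not ada's API.
struct UrlOffsets {
    std::size_t protocol_end = 0;    // index just past the scheme's ':'
    std::size_t username_end = 0;    // index just past the username
    std::size_t host_start = 0;      // index of '@' if credentials exist, else start of host
    std::size_t host_end = 0;        // index just past the last host character
    std::optional<std::uint16_t> port;
    std::size_t pathname_start = 0;  // index of the first path character
    std::optional<std::size_t> search_start;  // index of '?', if any
    std::optional<std::size_t> hash_start;    // index of '#', if any
};

// Naive splitter for "scheme://[user[:pass]@]host[:port][path][?query][#fragment]".
// Not WHATWG-compliant: no validation beyond the port, no normalization, no IPv6.
std::optional<UrlOffsets> split_url(const std::string& s) {
    const std::size_t colon = s.find(':');
    if (colon == std::string::npos || s.compare(colon + 1, 2, "//") != 0) {
        return std::nullopt;  // only absolute "scheme://" URLs are handled
    }
    UrlOffsets o;
    o.protocol_end = colon + 1;
    const std::size_t auth_start = colon + 3;  // first character after "://"
    std::size_t auth_end = s.find_first_of("/?#", auth_start);
    if (auth_end == std::string::npos) auth_end = s.size();

    // Credentials end at the last '@' inside the authority.
    const std::size_t at = s.rfind('@', auth_end - 1);
    const bool has_credentials = at != std::string::npos && at >= auth_start;
    std::size_t host_begin = auth_start;
    if (has_credentials) {
        const std::size_t pw = s.find(':', auth_start);
        o.username_end = (pw != std::string::npos && pw < at) ? pw : at;
        o.host_start = at;
        host_begin = at + 1;
    } else {
        o.username_end = auth_start;
        o.host_start = auth_start;
    }

    // A ':' after the host introduces the port; "host:" counts as no port.
    const std::size_t port_colon = s.find(':', host_begin);
    if (port_colon != std::string::npos && port_colon < auth_end) {
        o.host_end = port_colon;
        const std::string digits = s.substr(port_colon + 1, auth_end - port_colon - 1);
        if (!digits.empty()) {
            const bool numeric = digits.size() <= 5 &&
                std::all_of(digits.begin(), digits.end(),
                            [](unsigned char c) { return std::isdigit(c) != 0; });
            if (!numeric) return std::nullopt;
            const unsigned long value = std::stoul(digits);
            if (value > 65535) return std::nullopt;
            o.port = static_cast<std::uint16_t>(value);
        }
    } else {
        o.host_end = auth_end;
    }

    // Path, query and fragment follow the authority.
    o.pathname_start = auth_end;
    const std::size_t hash = s.find('#', auth_end);
    if (hash != std::string::npos) o.hash_start = hash;
    const std::size_t query = s.find('?', auth_end);
    if (query != std::string::npos && (hash == std::string::npos || query < hash)) {
        o.search_start = query;
    }

    // The offsets are monotone, as in the diagram.
    assert(o.protocol_end <= o.username_end && o.username_end <= o.host_start &&
           o.host_start <= o.host_end && o.host_end <= o.pathname_start);
    return o;
}
```

Applied to the URL from the diagram, `split_url` reports protocol_end 6, username_end 12, host_start 17, host_end 29, port 1234, pathname_start 34, search_start 42 and hash_start 46. For a URL without credentials such as `https://example.org/api`, username_end and host_start both fall at 8, just after `//`.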
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
|
||
# adaR | ||
# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" /> | ||
|
||
<!-- badges: start --> | ||
|
||
|
@@ -12,6 +12,12 @@ adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a | |
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast | ||
URL parser written in modern C++. | ||
|
||
It implements several auxiliary functions to work with URLs: | ||
|
||
- public suffix extraction (top-level domain, excluding private | ||
domains), similar to [psl](https://github.com/hrbrmstr/psl) | ||
- fast C++ implementation of `utils::URLdecode` (\~40x speedup) | ||
|
||
## Installation | ||
|
||
You can install the development version of adaR from | ||
|
@@ -24,7 +30,7 @@ devtools::install_github("schochastics/adaR") | |
|
||
## Example | ||
|
||
This is a basic example which shows all the returned components | ||
This is a basic example which shows all the returned components of a URL. | ||
|
||
``` r | ||
library(adaR) | ||
|
@@ -35,6 +41,6 @@ ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag") |
#> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag | ||
``` | ||
|
||
``` cpp | ||
/* | ||
* https://user:pass@example.com:1234/foo/bar?baz#quux | ||
*       |     |    |          | ^^^^|       |   | | ||
*       |     |    |          | |   |       |   `----- hash_start | ||
*       |     |    |          | |   |       `--------- search_start | ||
*       |     |    |          | |   `----------------- pathname_start | ||
*       |     |    |          | `--------------------- port | ||
*       |     |    |          `----------------------- host_end | ||
*       |     |    `---------------------------------- host_start | ||
*       |     `--------------------------------------- username_end | ||
*       `--------------------------------------------- protocol_end | ||
*/ | ||
``` | ||
|
||
It solves some problems that urltools has with more complex URLs. | ||
|
||
``` r | ||
|
@@ -59,6 +80,52 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984 | |
#> 1 | ||
``` | ||
|
||
<!-- | ||
and it is fast | ||
--> | ||
A “raw” URL parse using ada is extremely fast (see | ||
[ada-url.com](https://www.ada-url.com/)), but the implemented interface | ||
is not yet optimized. The performance is still comparable to | ||
`urltools::url_parse`, with the noted advantage in accuracy in some | ||
practical circumstances. | ||
|
||
``` r | ||
bench::mark( | ||
ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)), | ||
urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")), | ||
iterations = 1, check = FALSE | ||
) | ||
#> Warning: Some expressions had a GC in every iteration; so filtering is | ||
#> disabled. | ||
#> # A tibble: 2 × 6 | ||
#> expression min median `itr/sec` mem_alloc `gc/sec` | ||
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> | ||
#> 1 ada 594ms 594ms 1.68 2.67MB 15.1 | ||
#> 2 urltools 393ms 393ms 2.55 2.59MB 15.3 | ||
``` | ||
|
||
## Public Suffix extraction | ||
|
||
`public_suffix()` takes URLs and returns their top-level domain from the | ||
[public suffix list](https://publicsuffix.org/), **excluding** private | ||
domains. This functionality already exists in the R packages | ||
[psl](https://github.com/hrbrmstr/psl) and | ||
[urltools](https://cran.r-project.org/package=urltools). | ||
|
||
psl relies on a C library and is lightning fast. However, the package is | ||
not on CRAN and has the C library as a system requirement. If neither is | ||
an issue for you and you need that speed, please use that package. | ||
|
||
The performance of urltools for this task is quite comparable to psl, | ||
but it relies on a different set of top-level domains (to the best of | ||
our knowledge, it includes private domains). | ||
|
||
Overall, both packages offer higher performance for this task than adaR. This comes | ||
as no surprise, since our extractor is written in base R. Public | ||
suffix extraction is not the main objective of this package, yet we | ||
wanted to include a function for this task without introducing new | ||
dependencies. | ||
|
||
## Acknowledgement | ||
|
||
The logo is created from [this | ||
portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) | ||
of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very | ||
early pioneer in Computer Science. |
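The README describes `public_suffix()` as returning the public suffix of a URL's host while excluding private domains. Resolving a host against the list is usually done as a longest-match lookup, and the sketch below shows that idea in C++ under stated assumptions: the host is already extracted and lowercased, the rule set is a small hand-picked sample of ICANN-section entries supplied by the caller, and wildcard (`*.`) and exception (`!`) rules are ignored. `public_suffix_of` is a hypothetical name; this is not adaR's implementation, which the README says is written in base R.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Returns the longest rule that is a label-aligned suffix of `host`.
// Assumes a lowercased host without a trailing dot; wildcard and
// exception rules from the Public Suffix List are not supported.
std::string public_suffix_of(const std::string& host,
                             const std::unordered_set<std::string>& rules) {
    // Walk from the full host towards the last label, so the first
    // match found is the longest one.
    std::size_t pos = 0;
    while (true) {
        const std::string candidate = host.substr(pos);
        if (rules.count(candidate) > 0) return candidate;
        const std::size_t dot = host.find('.', pos);
        if (dot == std::string::npos) break;
        pos = dot + 1;
    }
    // No rule matched: the list's implicit "*" rule makes the last label the suffix.
    const std::size_t last_dot = host.rfind('.');
    return last_dot == std::string::npos ? host : host.substr(last_dot + 1);
}
```

With a sample rule set such as `{"com", "io", "uk", "co.uk"}`, `www.bbc.co.uk` resolves to `co.uk` and `user.github.io` resolves to `io`. Because `github.io` appears only in the list's private section, leaving private entries out of the rule set is what "excluding private domains" amounts to in this sketch.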
Sorry, this file is invalid so it cannot be displayed.
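Among its auxiliary functions, the README lists a fast C++ implementation of `utils::URLdecode`. The core of such a decoder is a single pass that replaces each `%XX` escape with the byte it encodes. The following is a minimal sketch of that technique, not adaR's code; `percent_decode` and `hex_value` are hypothetical names. Like `utils::URLdecode`, it does not turn `+` into a space. Malformed escapes such as `%zz` or a trailing `%` are copied through unchanged, a choice made here for simplicity.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hexadecimal digit, or -1 if `c` is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes byte by byte in a single pass.
// '+' is not treated as a space; malformed escapes are copied through.
std::string percent_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());  // decoding never makes the string longer
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

For example, `percent_decode("a%20b")` returns `"a b"`, while `"100%"` comes back unchanged. Multi-byte UTF-8 sequences such as `%E2%82%AC` are reassembled byte by byte, so the result is valid UTF-8 whenever the escaped input was.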