
Commit

Merge pull request #27 from schochastics/README-and-logo
Readme and logo
schochastics authored Sep 24, 2023
2 parents a03a202 + c4d6826 commit ea77c8b
Showing 3 changed files with 120 additions and 12 deletions.
55 changes: 48 additions & 7 deletions README.Rmd
@@ -13,7 +13,7 @@ knitr::opts_chunk$set(
)
```

# adaR
# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" />

<!-- badges: start -->
[![R-CMD-check](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/schochastics/adaR/actions/workflows/R-CMD-check.yaml)
@@ -22,6 +22,11 @@ knitr::opts_chunk$set(
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++.

It implements several auxiliary functions to work with URLs:

- public suffix extraction (top-level domain, excluding private domains), similar to [psl](https://github.com/hrbrmstr/psl)
- fast C++ implementation of `utils::URLdecode` (~40x speedup)
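The list mentions a fast C++ replacement for `utils::URLdecode`; adaR's actual implementation is not part of this commit. To give a rough idea of why such a decoder can be fast, here is a minimal single-pass sketch, assuming plain `%XX` decoding only. The names `percent_decode` and `hex_value` are hypothetical and not part of adaR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a single hexadecimal digit, or -1 if c is not a hex digit.
int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode %XX escapes in one left-to-right pass over the input.
// '+' is not turned into a space, and malformed escapes ("%zz", a trailing
// "%") are copied through unchanged in this sketch.
std::string percent_decode(const std::string& in) {
  std::string out;
  out.reserve(in.size());  // decoding never makes the string longer
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size()) {
      int hi = hex_value(in[i + 1]);
      int lo = hex_value(in[i + 2]);
      if (hi >= 0 && lo >= 0) {
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;  // skip both hex digits; the loop's ++i then moves past them
        continue;
      }
    }
    out.push_back(in[i]);
  }
  assert(out.size() <= in.size());
  return out;
}
```

For example, `percent_decode("a%20b")` yields `"a b"`. How adaR treats malformed escapes may differ from this sketch.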

## Installation

You can install the development version of adaR from [GitHub](https://github.com/) with:
@@ -33,13 +38,28 @@ devtools::install_github("schochastics/adaR")

## Example

This is a basic example which shows all the returned components
This is a basic example that shows all the returned components of a URL.

```{r example}
library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
```

```c++
/*
 * https://user:pass@example.com:1234/foo/bar?baz#quux
 *       |     |    |           |^^^^|       |   |
 *       |     |    |           ||   |       |   `----- hash_start
 *       |     |    |           ||   |       `--------- search_start
 *       |     |    |           ||   `----------------- pathname_start
 *       |     |    |           |`--------------------- port
 *       |     |    |           `---------------------- host_end
 *       |     |    `---------------------------------- host_start
 *       |     `--------------------------------------- username_end
 *       `--------------------------------------------- protocol_end
 */
```
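To connect the diagram to actual substrings, the following sketch slices the diagram's example URL using the offsets its arrows point at. `Components` is a simplified, hypothetical stand-in that only borrows the diagram's field names (it is not ada's real `ada::url_components` type), and the offset values were counted by hand for this one URL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the offsets in the diagram
// (not ada's real ada::url_components type).
struct Components {
  std::uint32_t protocol_end, username_end, host_start, host_end;
  std::uint32_t port, pathname_start, search_start, hash_start;
};

// The diagram's example URL and its offsets, counted by hand.
const std::string kHref = "https://user:pass@example.com:1234/foo/bar?baz#quux";
const Components kParts{6, 12, 17, 29, 1234, 34, 42, 46};

// Half-open substring [begin, end) of s.
std::string slice(const std::string& s, std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= s.size());
  return s.substr(begin, end - begin);
}

std::string protocol(const std::string& h, const Components& c) {
  return slice(h, 0, c.protocol_end);  // includes the trailing ':'
}
std::string username(const std::string& h, const Components& c) {
  return slice(h, c.protocol_end + 2, c.username_end);  // skip "//"
}
std::string password(const std::string& h, const Components& c) {
  return slice(h, c.username_end + 1, c.host_start);  // skip ':'
}
std::string hostname(const std::string& h, const Components& c) {
  return slice(h, c.host_start + 1, c.host_end);  // skip '@'
}
std::string pathname(const std::string& h, const Components& c) {
  return slice(h, c.pathname_start, c.search_start);
}
std::string search(const std::string& h, const Components& c) {
  return slice(h, c.search_start, c.hash_start);  // keeps the leading '?'
}
std::string hash(const std::string& h, const Components& c) {
  return slice(h, c.hash_start, h.size());  // keeps the leading '#'
}
```

Here `host_start` sits on the `@` because the URL carries credentials, which is why the hostname slice starts one character later; a general implementation would have to check for that rather than hard-code the `+ 1`.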

It avoids some of the problems urltools has with more complex URLs.
```{r better}
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
@@ -48,13 +68,34 @@ urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
```
<!--
and it is fast
-->
```{r faster, echo=FALSE,eval=FALSE}

A "raw" URL parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but adaR's interface
is not yet optimized. Its performance is still comparable to `urltools::url_parse`, while offering the accuracy
advantage shown above for some real-world URLs.

```{r faster}
bench::mark(
  ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
  urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
iterations = 1, check = FALSE
)
```
```

## Public Suffix extraction

`public_suffix()` takes URLs and returns their top-level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains.
This functionality already exists in the R packages [psl](https://github.com/hrbrmstr/psl) and [urltools](https://cran.r-project.org/package=urltools).

psl relies on a C library and is lightning fast. However, the package is not on CRAN and requires the C library as a
system dependency. If neither is an issue for you and you need that speed, please use that package.

The performance of urltools for this task is quite comparable to psl, but it relies on a different set of
top-level domains (to the best of our knowledge, it includes private domains).

Overall, both packages offer higher performance for this task. This is no surprise, since
our extractor is written in base R. Public suffix extraction is not the main objective of this package, but
we wanted to include a function for this task without introducing new dependencies.
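adaR's extractor itself is written in base R and is not reproduced here. To illustrate the core idea behind public suffix matching (the longest matching rule wins, and a TLD missing from the list falls back to its last label via the list's implicit `*` rule), here is a minimal C++ sketch. It assumes a tiny illustrative rule set instead of the real list and handles plain rules only; the list's wildcard (`*.`) and exception (`!`) rules are deliberately left out. `public_suffix` here is a hypothetical C++ function, not adaR's R function.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Tiny illustrative subset standing in for the real list; wildcard and
// exception rules are not modeled.
const std::set<std::string> kRules = {"com", "uk", "co.uk"};

// Public suffix of `host` under plain rules only. Candidate suffixes are
// tried from longest to shortest, so the first hit is the longest match.
// If nothing matches, the last label is returned (the implicit "*" rule).
std::string public_suffix(const std::string& host) {
  assert(!host.empty());
  std::size_t start = 0;
  while (true) {
    std::string candidate = host.substr(start);
    if (kRules.count(candidate) > 0) return candidate;
    std::size_t dot = host.find('.', start);
    if (dot == std::string::npos) return candidate;  // last label, no rule hit
    start = dot + 1;
  }
}
```

Because suffixes are scanned from longest to shortest, `public_suffix("www.bbc.co.uk")` returns `"co.uk"` rather than `"uk"`.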

## Acknowledgement

The logo is based on [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), an early pioneer of computer science.
77 changes: 72 additions & 5 deletions README.md
@@ -1,7 +1,7 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# adaR
# adaR <img src="man/figures/logo.png" align="right" height="139" alt="" />

<!-- badges: start -->

@@ -12,6 +12,12 @@ adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast
URL parser written in modern C++.

It implements several auxiliary functions to work with URLs:

- public suffix extraction (top-level domain, excluding private
  domains), similar to [psl](https://github.com/hrbrmstr/psl)
- fast C++ implementation of `utils::URLdecode` (\~40x speedup)

## Installation

You can install the development version of adaR from
@@ -24,7 +30,7 @@ devtools::install_github("schochastics/adaR")

## Example

This is a basic example which shows all the returned components
This is a basic example that shows all the returned components of a URL.

``` r
library(adaR)
@@ -35,6 +41,21 @@ ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
#> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag
```
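In the output above the parsed pathname is `/api`, not `/dir/../api`: WHATWG URL parsing resolves `.` and `..` path segments as it parses. A minimal C++ sketch of that step, assuming the input is an already-extracted path that starts with `/` and ignoring finer points of the spec (for example, the spec also treats a percent-encoded `%2e` as a dot). `resolve_dot_segments` is a hypothetical helper, not an ada or adaR function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Resolve "." and ".." segments in a path that starts with '/', following
// the WHATWG behaviour for plain dots (percent-encoded "%2e" is not handled).
std::string resolve_dot_segments(const std::string& path) {
  assert(!path.empty() && path[0] == '/');
  std::vector<std::string> segments;
  std::size_t start = 1;  // skip the leading '/'
  while (true) {
    std::size_t slash = path.find('/', start);
    bool last = (slash == std::string::npos);
    std::string seg = last ? path.substr(start) : path.substr(start, slash - start);
    if (seg == "..") {
      if (!segments.empty()) segments.pop_back();  // ".." above the root is dropped
      if (last) segments.push_back("");            // "/a/.." keeps a trailing '/'
    } else if (seg == ".") {
      if (last) segments.push_back("");            // "/a/." becomes "/a/"
    } else {
      segments.push_back(seg);                     // empty segments ("//") are kept
    }
    if (last) break;
    start = slash + 1;
  }
  std::string result;
  for (const std::string& s : segments) {
    result += '/';
    result += s;
  }
  return result;
}
```

With this sketch, `resolve_dot_segments("/dir/../api")` gives `/api`, matching the pathname in the output, and a `..` above the root is simply dropped, so `/../../c` becomes `/c`.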

``` cpp
/*
 * https://user:pass@example.com:1234/foo/bar?baz#quux
 *       |     |    |           |^^^^|       |   |
 *       |     |    |           ||   |       |   `----- hash_start
 *       |     |    |           ||   |       `--------- search_start
 *       |     |    |           ||   `----------------- pathname_start
 *       |     |    |           |`--------------------- port
 *       |     |    |           `---------------------- host_end
 *       |     |    `---------------------------------- host_start
 *       |     `--------------------------------------- username_end
 *       `--------------------------------------------- protocol_end
 */
```

It avoids some of the problems urltools has with more complex URLs.

``` r
@@ -59,6 +80,52 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984
#> 1
```

<!--
and it is fast
-->
A “raw” URL parse using ada is extremely fast (see
[ada-url.com](https://www.ada-url.com/)), but adaR’s interface is not
yet optimized. Its performance is still comparable to
`urltools::url_parse`, while offering the accuracy advantage shown above
for some real-world URLs.

``` r
bench::mark(
  ada = replicate(1000, ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE)),
  urltools = replicate(1000, urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")),
iterations = 1, check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ada           594ms    594ms      1.68    2.67MB     15.1
#> 2 urltools      393ms    393ms      2.55    2.59MB     15.3
```

## Public Suffix extraction

`public_suffix()` takes URLs and returns their top-level domain from the
[public suffix list](https://publicsuffix.org/), **excluding** private
domains. This functionality already exists in the R packages
[psl](https://github.com/hrbrmstr/psl) and
[urltools](https://cran.r-project.org/package=urltools).

psl relies on a C library and is lightning fast. However, the package is
not on CRAN and requires the C library as a system dependency. If neither
is an issue for you and you need that speed, please use that package.

The performance of urltools for this task is quite comparable to psl,
but it relies on a different set of top-level domains (to the best of
our knowledge, it includes private domains).

Overall, both packages offer higher performance for this task. This is
no surprise, since our extractor is written in base R. Public suffix
extraction is not the main objective of this package, but we wanted to
include a function for this task without introducing new dependencies.

## Acknowledgement

The logo is based on [this
portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg)
of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), an early
pioneer of computer science.
Binary file added man/figures/logo.png