commit 9c4087b (1 parent: 0912857)
Showing 2 changed files with 24 additions and 26 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,7 +17,7 @@ knitr::opts_chunk$set( | |
|
||
<!-- badges: start --> | ||
[![R-CMD-check](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml) | ||
[![CRAN status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
[![CRAN status](https://cranlogs.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/adaR)](https://CRAN.R-project.org/package=adaR) | ||
[![Codecov test coverage](https://codecov.io/gh/gesistsa/adaR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gesistsa/adaR?branch=main) | ||
[![ada-url Version](https://img.shields.io/badge/ada_url-2.9.0-blue)](https://github.com/ada-url/ada) | ||
|
@@ -77,7 +77,7 @@ ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag") | |
*/ | ||
``` | ||
|
||
It solves some problems of urltools with more complex urls. | ||
It solves some problems of urltools with more complex urls. | ||
```{r better} | ||
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. | ||
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519") | ||
|
@@ -86,7 +86,7 @@ ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.751984 | |
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519") | ||
``` | ||
|
||
A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but for this to carry over to R is tricky. | ||
A "raw" url parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)) but for this to carry over to R is tricky. | ||
The performance is still compatible with `urltools::url_parse` with the noted advantage in accuracy in some | ||
practical circumstances. | ||
|
||
|
@@ -102,14 +102,14 @@ For further benchmark results, see `benchmark.md` in `data_raw`. | |
|
||
There are four more groups of functions available to work with url parsing: | ||
|
||
- `ada_get_*()` get a specific component | ||
- `ada_get_*()` get a specific component | ||
- `ada_has_*()` check if a specific component is present | ||
- `ada_set_*()` set a specific component from URLS | ||
- `ada_clear_*()` remove a specific component from URLS | ||
|
||
## Public Suffix extraction | ||
|
||
`public_suffix()` extracts their top level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains. | ||
`public_suffix()` extracts their top level domain from the [public suffix list](https://publicsuffix.org/), **excluding** private domains. | ||
|
||
```{r public_suffix} | ||
urls <- c( | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,7 +7,7 @@ | |
|
||
[![R-CMD-check](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/adaR/actions/workflows/R-CMD-check.yaml) | ||
[![CRAN | ||
status](https://www.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
status](https://cranlogs.r-pkg.org/badges/version/adaR)](https://CRAN.R-project.org/package=adaR) | ||
[![CRAN | ||
Downloads](http://cranlogs.r-pkg.org/badges/adaR)](https://CRAN.R-project.org/package=adaR) | ||
[![Codecov test | ||
|
@@ -22,20 +22,20 @@ URL parser written in modern C++ . | |
|
||
It implements several auxilliary functions to work with urls: | ||
|
||
- public suffix extraction (top level domain excluding private | ||
domains) like [psl](https://github.com/hrbrmstr/psl) | ||
- fast c++ implementation of `utils::URLdecode` (\~40x speedup) | ||
- public suffix extraction (top level domain excluding private domains) | ||
like [psl](https://github.com/hrbrmstr/psl) | ||
- fast c++ implementation of `utils::URLdecode` (~40x speedup) | ||
|
||
More general information on URL parsing can be found in the introductory | ||
vignette via `vignette("adaR")`. | ||
|
||
`adaR` is part of a series of R packages to analyse webtracking data: | ||
|
||
- [webtrackR](https://github.com/schochastics/webtrackR): preprocess | ||
raw webtracking data | ||
- [domainator](https://github.com/schochastics/domainator): classify | ||
domains | ||
- [adaR](https://github.com/gesistsa/adaR): parse urls | ||
- [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw | ||
webtracking data | ||
- [domainator](https://github.com/schochastics/domainator): classify | ||
domains | ||
- [adaR](https://github.com/gesistsa/adaR): parse urls | ||
|
||
## Installation | ||
|
||
|
@@ -61,10 +61,10 @@ URL. | |
``` r | ||
library(adaR) | ||
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag") | ||
#> href protocol username password | ||
#> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 password_1 | ||
#> host hostname port pathname search hash | ||
#> 1 example.org:8080 example.org 8080 /api ?q=1 #frag | ||
#> href protocol username | ||
#> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 | ||
#> password host hostname port pathname search hash | ||
#> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag | ||
``` | ||
|
||
``` cpp | ||
|
@@ -93,9 +93,7 @@ urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40. | |
#> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 | ||
#> parameter fragment | ||
#> 1 <NA> <NA> | ||
``` | ||
|
||
``` r | ||
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m | ||
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519") | ||
#> href | ||
|
@@ -123,19 +121,19 @@ bench::mark( | |
#> # A tibble: 2 × 6 | ||
#> expression min median `itr/sec` mem_alloc `gc/sec` | ||
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> | ||
#> 1 ada 3.32ms 3.32ms 301. 0B 0 | ||
#> 2 urltools 566.68µs 566.68µs 1765. 0B 0 | ||
#> 1 ada 1.69ms 1.69ms 593. 0B 0 | ||
#> 2 urltools 737.84µs 737.84µs 1355. 0B 0 | ||
``` | ||
|
||
For further benchmark results, see `benchmark.md` in `data_raw`. | ||
|
||
There are four more groups of functions available to work with url | ||
parsing: | ||
|
||
- `ada_get_*()` get a specific component | ||
- `ada_has_*()` check if a specific component is present | ||
- `ada_set_*()` set a specific component from URLS | ||
- `ada_clear_*()` remove a specific component from URLS | ||
- `ada_get_*()` get a specific component | ||
- `ada_has_*()` check if a specific component is present | ||
- `ada_set_*()` set a specific component from URLS | ||
- `ada_clear_*()` remove a specific component from URLS | ||
|
||
## Public Suffix extraction | ||
|
||
|