diff --git a/apple-touch-icon-120x120.png b/apple-touch-icon-120x120.png
index 14e5369..a07c352 100644
Binary files a/apple-touch-icon-120x120.png and b/apple-touch-icon-120x120.png differ
diff --git a/apple-touch-icon-152x152.png b/apple-touch-icon-152x152.png
index 4c9beb8..ae55ba8 100644
Binary files a/apple-touch-icon-152x152.png and b/apple-touch-icon-152x152.png differ
diff --git a/apple-touch-icon-180x180.png b/apple-touch-icon-180x180.png
index de0579e..94111c5 100644
Binary files a/apple-touch-icon-180x180.png and b/apple-touch-icon-180x180.png differ
diff --git a/apple-touch-icon-60x60.png b/apple-touch-icon-60x60.png
index 98484e0..f3201e9 100644
Binary files a/apple-touch-icon-60x60.png and b/apple-touch-icon-60x60.png differ
diff --git a/apple-touch-icon-76x76.png b/apple-touch-icon-76x76.png
index 4ac18fe..8e29e19 100644
Binary files a/apple-touch-icon-76x76.png and b/apple-touch-icon-76x76.png differ
diff --git a/apple-touch-icon.png b/apple-touch-icon.png
index befce61..c574c9c 100644
Binary files a/apple-touch-icon.png and b/apple-touch-icon.png differ
diff --git a/favicon-16x16.png b/favicon-16x16.png
index 61f498b..37c7187 100644
Binary files a/favicon-16x16.png and b/favicon-16x16.png differ
diff --git a/favicon-32x32.png b/favicon-32x32.png
index 321e0ee..bd3350b 100644
Binary files a/favicon-32x32.png and b/favicon-32x32.png differ
diff --git a/index.html b/index.html
index fc1786a..80e1d1f 100644
--- a/index.html
+++ b/index.html
@@ -87,6 +87,15 @@
  • fast c++ implementation of utils::URLdecode (~40x speedup)
  • +

    adaR is part of a series of R packages to analyse webtracking data:

    +

    Installation

    @@ -151,8 +160,8 @@

    Example
 #> # A tibble: 2 × 6
 #>   expression      min   median `itr/sec` mem_alloc `gc/sec`
 #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
-#> 1 ada           236ms    236ms      4.23    2.67MB     38.1
-#> 2 urltools      164ms    164ms      6.10    2.59MB     42.7
+#> 1 ada           456ms    456ms      2.19    2.67MB     19.7
+#> 2 urltools      316ms    316ms      3.16    2.59MB     22.1

    Public Suffix extraction
diff --git a/pkgdown.yml b/pkgdown.yml
index b2bcd58..7d1b6bf 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -3,7 +3,7 @@ pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   adaR: adaR.html
-last_built: 2023-09-25T12:58Z
+last_built: 2023-09-25T13:28Z
 urls:
   reference: https://schochastics.github.io/adaR/reference
   article: https://schochastics.github.io/adaR/articles
diff --git a/search.json b/search.json
index e988e46..591e037 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. 
terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. hostname+port called host adaR. Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG copliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. 
ada_has_*() can used check certain components present .","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List (list also includes private top level domains, excluded function). wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched. function implemented base R (avoid extra (system) dependencies) reasonably fast. 
prefer/need speedy implementation, check psl package wraps C library.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author.","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2023). adaR: fast WHATWG-compliant url parser. https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR.","code":"@Manual{, title = {adaR: A fast WHATWG-compliant url parser}, author = {David Schoch and Chung-hong Chan}, year = {2023}, note = {https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A fast WHATWG-compliant url parser","title":"A fast WHATWG-compliant url parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxilliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup)","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A fast WHATWG-compliant url parser","text":"can install development version adaR GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"schochastics/adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A fast WHATWG-compliant url parser","text":"basic example shows returned components URL solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) implemented interface yet optimized. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances.","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = replicate(1000, ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE)), urltools = replicate(1000, urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\")), iterations = 1, check = FALSE ) #> Warning: Some expressions had a GC in every iteration; so filtering is #> disabled. #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 236ms 236ms 4.23 2.67MB 38.1 #> 2 urltools 164ms 164ms 6.10 2.59MB 42.7"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A fast WHATWG-compliant url parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. functionality already exists R package psl. psl relies C library fast. However, package CRAN C library system requirement. 
issues need speed, please use package.","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A fast WHATWG-compliant url parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of URL — ada_get_href","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" \"NA\""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) 
ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = 
TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains — public_suffix","title":"Extract the public suffix from a vector of domains — public_suffix","text":"Extract public suffix vector domains","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a 
vector of domains — public_suffix","text":"","code":"public_suffix(url)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains — public_suffix","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-0109000","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0.9000","title":"adaR 0.1.0.9000","text":"split C++ files h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan 
(@chainsawriot)","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}]
+[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. 
hostname+port called host adaR. Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG copliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. 
ada_has_*() can used check certain components present .","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List (list also includes private top level domains, excluded function). wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched. function implemented base R (avoid extra (system) dependencies) reasonably fast. 
prefer/need speedy implementation, check psl package wraps C library.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author.","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2023). adaR: fast WHATWG-compliant url parser. https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR.","code":"@Manual{, title = {adaR: A fast WHATWG-compliant url parser}, author = {David Schoch and Chung-hong Chan}, year = {2023}, note = {https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A fast WHATWG-compliant url parser","title":"A fast WHATWG-compliant url parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxiliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup) adaR part series R packages analyse webtracking data: webtrackR: preprocess raw webtracking data domainator: classify domains adaR: parse urls","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A fast WHATWG-compliant url parser","text":"can install development version adaR GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"schochastics/adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A fast WHATWG-compliant url parser","text":"basic example shows returned components URL solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) implemented interface yet optimized. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances.","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = replicate(1000, ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE)), urltools = replicate(1000, urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\")), iterations = 1, check = FALSE ) #> Warning: Some expressions had a GC in every iteration; so filtering is #> disabled. #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 456ms 456ms 2.19 2.67MB 19.7 #> 2 urltools 316ms 316ms 3.16 2.59MB 22.1"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A fast WHATWG-compliant url parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. functionality already exists R package psl. psl relies C library fast. However, package CRAN C library system requirement. 
issues need speed, please use package.","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A fast WHATWG-compliant url parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of URL — ada_get_href","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" \"NA\""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) 
ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = 
TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains — public_suffix","title":"Extract the public suffix from a vector of domains — public_suffix","text":"Extract public suffix vector domains","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a 
vector of domains — public_suffix","text":"","code":"public_suffix(url)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains — public_suffix","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-0109000","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0.9000","title":"adaR 0.1.0.9000","text":"split C++ files h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan 
(@chainsawriot)","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}]