From 8add9ad595ec48c31e7f702c2679b409989c70e4 Mon Sep 17 00:00:00 2001
From: schochastics
Date: Mon, 25 Sep 2023 13:29:05 +0000
Subject: [PATCH] Deploying to gh-pages from @ schochastics/adaR@cb87fb411d956087dd32589b2ad15e0fab5816d7 🚀
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 apple-touch-icon-120x120.png | Bin 24673 -> 24673 bytes
 apple-touch-icon-152x152.png | Bin 37725 -> 37725 bytes
 apple-touch-icon-180x180.png | Bin 51025 -> 51025 bytes
 apple-touch-icon-60x60.png | Bin 7208 -> 7208 bytes
 apple-touch-icon-76x76.png | Bin 10837 -> 10837 bytes
 apple-touch-icon.png | Bin 51025 -> 51025 bytes
 favicon-16x16.png | Bin 1340 -> 1340 bytes
 favicon-32x32.png | Bin 2692 -> 2692 bytes
 index.html | 13 +++++++++++--
 pkgdown.yml | 2 +-
 search.json | 2 +-
 11 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/apple-touch-icon-120x120.png b/apple-touch-icon-120x120.png
index 14e536912e049c4e023e6faf9c0b4d4f1ff13bdb..a07c352a996e5a5e7b64d4ad71a6298345bd157a 100644
GIT binary patch (binary delta not shown)

diff --git a/apple-touch-icon-152x152.png b/apple-touch-icon-152x152.png
index 4c9beb88347b5120a948cd041873bfd802cc8b1b..ae55ba85fe8850e267b7e7d05561c72a1fb9d962 100644
GIT binary patch (binary delta not shown)

diff --git a/apple-touch-icon-60x60.png b/apple-touch-icon-60x60.png
index 98484e0ad713455cccf6c40390a94932371ee803..f3201e9cad4e150b42e9fa58c9f04b2d497ee5ad 100644
GIT binary patch (binary delta not shown)

diff --git a/apple-touch-icon-76x76.png b/apple-touch-icon-76x76.png
index 4ac18fed3b0577d84b8c9ceda68239c0afb06da9..8e29e19ff8f11cea7cf45bdf8934f61dcfaf8c0c 100644
GIT binary patch (binary delta not shown)

diff --git a/apple-touch-icon.png b/apple-touch-icon.png
index befce613a500d6d52b38158a5c51b26f844a280b..c574c9c4530ca39874cfa0009062e7cb0e99f6ae 100644
GIT binary patch (binary delta not shown)

diff --git a/favicon-32x32.png b/favicon-32x32.png
index 321e0eef76f0b627a9ee9b35e2924ccc4503ccd7..bd3350ba37b4c510a582780d23fe4ac905287e06 100644
GIT binary patch (binary delta not shown)

diff --git a/index.html b/index.html
index fc1786a..80e1d1f 100644
--- a/index.html
+++ b/index.html
@@ -87,6 +87,15 @@
  • fast c++ implementation of utils::URLdecode (~40x speedup)

+adaR is part of a series of R packages to analyse webtracking data:

+#> 1 ada 456ms 456ms 2.19 2.67MB 19.7
+#> 2 urltools 316ms 316ms 3.16 2.59MB 22.1

    Public Suffix extraction

diff --git a/pkgdown.yml b/pkgdown.yml
index b2bcd58..7d1b6bf 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -3,7 +3,7 @@ pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   adaR: adaR.html
-last_built: 2023-09-25T12:58Z
+last_built: 2023-09-25T13:28Z
 urls:
   reference: https://schochastics.github.io/adaR/reference
   article: https://schochastics.github.io/adaR/articles

diff --git a/search.json b/search.json
index e988e46..591e037 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. 
terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. hostname+port called host adaR. Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG copliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. 
ada_has_*() can used check certain components present .","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List (list also includes private top level domains, excluded function). wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched. function implemented base R (avoid extra (system) dependencies) reasonably fast. 
prefer/need speedy implementation, check psl package wraps C library.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author.","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2023). adaR: fast WHATWG-compliant url parser. https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR.","code":"@Manual{, title = {adaR: A fast WHATWG-compliant url parser}, author = {David Schoch and Chung-hong Chan}, year = {2023}, note = {https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A fast WHATWG-compliant url parser","title":"A fast WHATWG-compliant url parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxilliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup)","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A fast WHATWG-compliant url parser","text":"can install development version adaR GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"schochastics/adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A fast WHATWG-compliant url parser","text":"basic example shows returned components URL solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) implemented interface yet optimized. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances.","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = replicate(1000, ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE)), urltools = replicate(1000, urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\")), iterations = 1, check = FALSE ) #> Warning: Some expressions had a GC in every iteration; so filtering is #> disabled. #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 236ms 236ms 4.23 2.67MB 38.1 #> 2 urltools 164ms 164ms 6.10 2.59MB 42.7"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A fast WHATWG-compliant url parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. functionality already exists R package psl. psl relies C library fast. However, package CRAN C library system requirement. 
issues need speed, please use package.","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A fast WHATWG-compliant url parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of URL — ada_get_href","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" \"NA\""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) 
ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = 
TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains — public_suffix","title":"Extract the public suffix from a vector of domains — public_suffix","text":"Extract public suffix vector domains","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a 
vector of domains — public_suffix","text":"","code":"public_suffix(url)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains — public_suffix","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-0109000","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0.9000","title":"adaR 0.1.0.9000","text":"split C++ files h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan 
(@chainsawriot)","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}] +[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. 
hostname+port called host adaR. Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG compliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. 
ada_has_*() can used check certain components present .","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List (list also includes private top level domains, excluded function). wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched. function implemented base R (avoid extra (system) dependencies) reasonably fast. 
prefer/need speedy implementation, check psl package wraps C library.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author.","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2023). adaR: fast WHATWG-compliant url parser. https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR.","code":"@Manual{, title = {adaR: A fast WHATWG-compliant url parser}, author = {David Schoch and Chung-hong Chan}, year = {2023}, note = {https://schochastics.github.io/adaR/, https://github.com/schochastics/adaR}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A fast WHATWG-compliant url parser","title":"A fast WHATWG-compliant url parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxiliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup) adaR part series R packages analyse webtracking data: webtrackR: preprocess raw webtracking data domainator: classify domains adaR: parse urls","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A fast WHATWG-compliant url parser","text":"can install development version adaR GitHub :","code":"# install.packages(\"devtools\") devtools::install_github(\"schochastics/adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A fast WHATWG-compliant url parser","text":"basic example shows returned components URL solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) implemented interface yet optimized. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances.","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = replicate(1000, ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE)), urltools = replicate(1000, urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\")), iterations = 1, check = FALSE ) #> Warning: Some expressions had a GC in every iteration; so filtering is #> disabled. #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 456ms 456ms 2.19 2.67MB 19.7 #> 2 urltools 316ms 316ms 3.16 2.59MB 22.1"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A fast WHATWG-compliant url parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. functionality already exists R package psl. psl relies C library fast. However, package CRAN C library system requirement. 
issues need speed, please use package.","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A fast WHATWG-compliant url parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of URL — ada_get_href","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" \"NA\""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) 
ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = 
TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains — public_suffix","title":"Extract the public suffix from a vector of domains — public_suffix","text":"Extract public suffix vector domains","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a 
vector of domains — public_suffix","text":"","code":"public_suffix(url)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains — public_suffix","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-0109000","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0.9000","title":"adaR 0.1.0.9000","text":"split C++ files h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan 
(@chainsawriot)","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}]