From 402e644f4ccd7981421afa68f6e61cf4aec200eb Mon Sep 17 00:00:00 2001 From: schochastics Date: Thu, 1 Feb 2024 13:43:01 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20gesistsa?= =?UTF-8?q?/adaR@3e62518c591baddd95448d3253d44e804e36c4cb=20=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- apple-touch-icon-120x120.png | Bin 24673 -> 24673 bytes apple-touch-icon-152x152.png | Bin 37725 -> 37725 bytes apple-touch-icon-180x180.png | Bin 51025 -> 51025 bytes apple-touch-icon-60x60.png | Bin 7208 -> 7208 bytes apple-touch-icon-76x76.png | Bin 10837 -> 10837 bytes apple-touch-icon.png | Bin 51025 -> 51025 bytes favicon-16x16.png | Bin 1340 -> 1340 bytes favicon-32x32.png | Bin 2692 -> 2692 bytes index.html | 6 +++--- pkgdown.yml | 2 +- search.json | 2 +- 11 files changed, 5 insertions(+), 5 deletions(-) diff --git a/apple-touch-icon-120x120.png b/apple-touch-icon-120x120.png index eca280f367a9de8e9b3d8daeae9042628f699ba6..56a1b5a812b0c66446ea4249acc1debc36b5bca8 100644 GIT binary patch delta 78 zcmaEOfbro0#tD_2OpLr*8q0QmIK8n+BVOD{*T68u(Adhv$jZb(+rYrez+je>Q~uFG0#ETp18XJZf8Csc`SQ(mW8yHv_7zE69IX(Gc Ryc9M$#RGrJCx4312LQgS7=Hi& diff --git a/apple-touch-icon-152x152.png b/apple-touch-icon-152x152.png index 7cb0c1360293a84eb49e379d5ce8743c0e72ccd3..a99fdb47e144870d54d660213707c3eea229f013 100644 GIT binary patch delta 78 zcmcb+jOp$&rU{jtOpLr*S{~acKHu1+GD+M>*T68u(Adhv2#B-|46F)! lag8WRNi0dV%FR#7OsixtGB7gHHNc|blIh;HlXptz0{~MJ8~p$P delta 97 zcmZ2svBF|PB`2f2s19%37VUi-o8C)#80s1uh8P)InV47^8fzOESQ!{FEzQnhU|>)! 
lag8WRNi0dV%FR#7OsixtGB7gHHNc{w_{Zf7lXptz0|0Ys9Sr~g diff --git a/apple-touch-icon-76x76.png b/apple-touch-icon-76x76.png index ee387bf5f364e251cd1882e3c52fc6e7def54d3a..bce8812271da644ce4ccc3ecf112522e91909686 100644 GIT binary patch delta 97 zcmcZ_ay4W^B_|Ujua>$Q+uNv(O){DuM!E)uA%@0QCPr4qmf8jeRt5&F@_b7f7#LJb kTq844X|h^aoQ$1`I%-u02N*uYXATM delta 97 zcmcZ_ay4W^B`2f2s1E;zU0#zmHpysu80s1uh8P)InV47^nra&uSQ!{3md1rKFfgc= kxJHzuB$lLF<>sekrd2W+85o)98eq{Npz-Y0p-U0mnZ<4FCWD diff --git a/favicon-32x32.png b/favicon-32x32.png index dfd1172906697df36d0cbf032f044d0e777a45d5..4603ec17337e73593a8be5a7640142c7b4ee21eb 100644 GIT binary patch delta 97 zcmZn>Z4sSN$;rgXt7Q~@n{(sFCT~s;BV7Z-5JO`t6C*2A18oBXD+7bD7fv4;7#LJb kTq844X|iX-}=jUvM5(R04^pQ3;+NC delta 97 zcmZn>Z4sSN$;l`$sw3;nY<+8ElQ*Y_p{}uEh>@X{iHVhwrM7{Am4Sgql<`3Z1_sp< k*NBpo#FA92-29Zxv`Pje10xe%11uUM*}H2ei*n@y0LpI|Example#> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> -#> 1 ada 227µs 227µs 4405. 2.49KB 0 -#> 2 urltools 229µs 229µs 4373. 2.49KB 0 +#> 1 ada 2.43ms 2.43ms 411. 2.49KB 0 +#> 2 urltools 526.26µs 526.26µs 1900. 2.49KB 0

For further benchmark results, see benchmark.md in data_raw.

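There are four more groups of functions available to work with URL parsing:

- ada_get_*() to get a specific component
- ada_has_*() to check if a specific component is present
- ada_set_*() to set a specific component of URLs
- ada_clear_*() to remove a specific component from URLs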
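A minimal sketch of these function groups, using one function from each of the ada_get_*(), ada_has_*(), ada_set_*() and ada_clear_*() families. It assumes adaR is installed. The calls and the outputs in the comments are the ones shown in the package reference examples, not newly measured results.

```r
library(adaR)

url <- "https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"

# ada_get_*(): extract a single component of the URL
ada_get_hostname(url)
#> [1] "example.org"

# ada_has_*(): check whether a component is present
ada_has_search(url)
#> [1] TRUE

# ada_set_*(): replace a component and return the modified URL
ada_set_hostname("https://example.de/test", "example.com")
#> [1] "https://example.com/test"

# ada_clear_*(): remove a component and return the modified URL
ada_clear_port(url)
#> [1] "https://user_1:password_1@example.org/api?q=1#frag"
```

According to the reference pages, these functions are vectorized and return NA for inputs that are not valid URLs, so they can be applied directly to a character vector or a data frame column.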

diff --git a/pkgdown.yml b/pkgdown.yml index 86a93b1..a0010a2 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -3,7 +3,7 @@ pkgdown: 2.0.7 pkgdown_sha: ~ articles: adaR: adaR.html -last_built: 2024-01-31T21:44Z +last_built: 2024-02-01T13:42Z urls: reference: https://schochastics.github.io/adaR/reference article: https://schochastics.github.io/adaR/articles diff --git a/search.json b/search.json index 81d492b..f743425 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. 
terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. hostname+port called host adaR. Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG copliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. ada_has_*() can used check certain components present . ada_set_*() can used set specific components URL. 
ada_clear_*() can used remove certain components.","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE ada_set_hostname(\"https://example.de/test\", \"example.com\") #> [1] \"https://example.com/test\" url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_clear_port(url) #> [1] \"https://user_1:password_1@example.org/api?q=1#frag\" ada_clear_hash(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1\" ada_clear_search(url) #> [1] \"https://user_1:password_1@example.org:8080/api#frag\""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List. 
Note list, include registry suffixes (e.g., com, co.uk), controlled domain name registry governed ICANN. include “private” suffixes (e.g., blogspot.com) allow people register subdomains. Hence, use term domain sense “top domain registry suffix”. See https://github.com/google/guava/wiki/InternetDomainNameExplained details. wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author. Yagiz Nizipli. Contributor, copyright holder. author ada-url : Daniel Lemire. Contributor, copyright holder. author ada-url : ","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2024). adaR: Fast 'WHATWG' Compliant URL Parser. R package version 0.3.2, https://github.com/gesistsa/adaR, https://gesistsa.github.io/adaR/.","code":"@Manual{, title = {adaR: A Fast 'WHATWG' Compliant URL Parser}, author = {David Schoch and Chung-hong Chan}, year = {2024}, note = {R package version 0.3.2, https://github.com/gesistsa/adaR}, url = {https://gesistsa.github.io/adaR/}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A Fast WHATWG Compliant URL Parser","title":"A Fast WHATWG Compliant URL Parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxilliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup) general information URL parsing can found introductory vignette via vignette(\"adaR\"). adaR part series R packages analyse webtracking data: webtrackR: preprocess raw webtracking data domainator: classify domains adaR: parse urls","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A Fast WHATWG Compliant URL Parser","text":"can install development version adaR GitHub : version CRAN can installed ","code":"# install.packages(\"devtools\") devtools::install_github(\"gesistsa/adaR\") install.packages(\"adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A Fast WHATWG Compliant URL Parser","text":"basic example shows returned components URL. solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) carry R tricky. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances. benchmark results, see benchmark.md data_raw. 
four groups functions available work url parsing: ada_get_*() get specific component ada_has_*() check specific component present ada_set_*() set specific component URLS ada_clear_*() remove specific component URLS","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE), urltools = urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\"), iterations = 1, check = FALSE ) #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 227µs 227µs 4405. 2.49KB 0 #> 2 urltools 229µs 229µs 4373. 2.49KB 0"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A Fast WHATWG Compliant URL Parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. wondering last url. 
list also contains wildcard suffixes *.kawasaki.jp need matched.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A Fast WHATWG Compliant URL Parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":null,"dir":"Reference","previous_headings":"","what":"Clear a specific component of URL — ada_clear_port","title":"Clear a specific component of URL — ada_clear_port","text":"functions clears specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clear a specific component of URL — ada_clear_port","text":"","code":"ada_clear_port(url, decode = TRUE) ada_clear_hash(url, decode = TRUE) ada_clear_search(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clear a specific component of URL — ada_clear_port","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Clear a specific component of URL — ada_clear_port","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Clear a specific component of URL — ada_clear_port","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_clear_port(url) #> [1] \"https://user_1:password_1@example.org/api?q=1#frag\" ada_clear_hash(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1\" ada_clear_search(url) #> [1] \"https://user_1:password_1@example.org:8080/api#frag\""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE) ada_get_domain(url, decode = TRUE) ada_get_basename(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of 
URL — ada_get_href","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ada_get_domain(url) #> [1] \"example.org\" ada_get_basename(url) #> [1] \"https://example.org\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) 
ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Set a specific component of URL — ada_set_href","title":"Set a specific component of URL — ada_set_href","text":"functions set specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set a specific component of URL — 
ada_set_href","text":"","code":"ada_set_href(url, input, decode = TRUE) ada_set_username(url, input, decode = TRUE) ada_set_password(url, input, decode = TRUE) ada_set_port(url, input, decode = TRUE) ada_set_host(url, input, decode = TRUE) ada_set_hostname(url, input, decode = TRUE) ada_set_pathname(url, input, decode = TRUE) ada_set_protocol(url, input, decode = TRUE) ada_set_search(url, input, decode = TRUE) ada_set_hash(url, input, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set a specific component of URL — ada_set_href","text":"url character. one URL parsed input character. containing new component URL. Vector length 1 length url. decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set a specific component of URL — ada_set_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set a specific component of URL — ada_set_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_set_href(url, \"https://google.de\") #> [1] \"https://google.de/\" ada_set_username(url, \"user_2\") #> [1] \"https://user_2:password_1@example.org:8080/api?q=1#frag\" ada_set_password(url, \"hunter2\") #> [1] \"https://user_1:hunter2@example.org:8080/api?q=1#frag\" ada_set_port(url, \"1234\") #> [1] \"https://user_1:password_1@example.org:1234/api?q=1#frag\" ada_set_hash(url, \"#section1\") #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#section1\" ada_set_host(url, \"example.de\") #> [1] \"https://user_1:password_1@example.de:8080/api?q=1#frag\" 
ada_set_hostname(url, \"example.de\") #> [1] \"https://user_1:password_1@example.de:8080/api?q=1#frag\" ada_set_pathname(url, \"path/\") #> [1] \"https://user_1:password_1@example.org:8080/path/?q=1#frag\" ada_set_search(url, \"q=2\") #> [1] \"https://user_1:password_1@example.org:8080/api?q=2#frag\" ada_set_protocol(url, \"ws:\") #> [1] \"ws://user_1:password_1@example.org:8080/api?q=1#frag\""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains or hostnames — public_suffix","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"Extract public suffix vector domains hostnames","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a vector of domains or hostnames — 
public_suffix","text":"","code":"public_suffix(domains)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"domains character. vector domains hostnames","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"public suffixes domains character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"","code":"public_suffix(\"http://example.com\") #> [1] \"com\" # doesn't work for general URLs public_suffix(\"http://example.com/path/to/file\") #> [1] NA # extracting hostname first does the trick public_suffix(ada_get_hostname(\"http://example.com/path/to/file\")) #> [1] \"com\""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character 
vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function to percent-decode characters in URLs — url_decode2","text":"precent decoded URLs character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-032","dir":"Changelog","previous_headings":"","what":"adaR 0.3.2","title":"adaR 0.3.2","text":"fixed #66","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-031","dir":"Changelog","previous_headings":"","what":"adaR 0.3.1","title":"adaR 0.3.1","text":"CRAN release: 2023-11-16 bumped ada-url 2.7.3 transferred repository schochastics gesistsa","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-030","dir":"Changelog","previous_headings":"","what":"adaR 0.3.0","title":"adaR 0.3.0","text":"CRAN release: 2023-10-16 bump ada_url version 2.7.0 #58 export ada_clear_*() functions #57 export ada_set_*() functions #15 h/t @chainsawriot c++ template added ada_get_basename() #56","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-020","dir":"Changelog","previous_headings":"","what":"adaR 0.2.0","title":"adaR 0.2.0","text":"CRAN release: 2023-10-01 split C++ file isolate original ada-url code h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan (@chainsawriot) fixed #47 h/t Chung-hong Chan (@chainsawriot) added ada_get_domain() 
#43","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}] +[{"path":"https://schochastics.github.io/adaR/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 adaR authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"a-primer-on-urls","dir":"Articles","previous_headings":"","what":"A primer on URLs","title":"Introduction to adaR","text":"URL (Uniform Resource Locator) serves reference web resource specific components give information resource can fetched. table gives overview components valid URL. full URL might look something like : However, URLs can simple just scheme host (e.g., http://example.com). presence specific combination components can vary based exact nature purpose URL. terms necessarily unambiguous (sub) terms need explanation. protocol can also called scheme. hostname+port called host adaR. 
Additionally, query referred search fragment hash adaR. relevant subcomponents given following table. wait, . table gives definition several terms relevance dealing URLs adaR package.","code":"https://username:password@example.com:8080/directory/file.html?key1=value1&key2=value2#section2"},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"whatwg-compliant","dir":"Articles","previous_headings":"","what":"“WHATWG compliant”","title":"Introduction to adaR","text":"underlying C++ code adaR, ada-url “WHATWG copliant”. /WHATWG? Web Hypertext Application Technology Working Group (WHATWG) community people interested evolving web standards tests. founded individuals Apple, Mozilla Foundation, Opera Software 2004, W3C workshop. Apple, Mozilla Opera becoming increasingly concerned W3C’s direction XHTML, lack interest HTML, apparent disregard needs real-world web developers. , response, organisations set mission address concerns Web Hypertext Application Technology Working Group born. WHATWG working ? WHATWG’s focus standards implementable web browsers, associated tests. existing work can found . standard relevance package, url standard. “WHATWG compliant” means, ada-url follows url standard.","code":""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"parsing-urls","dir":"Articles","previous_headings":"","what":"Parsing urls","title":"Introduction to adaR","text":"function ada_url_parse() decomposes url components shown first table. function can deal punycode percent encoding generally handle types edge cases well. ada_url_parse() power horse adaR always returns components URL. Specific components can parsed ada_get_*() set functions. ada_has_*() can used check certain components present . ada_set_*() can used set specific components URL. 
ada_clear_*() can used remove certain components.","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag corner_cases <- c( \"https://example.com:8080\", \"http://user:password@example.com\", \"http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080\", \"https://example.com/path/to/resource?query=value&another=thing#fragment\", \"http://sub.sub.example.com\", \"ftp://files.example.com:2121/download/file.txt\", \"http://example.com/path with spaces/and&special=characters?\", \"https://user:pa%40ssword@example.com/path\", \"http://example.com/..//a/b/../c/./d.html\", \"https://example.com:8080/over/under?query=param#and-a-fragment\", \"http://192.168.0.1/path/to/resource\", \"http://3com.com/path/to/resource\", \"http://example.com/%7Eusername/\", \"https://example.com/a?query=value&query=value2\", \"https://example.com/a/b/c/..\", \"ws://websocket.example.com:9000/chat\", \"https://example.com:65535/edge-case-port\", \"file:///home/user/file.txt\", \"http://example.com/a/b/c/%2F%2F\", \"http://example.com/a/../a/../a/../a/\", \"https://example.com/./././a/\", \"http://example.com:8080/a;b?c=d#e\", \"http://@example.com\", \"http://example.com/@test\", \"http://example.com/@@@/a/b\", \"https://example.com:0/\", \"http://example.com/%25path%20with%20encoded%20chars\", \"https://example.com/path?query=%26%3D%3F%23\", \"http://example.com:8080/?query=value#fragment#fragment2\", \"https://example.xn--80akhbyknj4f/path/to/resource\", \"https://example.co.uk/path/to/resource\", \"http://username:pass%23word@example.net\", \"ftp://downloads.example.edu:3030/files/archive.zip\", \"https://example.com:8080/this/is/a/deeply/nested/path/to/a/resource\", \"http://another-example.com/..//test/./demo.html\", 
\"https://sub2.sub1.example.org:5000/login?user=test#section2\", \"ws://chat.example.biz:5050/livechat\", \"http://192.168.1.100/a/b/c/d\", \"https://secure.example.shop/cart?item=123&quantity=5\", \"http://example.travel/%60%21%40%23%24%25%5E%26*()\", \"https://example.museum/path/to/artifact?search=ancient\", \"ftp://secure-files.example.co:4040/files/document.docx\", \"https://test.example.aero/booking?flight=abc123\", \"http://example.asia/%E2%82%AC%E2%82%AC/path\", \"http://subdomain.example.tel/contact?name=john\", \"ws://game-server.example.jobs:2020/match?id=xyz\", \"http://example.mobi/path/with/mobile/content\", \"https://example.name/family/tree?name=smith\", \"http://192.168.2.2/path?query1=value1&query2=value2\", \"http://example.pro/professional/services\", \"https://example.info/information/page\", \"http://example.int/internal/systems/login\", \"https://example.post/postal/services\", \"http://example.xxx/age/verification\", \"https://example.xxx/another/edge/case/path?with=query#and-fragment\" ) df <- ada_url_parse(corner_cases) df[, -1] #> protocol username password host #> 1 https: example.com:8080 #> 2 http: user password example.com #> 3 http: [2001:db8:85a3::8a2e:370:7334]:8080 #> 4 https: example.com #> 5 http: sub.sub.example.com #> 6 ftp: files.example.com:2121 #> 7 http: example.com #> 8 https: user pa@ssword example.com #> 9 http: example.com #> 10 https: example.com:8080 #> 11 http: 192.168.0.1 #> 12 http: 3com.com #> 13 http: example.com #> 14 https: example.com #> 15 https: example.com #> 16 ws: websocket.example.com:9000 #> 17 https: example.com:65535 #> 18 file: #> 19 http: example.com #> 20 http: example.com #> 21 https: example.com #> 22 http: example.com:8080 #> 23 http: example.com #> 24 http: example.com #> 25 http: example.com #> 26 https: example.com:0 #> 27 http: example.com #> 28 https: example.com #> 29 http: example.com:8080 #> 30 https: example.испытание #> 31 https: example.co.uk #> 32 http: username pass#word 
example.net #> 33 ftp: downloads.example.edu:3030 #> 34 https: example.com:8080 #> 35 http: another-example.com #> 36 https: sub2.sub1.example.org:5000 #> 37 ws: chat.example.biz:5050 #> 38 http: 192.168.1.100 #> 39 https: secure.example.shop #> 40 http: example.travel #> 41 https: example.museum #> 42 ftp: secure-files.example.co:4040 #> 43 https: test.example.aero #> 44 http: example.asia #> 45 http: subdomain.example.tel #> 46 ws: game-server.example.jobs:2020 #> 47 http: example.mobi #> 48 https: example.name #> 49 http: 192.168.2.2 #> 50 http: example.pro #> 51 https: example.info #> 52 http: example.int #> 53 https: example.post #> 54 http: example.xxx #> 55 https: example.xxx #> hostname port #> 1 example.com 8080 #> 2 example.com #> 3 [2001:db8:85a3::8a2e:370:7334] 8080 #> 4 example.com #> 5 sub.sub.example.com #> 6 files.example.com 2121 #> 7 example.com #> 8 example.com #> 9 example.com #> 10 example.com 8080 #> 11 192.168.0.1 #> 12 3com.com #> 13 example.com #> 14 example.com #> 15 example.com #> 16 websocket.example.com 9000 #> 17 example.com 65535 #> 18 #> 19 example.com #> 20 example.com #> 21 example.com #> 22 example.com 8080 #> 23 example.com #> 24 example.com #> 25 example.com #> 26 example.com 0 #> 27 example.com #> 28 example.com #> 29 example.com 8080 #> 30 example.испытание #> 31 example.co.uk #> 32 example.net #> 33 downloads.example.edu 3030 #> 34 example.com 8080 #> 35 another-example.com #> 36 sub2.sub1.example.org 5000 #> 37 chat.example.biz 5050 #> 38 192.168.1.100 #> 39 secure.example.shop #> 40 example.travel #> 41 example.museum #> 42 secure-files.example.co 4040 #> 43 test.example.aero #> 44 example.asia #> 45 subdomain.example.tel #> 46 game-server.example.jobs 2020 #> 47 example.mobi #> 48 example.name #> 49 192.168.2.2 #> 50 example.pro #> 51 example.info #> 52 example.int #> 53 example.post #> 54 example.xxx #> 55 example.xxx #> pathname search #> 1 / #> 2 / #> 3 / #> 4 /path/to/resource ?query=value&another=thing #> 5 / #> 6 
/download/file.txt #> 7 /path with spaces/and&special=characters #> 8 /path #> 9 //a/c/d.html #> 10 /over/under ?query=param #> 11 /path/to/resource #> 12 /path/to/resource #> 13 /~username/ #> 14 /a ?query=value&query=value2 #> 15 /a/b/ #> 16 /chat #> 17 /edge-case-port #> 18 /home/user/file.txt #> 19 /a/b/c/// #> 20 /a/ #> 21 /a/ #> 22 /a;b ?c=d #> 23 / #> 24 /@test #> 25 /@@@/a/b #> 26 / #> 27 /%path with encoded chars #> 28 /path ?query=&=?# #> 29 / ?query=value #> 30 /path/to/resource #> 31 /path/to/resource #> 32 / #> 33 /files/archive.zip #> 34 /this/is/a/deeply/nested/path/to/a/resource #> 35 //test/demo.html #> 36 /login ?user=test #> 37 /livechat #> 38 /a/b/c/d #> 39 /cart ?item=123&quantity=5 #> 40 /`!@#$%^&*() #> 41 /path/to/artifact ?search=ancient #> 42 /files/document.docx #> 43 /booking ?flight=abc123 #> 44 /€€/path #> 45 /contact ?name=john #> 46 /match ?id=xyz #> 47 /path/with/mobile/content #> 48 /family/tree ?name=smith #> 49 /path ?query1=value1&query2=value2 #> 50 /professional/services #> 51 /information/page #> 52 /internal/systems/login #> 53 /postal/services #> 54 /age/verification #> 55 /another/edge/case/path ?with=query #> hash #> 1 #> 2 #> 3 #> 4 #fragment #> 5 #> 6 #> 7 #> 8 #> 9 #> 10 #and-a-fragment #> 11 #> 12 #> 13 #> 14 #> 15 #> 16 #> 17 #> 18 #> 19 #> 20 #> 21 #> 22 #e #> 23 #> 24 #> 25 #> 26 #> 27 #> 28 #> 29 #fragment#fragment2 #> 30 #> 31 #> 32 #> 33 #> 34 #> 35 #> 36 #section2 #> 37 #> 38 #> 39 #> 40 #> 41 #> 42 #> 43 #> 44 #> 45 #> 46 #> 47 #> 48 #> 49 #> 50 #> 51 #> 52 #> 53 #> 54 #> 55 #and-fragment ada_get_hostname(corner_cases) #> [1] \"example.com\" \"example.com\" #> [3] \"[2001:db8:85a3::8a2e:370:7334]\" \"example.com\" #> [5] \"sub.sub.example.com\" \"files.example.com\" #> [7] \"example.com\" \"example.com\" #> [9] \"example.com\" \"example.com\" #> [11] \"192.168.0.1\" \"3com.com\" #> [13] \"example.com\" \"example.com\" #> [15] \"example.com\" \"websocket.example.com\" #> [17] \"example.com\" \"\" #> [19] 
\"example.com\" \"example.com\" #> [21] \"example.com\" \"example.com\" #> [23] \"example.com\" \"example.com\" #> [25] \"example.com\" \"example.com\" #> [27] \"example.com\" \"example.com\" #> [29] \"example.com\" \"example.испытание\" #> [31] \"example.co.uk\" \"example.net\" #> [33] \"downloads.example.edu\" \"example.com\" #> [35] \"another-example.com\" \"sub2.sub1.example.org\" #> [37] \"chat.example.biz\" \"192.168.1.100\" #> [39] \"secure.example.shop\" \"example.travel\" #> [41] \"example.museum\" \"secure-files.example.co\" #> [43] \"test.example.aero\" \"example.asia\" #> [45] \"subdomain.example.tel\" \"game-server.example.jobs\" #> [47] \"example.mobi\" \"example.name\" #> [49] \"192.168.2.2\" \"example.pro\" #> [51] \"example.info\" \"example.int\" #> [53] \"example.post\" \"example.xxx\" #> [55] \"example.xxx\" ada_has_search(corner_cases) #> [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE #> [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE #> [25] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE #> [37] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE #> [49] TRUE FALSE FALSE FALSE FALSE FALSE TRUE ada_set_hostname(\"https://example.de/test\", \"example.com\") #> [1] \"https://example.com/test\" url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_clear_port(url) #> [1] \"https://user_1:password_1@example.org/api?q=1#frag\" ada_clear_hash(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1\" ada_clear_search(url) #> [1] \"https://user_1:password_1@example.org:8080/api#frag\""},{"path":"https://schochastics.github.io/adaR/articles/adaR.html","id":"public-suffic-extraction","dir":"Articles","previous_headings":"","what":"Public suffic extraction","title":"Introduction to adaR","text":"package also implements public suffix extractor public_suffix(), based lookup Public Suffix List. 
Note list, include registry suffixes (e.g., com, co.uk), controlled domain name registry governed ICANN. include “private” suffixes (e.g., blogspot.com) allow people register subdomains. Hence, use term domain sense “top domain registry suffix”. See https://github.com/google/guava/wiki/InternetDomainNameExplained details. wondering last url. list also contains wildcard suffixes *.kawasaki.jp need matched.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"David Schoch. Author, maintainer. Chung-hong Chan. Author. Yagiz Nizipli. Contributor, copyright holder. author ada-url : Daniel Lemire. Contributor, copyright holder. author ada-url : ","code":""},{"path":"https://schochastics.github.io/adaR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Schoch D, Chan C (2024). adaR: Fast 'WHATWG' Compliant URL Parser. R package version 0.3.2, https://github.com/gesistsa/adaR, https://gesistsa.github.io/adaR/.","code":"@Manual{, title = {adaR: A Fast 'WHATWG' Compliant URL Parser}, author = {David Schoch and Chung-hong Chan}, year = {2024}, note = {R package version 0.3.2, https://github.com/gesistsa/adaR}, url = {https://gesistsa.github.io/adaR/}, }"},{"path":"https://schochastics.github.io/adaR/index.html","id":"adar-","dir":"","previous_headings":"","what":"A Fast WHATWG Compliant URL Parser","title":"A Fast WHATWG Compliant URL Parser","text":"adaR wrapper ada-url, WHATWG-compliant fast URL parser written modern C++ . 
implements several auxilliary functions work urls: public suffix extraction (top level domain excluding private domains) like psl fast c++ implementation utils::URLdecode (~40x speedup) general information URL parsing can found introductory vignette via vignette(\"adaR\"). adaR part series R packages analyse webtracking data: webtrackR: preprocess raw webtracking data domainator: classify domains adaR: parse urls","code":""},{"path":"https://schochastics.github.io/adaR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"A Fast WHATWG Compliant URL Parser","text":"can install development version adaR GitHub : version CRAN can installed ","code":"# install.packages(\"devtools\") devtools::install_github(\"gesistsa/adaR\") install.packages(\"adaR\")"},{"path":"https://schochastics.github.io/adaR/index.html","id":"example","dir":"","previous_headings":"","what":"Example","title":"A Fast WHATWG Compliant URL Parser","text":"basic example shows returned components URL. solves problems urltools complex urls. “raw” url parse using ada extremely fast (see ada-url.com) carry R tricky. performance still compatible urltools::url_parse noted advantage accuracy practical circumstances. benchmark results, see benchmark.md data_raw. 
four groups functions available work url parsing: ada_get_*() get specific component ada_has_*() check specific component present ada_set_*() set specific component URLS ada_clear_*() remove specific component URLS","code":"library(adaR) ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag /* * https://user:pass@example.com:1234/foo/bar?baz#quux * | | | | ^^^^| | | * | | | | | | | `----- hash_start * | | | | | | `--------- search_start * | | | | | `----------------- pathname_start * | | | | `--------------------- port * | | | `----------------------- host_end * | | `---------------------------------- host_start * | `--------------------------------------- username_end * `--------------------------------------------- protocol_end */ urltools::url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14. 
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> scheme domain port #> 1 https 40.7519848,-74.0015045,14.\\n 7z #> path #> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> parameter fragment #> 1 ada_url_parse(\"https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519\") #> href #> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> protocol username password host hostname port #> 1 https: www.google.com www.google.com #> pathname #> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519 #> search hash #> 1 bench::mark( ada = ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\", decode = FALSE), urltools = urltools::url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\"), iterations = 1, check = FALSE ) #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> #> 1 ada 2.43ms 2.43ms 411. 2.49KB 0 #> 2 urltools 526.26µs 526.26µs 1900. 2.49KB 0"},{"path":"https://schochastics.github.io/adaR/index.html","id":"public-suffix-extraction","dir":"","previous_headings":"","what":"Public Suffix extraction","title":"A Fast WHATWG Compliant URL Parser","text":"public_suffix() extracts top level domain public suffix list, excluding private domains. wondering last url. 
list also contains wildcard suffixes *.kawasaki.jp need matched.","code":"urls <- c( \"https://subsub.sub.domain.co.uk\", \"https://domain.api.gov.uk\", \"https://thisisnotpart.butthisispartoftheps.kawasaki.jp\" ) public_suffix(urls) #> [1] \"co.uk\" \"gov.uk\" #> [3] \"butthisispartoftheps.kawasaki.jp\""},{"path":"https://schochastics.github.io/adaR/index.html","id":"acknowledgement","dir":"","previous_headings":"","what":"Acknowledgement","title":"A Fast WHATWG Compliant URL Parser","text":"logo created portrait Ada Lovelace, early pioneer Computer Science.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":null,"dir":"Reference","previous_headings":"","what":"Clear a specific component of URL — ada_clear_port","title":"Clear a specific component of URL — ada_clear_port","text":"functions clears specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clear a specific component of URL — ada_clear_port","text":"","code":"ada_clear_port(url, decode = TRUE) ada_clear_hash(url, decode = TRUE) ada_clear_search(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clear a specific component of URL — ada_clear_port","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Clear a specific component of URL — ada_clear_port","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_clear_port.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Clear a specific component of URL — ada_clear_port","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_clear_port(url) #> [1] \"https://user_1:password_1@example.org/api?q=1#frag\" ada_clear_hash(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1\" ada_clear_search(url) #> [1] \"https://user_1:password_1@example.org:8080/api#frag\""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a specific component of URL — ada_get_href","title":"Get a specific component of URL — ada_get_href","text":"functions get specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a specific component of URL — ada_get_href","text":"","code":"ada_get_href(url, decode = TRUE) ada_get_username(url, decode = TRUE) ada_get_password(url, decode = TRUE) ada_get_port(url, decode = TRUE) ada_get_hash(url, decode = TRUE) ada_get_host(url, decode = TRUE) ada_get_hostname(url, decode = TRUE) ada_get_pathname(url, decode = TRUE) ada_get_search(url, decode = TRUE) ada_get_protocol(url, decode = TRUE) ada_get_domain(url, decode = TRUE) ada_get_basename(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a specific component of 
URL — ada_get_href","text":"url character. one URL parsed decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get a specific component of URL — ada_get_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_get_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a specific component of URL — ada_get_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_get_href(url) #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#frag\" ada_get_username(url) #> [1] \"user_1\" ada_get_password(url) #> [1] \"password_1\" ada_get_port(url) #> [1] \"8080\" ada_get_hash(url) #> [1] \"#frag\" ada_get_host(url) #> [1] \"example.org:8080\" ada_get_hostname(url) #> [1] \"example.org\" ada_get_pathname(url) #> [1] \"/api\" ada_get_search(url) #> [1] \"?q=1\" ada_get_protocol(url) #> [1] \"https:\" ada_get_domain(url) #> [1] \"example.org\" ada_get_basename(url) #> [1] \"https://example.org\" ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_get_port(urls) #> [1] \"\" \"\" NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if URL has a certain component — ada_has_credentials","title":"Check if URL has a certain component — ada_has_credentials","text":"functions check URL certain component.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"ada_has_credentials(url) 
ada_has_empty_hostname(url) ada_has_hostname(url) ada_has_non_empty_username(url) ada_has_non_empty_password(url) ada_has_port(url) ada_has_hash(url) ada_has_search(url)"},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if URL has a certain component — ada_has_credentials","text":"url character. one URL parsed","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if URL has a certain component — ada_has_credentials","text":"logical, NA valid URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_has_credentials.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check if URL has a certain component — ada_has_credentials","text":"","code":"url <- c(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") ada_has_credentials(url) #> [1] TRUE ada_has_empty_hostname(url) #> [1] FALSE ada_has_hostname(url) #> [1] TRUE ada_has_non_empty_username(url) #> [1] TRUE ada_has_non_empty_password(url) #> [1] TRUE ada_has_port(url) #> [1] TRUE ada_has_hash(url) #> [1] TRUE ada_has_search(url) #> [1] TRUE ## these functions are vectorized urls <- c(\"http://www.google.com\", \"http://www.google.com:80\", \"noturl\") ada_has_port(urls) #> [1] FALSE FALSE NA"},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":null,"dir":"Reference","previous_headings":"","what":"Set a specific component of URL — ada_set_href","title":"Set a specific component of URL — ada_set_href","text":"functions set specific component URL.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set a specific component of URL — 
ada_set_href","text":"","code":"ada_set_href(url, input, decode = TRUE) ada_set_username(url, input, decode = TRUE) ada_set_password(url, input, decode = TRUE) ada_set_port(url, input, decode = TRUE) ada_set_host(url, input, decode = TRUE) ada_set_hostname(url, input, decode = TRUE) ada_set_pathname(url, input, decode = TRUE) ada_set_protocol(url, input, decode = TRUE) ada_set_search(url, input, decode = TRUE) ada_set_hash(url, input, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set a specific component of URL — ada_set_href","text":"url character. one URL parsed input character. containing new component URL. Vector length 1 length url. decode logical. Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set a specific component of URL — ada_set_href","text":"character, NA valid URL","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_set_href.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set a specific component of URL — ada_set_href","text":"","code":"url <- \"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\" ada_set_href(url, \"https://google.de\") #> [1] \"https://google.de/\" ada_set_username(url, \"user_2\") #> [1] \"https://user_2:password_1@example.org:8080/api?q=1#frag\" ada_set_password(url, \"hunter2\") #> [1] \"https://user_1:hunter2@example.org:8080/api?q=1#frag\" ada_set_port(url, \"1234\") #> [1] \"https://user_1:password_1@example.org:1234/api?q=1#frag\" ada_set_hash(url, \"#section1\") #> [1] \"https://user_1:password_1@example.org:8080/api?q=1#section1\" ada_set_host(url, \"example.de\") #> [1] \"https://user_1:password_1@example.de:8080/api?q=1#frag\" 
ada_set_hostname(url, \"example.de\") #> [1] \"https://user_1:password_1@example.de:8080/api?q=1#frag\" ada_set_pathname(url, \"path/\") #> [1] \"https://user_1:password_1@example.org:8080/path/?q=1#frag\" ada_set_search(url, \"q=2\") #> [1] \"https://user_1:password_1@example.org:8080/api?q=2#frag\" ada_set_protocol(url, \"ws:\") #> [1] \"ws://user_1:password_1@example.org:8080/api?q=1#frag\""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":null,"dir":"Reference","previous_headings":"","what":"Use ada-url to parse a url — ada_url_parse","title":"Use ada-url to parse a url — ada_url_parse","text":"Use ada-url parse url","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(url, decode = TRUE)"},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use ada-url to parse a url — ada_url_parse","text":"url character. one URL parsed decode logical. 
Whether decode output (see utils::URLdecode()), default TRUE","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use ada-url to parse a url — ada_url_parse","text":"data frame url components: href, protocol, username, password, host, hostname, port, pathname, search, hash","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Use ada-url to parse a url — ada_url_parse","text":"details returned components refer introductory vignette.","code":""},{"path":"https://schochastics.github.io/adaR/reference/ada_url_parse.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Use ada-url to parse a url — ada_url_parse","text":"","code":"ada_url_parse(\"https://user_1:password_1@example.org:8080/dir/../api?q=1#frag\") #> href protocol username #> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1 #> password host hostname port pathname search hash #> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract the public suffix from a vector of domains or hostnames — public_suffix","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"Extract public suffix vector domains hostnames","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract the public suffix from a vector of domains or hostnames — 
public_suffix","text":"","code":"public_suffix(domains)"},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"domains character. vector domains hostnames","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"public suffixes domains character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/public_suffix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract the public suffix from a vector of domains or hostnames — public_suffix","text":"","code":"public_suffix(\"http://example.com\") #> [1] \"com\" # doesn't work for general URLs public_suffix(\"http://example.com/path/to/file\") #> [1] NA # extracting hostname first does the trick public_suffix(ada_get_hostname(\"http://example.com/path/to/file\")) #> [1] \"com\""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to percent-decode characters in URLs — url_decode2","title":"Function to percent-decode characters in URLs — url_decode2","text":"Similar utils::URLdecode","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(url)"},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to percent-decode characters in URLs — url_decode2","text":"url character 
vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function to percent-decode characters in URLs — url_decode2","text":"percent decoded URLs character vector","code":""},{"path":"https://schochastics.github.io/adaR/reference/url_decode2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Function to percent-decode characters in URLs — url_decode2","text":"","code":"url_decode2(\"Hello%20World\") #> [1] \"Hello World\""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-032","dir":"Changelog","previous_headings":"","what":"adaR 0.3.2","title":"adaR 0.3.2","text":"fixed #66","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-031","dir":"Changelog","previous_headings":"","what":"adaR 0.3.1","title":"adaR 0.3.1","text":"CRAN release: 2023-11-16 bumped ada-url 2.7.3 transferred repository schochastics gesistsa","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-030","dir":"Changelog","previous_headings":"","what":"adaR 0.3.0","title":"adaR 0.3.0","text":"CRAN release: 2023-10-16 bump ada_url version 2.7.0 #58 export ada_clear_*() functions #57 export ada_set_*() functions #15 h/t @chainsawriot c++ template added ada_get_basename() #56","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-020","dir":"Changelog","previous_headings":"","what":"adaR 0.2.0","title":"adaR 0.2.0","text":"CRAN release: 2023-10-01 split C++ file isolate original ada-url code h/t Chung-hong Chan (@chainsawriot) add support public suffix extraction #14 add support punycode #18 added url_decode2 fast alternative utils::URLdecode improved vectorization ada_get_* ada_has_* #26 #30 h/t Chung-hong Chan (@chainsawriot) fixed #47 h/t Chung-hong Chan (@chainsawriot) added ada_get_domain() 
#43","code":""},{"path":"https://schochastics.github.io/adaR/news/index.html","id":"adar-010","dir":"Changelog","previous_headings":"","what":"adaR 0.1.0","title":"adaR 0.1.0","text":"added ada_url_parser added ada_get_* error handling wrong urls #2 fixed #5 h/t Chung-hong Chan (@chainsawriot) add checks #7 vectorized functions #4 tests h/t Chung-hong Chan (@chainsawriot)","code":""}]