Commit a9d1eb9 (1 parent: 3e62518)
Showing 10 changed files with 465 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,5 @@ | |
README.html | ||
inst/doc | ||
docs | ||
|
||
/.quarto/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
# -------------------------------------------- | ||
# CITATION file created with {cffr} R package | ||
# See also: https://docs.ropensci.org/cffr/ | ||
# -------------------------------------------- | ||
|
||
cff-version: 1.2.0 | ||
message: 'To cite package "adaR" in publications use:' | ||
type: software | ||
license: MIT | ||
title: 'adaR: A Fast ''WHATWG'' Compliant URL Parser' | ||
version: 0.3.2 | ||
abstract: A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written | ||
in modern 'C++'. Also contains auxiliary functions such as a public suffix extractor. | ||
authors: | ||
- family-names: Schoch | ||
given-names: David | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-2952-4812 | ||
- family-names: Chan | ||
given-names: Chung-hong | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0002-6232-7530 | ||
repository: https://CRAN.R-project.org/package=adaR | ||
repository-code: https://github.com/gesistsa/adaR | ||
url: https://gesistsa.github.io/adaR/ | ||
contact: | ||
- family-names: Schoch | ||
given-names: David | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-2952-4812 | ||
keywords: | ||
- r | ||
- rstats | ||
- rstats-package | ||
- url-parser | ||
references: | ||
- type: software | ||
title: Rcpp | ||
abstract: 'Rcpp: Seamless R and C++ Integration' | ||
notes: LinkingTo | ||
url: https://www.rcpp.org | ||
repository: https://CRAN.R-project.org/package=Rcpp | ||
authors: | ||
- family-names: Eddelbuettel | ||
given-names: Dirk | ||
- family-names: Francois | ||
given-names: Romain | ||
- family-names: Allaire | ||
given-names: JJ | ||
- family-names: Ushey | ||
given-names: Kevin | ||
- family-names: Kou | ||
given-names: Qiang | ||
- family-names: Russell | ||
given-names: Nathan | ||
- family-names: Ucar | ||
given-names: Inaki | ||
- family-names: Bates | ||
given-names: Douglas | ||
- family-names: Chambers | ||
given-names: John | ||
year: '2024' | ||
- type: software | ||
title: triebeard | ||
abstract: 'triebeard: ''Radix'' Trees in ''Rcpp''' | ||
notes: Imports | ||
url: https://github.com/Ironholds/triebeard/ | ||
repository: https://CRAN.R-project.org/package=triebeard | ||
authors: | ||
- family-names: Keyes | ||
given-names: Os | ||
- family-names: Schmidt | ||
given-names: Drew | ||
- family-names: Takano | ||
given-names: Yuuki | ||
year: '2024' | ||
- type: software | ||
title: knitr | ||
abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R' | ||
notes: Suggests | ||
url: https://yihui.org/knitr/ | ||
repository: https://CRAN.R-project.org/package=knitr | ||
authors: | ||
- family-names: Xie | ||
given-names: Yihui | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-0645-5666 | ||
year: '2024' | ||
- type: software | ||
title: rmarkdown | ||
abstract: 'rmarkdown: Dynamic Documents for R' | ||
notes: Suggests | ||
url: https://pkgs.rstudio.com/rmarkdown/ | ||
repository: https://CRAN.R-project.org/package=rmarkdown | ||
authors: | ||
- family-names: Allaire | ||
given-names: JJ | ||
email: [email protected] | ||
- family-names: Xie | ||
given-names: Yihui | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-0645-5666 | ||
- family-names: Dervieux | ||
given-names: Christophe | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-4474-2498 | ||
- family-names: McPherson | ||
given-names: Jonathan | ||
email: [email protected] | ||
- family-names: Luraschi | ||
given-names: Javier | ||
- family-names: Ushey | ||
given-names: Kevin | ||
email: [email protected] | ||
- family-names: Atkins | ||
given-names: Aron | ||
email: [email protected] | ||
- family-names: Wickham | ||
given-names: Hadley | ||
email: [email protected] | ||
- family-names: Cheng | ||
given-names: Joe | ||
email: [email protected] | ||
- family-names: Chang | ||
given-names: Winston | ||
email: [email protected] | ||
- family-names: Iannone | ||
given-names: Richard | ||
email: [email protected] | ||
orcid: https://orcid.org/0000-0003-3925-190X | ||
year: '2024' | ||
- type: software | ||
title: testthat | ||
abstract: 'testthat: Unit Testing for R' | ||
notes: Suggests | ||
url: https://testthat.r-lib.org | ||
repository: https://CRAN.R-project.org/package=testthat | ||
authors: | ||
- family-names: Wickham | ||
given-names: Hadley | ||
email: [email protected] | ||
year: '2024' | ||
version: '>= 3.0.0' | ||
- type: software | ||
title: 'R: A Language and Environment for Statistical Computing' | ||
notes: Depends | ||
url: https://www.R-project.org/ | ||
authors: | ||
- name: R Core Team | ||
institution: | ||
name: R Foundation for Statistical Computing | ||
address: Vienna, Austria | ||
year: '2024' | ||
version: '>= 4.2' | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
project: | ||
title: adaR | ||
type: default | ||
render: | ||
- methodshub.qmd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
zip |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
install.packages("adaR") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
# adaR - A Fast ‘WHATWG’ Compliant URL Parser | ||
|
||
|
||
## Description | ||
|
||
<!-- - Provide a brief and clear description of the method, its purpose, and what it aims to achieve. Add a link to a related paper from social science domain and show how your method can be applied to solve that research question. --> | ||
|
||
A wrapper for ‘ada-url’, a ‘WHATWG’ compliant and fast URL parser | ||
written in modern ‘C++’. Also contains auxiliary functions such as a | ||
public suffix extractor. | ||
|
||
## Keywords | ||
|
||
<!-- EDITME --> | ||
|
||
- URL Parsing | ||
- Webtracking Data | ||
- Webscraping | ||
|
||
## Science Usecase(s) | ||
|
||
<!-- - Include usecases from social sciences that would make this method applicable in a certain scenario. --> | ||
<!-- The use cases or research questions mentioned should arise from the latest social science literature cited in the description. --> | ||
|
||
URL parsing is an important process in the analysis of webtracking data, | ||
e.g. [GESIS Web | ||
Tracking](https://www.gesis.org/en/services/planning-studies-and-collecting-data/tools-for-the-collection-of-digital-behavioral-data/gesis-web-tracking). | ||
The technique, although not this package itself, has been used in various | ||
social science publications, e.g. [de León et | ||
al. (2023)](https://doi.org/10.5117/CCR2023.2.4.DELE). | ||
|
||
The package was used in various webscraping projects for communication | ||
research, e.g. [paperboy](https://github.com/JBGruber/paperboy). | ||
|
||
## Repository structure | ||
|
||
This repository follows [the standard structure of an R | ||
package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure). | ||
|
||
## Environment Setup | ||
|
||
With R installed: | ||
|
||
``` r | ||
install.packages("adaR") | ||
``` | ||
|
||
<!-- ## Hardware Requirements (Optional) --> | ||
<!-- - The hardware requirements may be needed in specific cases when a method is known to require more memory/compute power. --> | ||
<!-- - The method need to be executed on a specific architecture (GPUs, Hadoop cluster etc.) --> | ||
|
||
## Input Data | ||
|
||
<!-- - The input data has to be a Digital Behavioral Data (DBD) Dataset --> | ||
<!-- - You can provide link to a public DBD dataset. GESIS DBD datasets (https://www.gesis.org/en/institute/digital-behavioral-data) --> | ||
|
||
The input must be a character vector of URLs. | ||
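
As an illustration of preparing such a vector, the following base R sketch pulls URLs out of a tabular export. The data frame `visits` and its column names are assumptions made up for this example; they are not a format required by adaR or provided by any GESIS dataset.

```r
# Hypothetical web tracking export; the column names are assumptions
# for illustration only.
visits <- data.frame(
  visit_id = 1:4,
  url = c(
    "https://www.google.de/search?q=GESIS",
    " https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html ",
    NA,
    "https://www.google.de/search?q=GESIS"
  ),
  stringsAsFactors = FALSE
)

# Build a clean character vector: trim surrounding whitespace,
# drop missing or empty entries, and remove duplicates.
urls <- trimws(visits$url)
urls <- unique(urls[!is.na(urls) & nzchar(urls)])
urls
```

Whether to deduplicate depends on the analysis: for visit counts, the duplicates would normally be kept.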
|
||
## Sample Input and Output Data | ||
|
||
<!-- - Show how the input data looks like through few sample instances --> | ||
<!-- - Providing a sample output on the sample input to help cross check --> | ||
|
||
The input data looks like this: | ||
|
||
``` r | ||
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1") | ||
|
||
urls | ||
``` | ||
|
||
[1] "https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1" | ||
|
||
The output data is a data frame of parsed URLs. | ||
|
||
## How to Use | ||
|
||
<!-- - Providing HowTos on the method for different types of usages --> | ||
<!-- - Describe how the method should be used, including installation, configuration, and any specific instructions for users. --> | ||
|
||
Please refer to the [“Introduction to | ||
adaR”](https://gesistsa.github.io/adaR/articles/adaR.html) for a | ||
comprehensive introduction to the package. | ||
|
||
The main function of this package is `ada_url_parse()`, which decomposes | ||
a URL into its components. | ||
|
||
``` r | ||
library(adaR) | ||
|
||
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1", | ||
"https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html", | ||
"https://www.sueddeutsche.de/thema/Fu%C3%9Fball-EM") | ||
|
||
ada_url_parse(urls) | ||
``` | ||
|
||
href | ||
1 https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1 | ||
2 https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html | ||
3 https://www.sueddeutsche.de/thema/Fußball-EM | ||
protocol username password host hostname port | ||
1 https: www.google.de www.google.de | ||
2 https: www.nytimes.com www.nytimes.com | ||
3 https: www.sueddeutsche.de www.sueddeutsche.de | ||
pathname | ||
1 /search | ||
2 /2024/06/19/world/africa/sudan-darfur-takeaways.html | ||
3 /thema/Fußball-EM | ||
search hash | ||
1 ?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1 | ||
2 | ||
3 | ||
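
The `search` column in the output above holds the raw query string. A possible follow-up step, sketched here in base R only, is to split that string into key-value pairs. The helper `split_query()` is hypothetical and not part of adaR; it also does not treat `+` as a space, which form-encoded queries may require.

```r
# Hypothetical helper (not part of adaR): split a query string such as
# the `search` component into a named character vector.
split_query <- function(search) {
  # Drop the leading "?" if present
  query <- sub("^\\?", "", search)
  if (!nzchar(query)) {
    return(character(0))
  }
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  # Split each pair at the first "=" only; keys without "=" get an empty value
  keys <- sub("=.*$", "", pairs)
  values <- ifelse(grepl("=", pairs, fixed = TRUE),
                   sub("^[^=]*=", "", pairs),
                   "")
  # Decode percent-encoded values with base R
  values <- vapply(values, utils::URLdecode, character(1), USE.NAMES = FALSE)
  stats::setNames(values, keys)
}

# Query string copied from the first row of the output above
search <- "?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1"
params <- split_query(search)
params[["q"]]
```

Applied to a whole `search` column, the helper could be mapped over the rows with `lapply()`, since each URL may carry a different set of parameters.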
|
||
## Contact Details | ||
|
||
Maintainer: David Schoch <[email protected]> | ||
|
||
Issue Tracker: <https://github.com/gesistsa/adaR/issues> | ||
|
||
<!-- ## Publication --> | ||
<!-- - Include information on publications or articles related to the method, if applicable. --> | ||
<!-- ## Acknowledgements --> | ||
<!-- - Acknowledgements if any --> | ||
<!-- ## Disclaimer --> | ||
<!-- - Add any disclaimers, legal notices, or usage restrictions for the method, if necessary. --> |