
Commit 40896e7

Update methodshub.qmd (#79)
1 parent 0bba969 commit 40896e7

1 file changed

+102
-46
lines changed

methodshub.qmd

Lines changed: 102 additions & 46 deletions
@@ -1,14 +1,40 @@
11
---
2-
title: adaR - A Fast 'WHATWG' Compliant URL Parser
32
format:
43
html:
54
embed-resources: true
65
gfm: default
76
---
87

9-
## Description
8+
# adaR - A Fast 'WHATWG' Compliant URL Parser
9+
<!--
10+
General specifications:
11+
- This specification of the Methods Hub friendly README often uses the word 'should' to indicate the usual case. If you feel you need to do it differently, add a comment to argue for your case when you submit your method.
12+
- A Methods Hub friendly README should contain all sections below that are not marked as optional, and can contain more sections.
13+
- A Methods Hub friendly README should contain as few technical terms as possible and explain (or link to an explanation of) all used technical terms.
14+
- A Methods Hub friendly README should link to all code files that it mentions using the [text](URL relative to this file) format. The relative URL (i.e., no "https://github.com") is necessary for proper versioning in Methods Hub.
15+
- A Methods Hub friendly README should contain an explanation (in the text) and an alternative for each image it contains (e.g., data models, pipeline, schema structure). Format: ![alternative text that describes what is visible in the image](URL relative to this file).
16+
- A Methods Hub friendly README should link to authoritative sources rather than containing a copy of the information (e.g., documentation).
17+
- A Methods Hub friendly README should use a uniform citation style for all references, for example APA7 https://apastyle.apa.org/style-grammar-guidelines/references/examples
18+
19+
Title:
20+
1. The title must be the README's only first-level heading (line starting with a single '#').
21+
2. The title should make the method's purpose clear.
22+
3. The title (line 1 of this file) must be changed by you, but all other headings should be kept as they are.
23+
4. The title must be appropriate (not harmful, derogatory, etc.).
24+
25+
Section templates:
26+
The README template comes with text templates for each section (after each comment) that can be used, customized or removed as desired.
27+
-->
1028

11-
<!-- - Provide a brief and clear description of the method, its purpose, and what it aims to achieve. Add a link to a related paper from social science domain and show how your method can be applied to solve that research question. -->
29+
## Description
30+
<!--
31+
1. Provide a brief and exact description of the method clearly mentioning its purpose i.e., what the method does or aims to achieve in abstract terms (avoiding technical details).
32+
2. The focus should be on explaining the method in a way that helps users with different levels of expertise understand what it does, without going into technical details. It should clearly describe what inputs are needed and what outputs can be expected.
33+
3. Briefly explain the input and output of the method and its noteworthy features.
34+
4. Provide link(s) to related papers from the social science domain using the method or similar methods for solving social science research questions.
35+
5. In a separate paragraph, highlight the reproducibility aspect of the method providing details or references to the resources used by the method, the data used in building the pre-trained modules etc.
36+
6. It should also discuss the decisions and parameters controlling the behavior of the method.
37+
-->
1238

1339
A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written in modern 'C++'. Also contains auxiliary functions such as a public suffix extractor.
1440

@@ -20,59 +46,69 @@ A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written in mod
2046
* Webtracking Data
2147
* Webscraping
2248

23-
## Science Usecase(s)
24-
25-
<!-- - Include usecases from social sciences that would make this method applicable in a certain scenario. -->
26-
<!-- The use cases or research questions mentioned should arise from the latest social science literature cited in the description. -->
49+
## Use Cases
50+
<!--
51+
1. The use cases section should contain a list of use cases relevant to the social sciences.
52+
2. Each use case should start with a description of a task and then detail how one can use the method to assist in the task.
53+
3. Each use case may list publications in which the use case occurs (e.g., in APA7 style, https://apastyle.apa.org/style-grammar-guidelines/references/examples).
54+
-->
2755

2856
URL parsing is an important process in the analysis of webtracking data, e.g. [GESIS Web Tracking](https://www.gesis.org/en/services/planning-studies-and-collecting-data/tools-for-the-collection-of-digital-behavioral-data/gesis-web-tracking). Although not using this package, the technique has been used in various social science publications, e.g. [de León et al. (2023)](https://doi.org/10.5117/CCR2023.2.4.DELE).
2957

3058
The package was used in various webscraping projects for communication research, e.g. [paperboy](https://github.com/JBGruber/paperboy).
3159

32-
## Repository structure
33-
34-
This repository follows [the standard structure of an R package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure).
60+
## Input Data
61+
<!--
62+
1. The input data section should illustrate the input data format by showing a (possibly abbreviated) example item and explaining (or linking to an explanation of) the data fields.
63+
2. The input data section should specify which parts of the input data are optional and what effect it has to not provide these.
64+
3. The input data section should link to a small example input file in the same repository that can be used to test the method (this test should be described in the section "How to Use").
65+
-->
3566

36-
## Environment Setup
67+
The input data has to be a vector of URLs and looks like this:
3768

38-
With R installed:
69+
```{r}
70+
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1")
3971
40-
```r
41-
install.packages("adaR")
72+
urls
4273
```
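
In practice, such a vector is often assembled from a column of a larger dataset. The following is a minimal base R sketch, assuming a hypothetical data frame `visits` with a `url` column; neither the data frame nor its column names come from adaR.

```r
# Hypothetical web tracking export; the data frame and its column names are
# assumptions made for illustration, not part of adaR.
visits <- data.frame(
  user = c("a", "b", "c"),
  url = c(
    "https://www.google.de/search?q=GESIS&client=ubuntu",
    NA,
    " https://www.gesis.org/en/home "
  ),
  stringsAsFactors = FALSE
)

# Drop missing entries and trim surrounding whitespace so that the result
# is a plain character vector of URLs.
urls <- trimws(visits$url[!is.na(visits$url)])

is.character(urls)
length(urls)
```

The resulting `urls` object is a character vector, which is the input format described above.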
4374

44-
## Hardware Requirements
45-
46-
adaR runs on any machine that can run R.
47-
48-
## Input Data
49-
50-
<!-- - The input data has to be a Digital Behavioral Data (DBD) Dataset -->
51-
<!-- - You can provide link to a public DBD dataset. GESIS DBD datasets (https://www.gesis.org/en/institute/digital-behavioral-data) -->
52-
53-
The input data has to be a vector of URLs.
75+
## Output Data
76+
<!--
77+
1. The output data section should illustrate the output data format by showing a (possibly abbreviated) example item and explaining (or linking to an explanation of) the data fields.
78+
2. The output data section should link to a small example output file in the same repository that can be re-created (as far as the method is non-random) from the input data (as described in the section "How to Use").
79+
-->
5480

81+
The output data is a data frame of parsed URLs.
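
As a rough illustration of that data frame, the sketch below parses the example URL and inspects the result. It assumes that adaR is installed and that the result has one row per input URL, with components in columns such as `hostname` and `search`; these column names are assumptions here and should be checked against the package documentation.

```r
library(adaR)

urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1")

# Parse the URLs; the result is expected to be a data frame.
parsed <- ada_url_parse(urls)

# Inspect the available columns and two of the parsed components.
# The column names used below are assumptions; see the adaR documentation.
names(parsed)
parsed$hostname
parsed$search
```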
5582

56-
## Sample Input and Output Data
83+
## Hardware Requirements
84+
<!--
85+
1. The hardware requirements section should list all requirements (storage, memory, compute, GPUs, cluster software, ...) that exceed the capabilities of a cheap virtual machine provided by a cloud computing company (2 x86 CPU cores, 4 GB RAM, 40 GB HDD).
86+
2. If the method requires a GPU, the hardware requirements section must list the minimal GPU requirements (especially VRAM).
87+
-->
5788

58-
<!-- - Show how the input data looks like through few sample instances -->
59-
<!-- - Providing a sample output on the sample input to help cross check -->
89+
adaR runs on any hardware that can run R.
6090

61-
The input data looks like this:
91+
## Environment Setup
92+
<!--
93+
1. The environment setup section should list all requirements and provide all further steps to prepare an environment for running the method (installing requirements, downloading files, creating directories, etc.).
94+
2. The environment setup section should recommend using a virtual environment or similar if the programming language supports one.
95+
-->
6296

63-
```{r}
64-
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1")
97+
With R installed:
6598

66-
urls
99+
```r
100+
install.packages("adaR")
67101
```
68102

69-
The output data is a data frame of parsed URLs.
70-
71-
## How to Use
103+
## Repository structure
72104

73-
<!-- - Providing HowTos on the method for different types of usages -->
74-
<!-- - Describe how the method should be used, including installation, configuration, and any specific instructions for users. -->
105+
This repository follows [the standard structure of an R package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure).
75106

107+
## How to Use
108+
<!--
109+
1. The how to use section should provide the list of steps that are necessary to produce the example output file (see section Output Data) after having set up the environment (see section Environment Setup).
110+
2. The how to use section should explain how to customize the steps to one's own needs, usually through configuration files or command line parameters, or refer to the appropriate open documentation.
111+
-->
76112

77113
Please refer to the ["Introduction to adaR"](https://gesistsa.github.io/adaR/articles/adaR.html) for a comprehensive introduction to the package.
78114

@@ -89,20 +125,40 @@ ada_url_parse(urls)
89125
```
90126

91127
## Technical Details
128+
<!--
129+
1. The technical details section should provide a process overview, linking to key source code files at every step of the process.
130+
2. In case a publication provides the details mentioned below, the technical details section should link to this publication using a sentence like "See the [publication](url-of-publication-best-using-doi) for ...". In this case, the mentioned technical details can be omitted from the section.
131+
3. The technical details section should list all information needed to reproduce the method, including employed other methods and selected parameters.
132+
4. The input data section should link to external data it uses, preferably using a DOI to a dataset page or to API documentation.
133+
5. The technical details section should mention how other methods and their parameters were selected and which alternatives were tried.
134+
6. The technical details section should for employed machine learning models mention on what kind of data they were trained.
135+
-->
136+
137+
See the official [CRAN page](https://doi.org/10.32614/CRAN.package.adaR) for further information about technical details.
138+
139+
<!--## References -->
140+
<!--
141+
1. The references section is optional, especially if they are cited in a publication that explains the technical details (see section Technical Details).
142+
2. The references section should provide references of publications related to this method (e.g., in APA7 style, https://apastyle.apa.org/style-grammar-guidelines/references/examples).
143+
-->
144+
145+
<!-- ## Acknowledgements -->
146+
<!--
147+
1. The acknowledgments section is optional.
148+
2. The acknowledgments section should list expressions of gratitude to people or organizations who contributed, supported or guided.
149+
-->
92150

93-
Check the official [CRAN page](https://doi.org/10.32614/CRAN.package.adaR) for further information.
151+
<!-- ## Disclaimer-->
152+
<!--
153+
1. The disclaimer section is optional.
154+
2. The disclaimer section should list disclaimers, legal notices, or usage restrictions for the method.
155+
-->
94156

95157
## Contact Details
158+
<!--
159+
1. The contact details section should specify whom to contact for questions or contributions and how (can be separate entities; for example email addresses or links to the GitHub issue board).
160+
-->
96161

97162
Maintainer: David Schoch <[email protected]>
98163

99164
Issue Tracker: [https://github.com/gesistsa/adaR/issues](https://github.com/gesistsa/adaR/issues)
100-
101-
<!-- ## Publication -->
102-
<!-- - Include information on publications or articles related to the method, if applicable. -->
103-
104-
<!-- ## Acknowledgements -->
105-
<!-- - Acknowledgements if any -->
106-
107-
<!-- ## Disclaimer -->
108-
<!-- - Add any disclaimers, legal notices, or usage restrictions for the method, if necessary. -->
