Commit 3dbf6fd

Update README
Update README to include RapidAPI change as well as correcting functions and output.
1 parent 529b02e commit 3dbf6fd

2 files changed

+151
-80
lines changed

README.Rmd

Lines changed: 56 additions & 43 deletions
@@ -1,6 +1,6 @@
11
---
22
title: "botscan"
3-
author: "Kurt Wirth"
3+
author: "Kurt Wirth and Ryan Moore"
44
date: "`r Sys.Date()`"
55
output: github_document
66
---
@@ -24,28 +24,39 @@ if (!requireNamespace("devtools", quietly = TRUE)) {
2424
devtools::install_github("kurtawirth/botscan")
2525
```
2626

27-
This package connects <code>botometer</code> to <code>rtweet</code>. As a
28-
result, each user must have previously acquired authentication from Twitter and
29-
instructions to do that [can be found here](http://rtweet.info/articles/auth.html).
27+
This package connects <code>botometer</code> to <code>rtweet</code>. On first
28+
load, rtweet will request authentication in your default browser. Accept the
29+
connection, return to RStudio, and botscan will continue automatically.
3030

31-
Users will also need to install the latest version of Python, as botscan
32-
accesses Python-based <code>botometer</code>, as well as acquiring a [Mashape
33-
key](https://market.mashape.com/OSoMe/botometer). Doing so requires a Mashape
34-
account. BotOMeter's default rate limit is 2,000 inquiries per day, so users
35-
are strongly encouraged to sign up for a (free) BotOMeter Pro key. Instructions
36-
to do that [can be found here](https://market.mashape.com/OSoMe/botometer-pro/pricing).
31+
Users will also need to install the latest version of Python and add it to the
32+
PATH upon installing when prompted, as botscan accesses Python-based
33+
<code>botometer</code>, as well as acquiring a
34+
[RapidAPI key](https://rapidapi.com/OSoMe/api/botometer-pro). Doing so requires
35+
a RapidAPI account. BotOMeter Pro is free and has a rate limit of 2,000
36+
inquiries per day. Alternatively, users can opt for the Ultra plan, which
37+
enables 17,280 inquiries per day and costs $50/month. Plans can be chosen
38+
[here](https://rapidapi.com/OSoMe/api/botometer-pro/pricing).
3739

3840
## Usage
3941

40-
There are two functions currently live for botscan.
42+
There are three functions currently live for botscan.
4143

42-
To begin, the user must first enter the following code, inserting their keys
43-
where appropriate:
44+
To begin, the user must first enter the following code:
4445

45-
```{setup instructions, eval = FALSE}
46-
bom <- setup_botscan("YourMashapeKey",
47-
"YourTwitterConsumerKey",
48-
"YourTwitterConsumerSecret",
46+
```{install, eval = FALSE}
47+
install_botometer()
48+
```
49+
50+
If your Python has been added to your machine's PATH as mentioned above, this
51+
function will install BotOMeter via <code>pip</code>.
52+
53+
Next, a user must create a bom object in their environment with the following
54+
code, inserting their keys where appropriate:
55+
56+
```{setup, eval = FALSE}
57+
bom <- setup_botscan("YourRapidAPIKey",
58+
"YourTwitterConsumerAPIKey",
59+
"YourTwitterConsumerAPISecretKey",
4960
"YourTwitterAccessToken",
5061
"YourTwitterAccessTokenSecret")
5162
```
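
The `setup_botscan()` call in the hunk above takes five keys as positional strings. As a hedged sketch, the keys could be read from environment variables instead of being typed into a script. The variable names and the `read_keys()` helper below are hypothetical and not part of botscan; only the argument order is taken from the README.

```r
# Hypothetical environment variable names, listed in the same order as the
# five arguments shown in the README's setup_botscan() call.
key_vars <- c("RAPIDAPI_KEY",
              "TWITTER_CONSUMER_KEY",
              "TWITTER_CONSUMER_SECRET",
              "TWITTER_ACCESS_TOKEN",
              "TWITTER_ACCESS_TOKEN_SECRET")

# Hypothetical helper: return the five keys, or stop with a message that
# names every variable that is unset or empty.
read_keys <- function(vars = key_vars) {
  keys <- Sys.getenv(vars)          # unset variables come back as ""
  absent <- vars[!nzchar(keys)]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  unname(keys)
}

# With botscan installed and the variables set, the keys can then be passed
# positionally, as in the README:
# keys <- read_keys()
# bom <- setup_botscan(keys[1], keys[2], keys[3], keys[4], keys[5])
```

Since the README notes that setup must be repeated at the start of every session, keeping the keys in `.Renviron` would avoid retyping them each time.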
@@ -57,31 +68,36 @@ Next, the fun begins with <code>botscan</code>.
5768
Its first argument takes any Twitter query, complete with boolean operators if
5869
desired, surrounded by quotation marks.
5970

71+
The second argument allows the user to provide external data in the form of any
72+
Twitter object with a column named "screen_name". The user can simply answer
73+
this argument with the name of any Twitter object in their environment and
74+
botscan will skip data collection and use the user's data instead.
75+
6076
The next argument determines how long an open stream of tweets will be
6177
collected, with a default of 30 seconds. In order to gather a specific volume
6278
of tweets, it is suggested that the user run a small initial test to determine
6379
a rough rate of tweets for the given query. If the user prefers to use Twitter's
6480
Search API, the next argument allows the user to specify the number of tweets
6581
to extract.
6682

67-
The fourth argument determines whether retweets will be included if using the
68-
Search API and the fifth takes a number, less than one, that represents the
83+
The fourth argument takes a number, less than one, that represents the
6984
desired threshold at which an account should be considered a bot. The default
7085
is .430, a reliable threshold as described by BotOMeter's creator [here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
7186

72-
The sixth argument allows the user to toggle between user-level and
73-
conversation-level summaries. The default is set to conversation-level
74-
data, understood as the proportion of the queried conversation that is
75-
bot-related. If <code>user_level</code> is set to <code>TRUE</code>,
76-
<code>botscan</code> will return user-level data, understood to be the
77-
proportion of the queried conversation's authors that are estimated to be bots.
78-
79-
The seventh argument allows the user to toggle between Twitter's Search and
87+
The fifth argument allows the user to toggle between Twitter's Search and
8088
Streaming APIs. The default is set to using the Streaming API, as it is
8189
unfiltered by Twitter and thus produces more accurate data. Search API data is
8290
filtered to eliminate low quality content, thus negatively impacting
8391
identification of bot accounts.
8492

93+
The sixth argument allows the user to determine the volume of tweets desired
94+
when using the Search API. Note that this argument will be ignored when using
95+
the Streaming API.
96+
97+
The seventh argument determines whether retweets will be included if using the
98+
Search API. Likewise, this argument will be ignored when using
99+
the Streaming API.
100+
85101
The eighth argument allows the user to opt out of auto-parsing of data,
86102
primarily useful when dealing with large volumes of data. The ninth and final
87103
argument defaults to keeping the user informed about the progress of the tool
@@ -94,35 +110,32 @@ library(botscan)
94110
95111
## Enter query surrounded by quotation marks
96112
botscan("#rstats")
97-
#> [1] 0.1642276
98113
99-
## Result is percentage - in this case, 16.42276%.
114+
## Result is a list of three objects, described below
100115
101116
## If desired, choose the stream time and threshold
102117
botscan("#rstats", timeout = 60, threshold = .995)
103-
#> [1] 0.02398524
104118
105119
## Alternatively, choose to use Twitter's Search API and options associated with it.
106120
botscan("#rstats", n_tweets = 1500, retweets = TRUE, search = TRUE, threshold = .995)
107-
#> [1] 0.03270932
108-
109-
## Result is percentage - in this case, 2.398524%.
110121
111122
## If desired, scan only users rather than the conversation as a whole.
112123
botscan("#rstats", user_level = TRUE)
113-
#> [1] 0.1505155
114-
115-
## Result is percentage - in this case, 15.05155%.
116124
```
117125

126+
The output from botscan is a list of three objects. The first is a dataframe
127+
including all raw data from Twitter and BotOMeter. The second is a string
128+
with the percentage of users in the data set that are estimated to be bots as
129+
determined by the user's provided threshold. The third is the percentage of
130+
tweets that are estimated to be bot-authored as determined by the user's
131+
provided threshold.
132+
118133
This process takes some time, as botscan is currently built on a loop of
119-
BotOMeter. Efforts to mainstream this process are set as future goals. A
120-
standard pull of tweets via <code>botscan</code> processes approximately 11 to
134+
BotOMeter. A standard pull of tweets via botscan processes approximately 11 to
121135
12 accounts per minute in addition to the initial tweet streaming.
122136

123137
Twitter rate limits cap the number of Search results returned to 18,000 every
124-
15 minutes. Thus, excessive use of <code>botscan</code> in a short amount of
125-
time may result in a warning and inability to pull results. In this event,
126-
simply wait 15 minutes and try again. In an effort to avoid the Twitter rate
127-
limit cap, <code>botscan</code> defaults to returning 1000 results when
128-
<code>search = TRUE</code>.
138+
15 minutes. Thus, excessive use of botscan in a short amount of time may result
139+
in a warning and inability to pull results. In this event, simply wait 15
140+
minutes and try again. In an effort to avoid the Twitter rate limit cap,
141+
botscan defaults to returning 1000 results when search = TRUE.
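
The README text above describes two threshold-based summaries: the share of users estimated to be bots and the share of tweets estimated to be bot-authored. The sketch below shows how those two figures can differ on the same data. It runs on a made-up data frame; the `bot_score` column name and the use of `>=` at the threshold are assumptions, since the diff does not show botscan's internal column names or comparison.

```r
# Made-up data: one row per tweet. "screen_name" is the column the README
# mentions; "bot_score" is an assumed name for the per-account score.
tweets <- data.frame(
  screen_name = c("alice", "alice", "alice", "bob", "carol", "carol"),
  bot_score   = c(0.10, 0.10, 0.10, 0.50, 0.90, 0.90),
  stringsAsFactors = FALSE
)

threshold <- 0.43  # the README's stated default

# User level: count each account once, then take the share of accounts
# at or above the threshold.
users <- unique(tweets[, c("screen_name", "bot_score")])
user_share <- mean(users$bot_score >= threshold)

# Tweet level: every tweet counts, so prolific accounts weigh more.
tweet_share <- mean(tweets$bot_score >= threshold)

user_share   # 2 of 3 accounts
tweet_share  # 3 of 6 tweets
```

Here two of three accounts exceed the threshold, but they wrote only half of the tweets, which is why the two summaries are reported separately.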

README.md

Lines changed: 95 additions & 37 deletions
@@ -1,12 +1,15 @@
11
botscan
22
================
3-
Kurt Wirth
4-
2018-11-06
3+
Kurt Wirth and Ryan Moore
4+
2021-06-16
55

6-
A package extending the capability of [botometer](https://github.com/IUNetSci/botometer-python) by measuring suspected bot activity in any given Twitter query. This README is derived from Matt Kearney's excellent [rtweet]((https://github.com/mkearney/rtweet)) documentation.
6+
A package extending the capability of
7+
[botometer](https://github.com/IUNetSci/botometer-python) by measuring
8+
suspected bot activity in any given Twitter query. This README is
9+
derived from Matt Kearney’s excellent
10+
[rtweet](https://github.com/mkearney/rtweet) documentation.
711

8-
Install
9-
-------
12+
## Install
1013

1114
Install from GitHub with the following code:
1215

@@ -17,21 +20,40 @@ if (!requireNamespace("devtools", quietly = TRUE)) {
1720
devtools::install_github("kurtawirth/botscan")
1821
```
1922

20-
This package connects <code>botometer</code> to <code>rtweet</code>. As a result, each user must have previously acquired authentication from Twitter and instructions to do that [can be found here](http://rtweet.info/articles/auth.html).
23+
This package connects <code>botometer</code> to <code>rtweet</code>. On
24+
first load, rtweet will request authentication in your default browser.
25+
Accept the connection, return to RStudio, and botscan will continue
26+
automatically.
2127

22-
Users will also need to install the latest version of Python, as botscan accesses Python-based <code>botometer</code>, as well as acquiring a [Mashape key](https://market.mashape.com/OSoMe/botometer). Doing so requires a Mashape account. BotOMeter's default rate limit is 2,000 inquiries per day, so users are strongly encouraged to sign up for a (free) BotOMeter Pro key. Instructions to do that [can be found here](https://market.mashape.com/OSoMe/botometer-pro/pricing).
28+
Users will also need to install the latest version of Python and add it
29+
to the PATH upon installing when prompted, as botscan accesses
30+
Python-based <code>botometer</code>, as well as acquiring a [RapidAPI
31+
key](https://rapidapi.com/OSoMe/api/botometer-pro). Doing so requires a
32+
RapidAPI account. BotOMeter Pro is free and has a rate limit of 2,000
33+
inquiries per day. Alternatively, users can opt for the Ultra plan,
34+
which enables 17,280 inquiries per day and costs $50/month. Plans can be
35+
chosen [here](https://rapidapi.com/OSoMe/api/botometer-pro/pricing).
2336

24-
Usage
25-
-----
37+
## Usage
2638

27-
There are two functions currently live for botscan.
39+
There are three functions currently live for botscan.
2840

29-
To begin, the user must first enter the following code, inserting their keys where appropriate:
41+
To begin, the user must first enter the following code:
42+
43+
``` install
44+
install_botometer()
45+
```
46+
47+
If your Python has been added to your machine’s PATH as mentioned above,
48+
this function will install BotOMeter via <code>pip</code>.
49+
50+
Next, a user must create a bom object in their environment with the
51+
following code, inserting their keys where appropriate:
3052

3153
``` setup
32-
bom <- setup_botscan("YourMashapeKey",
33-
"YourTwitterConsumerKey",
34-
"YourTwitterConsumerSecret",
54+
bom <- setup_botscan("YourRapidAPIKey",
55+
"YourTwitterConsumerAPIKey",
56+
"YourTwitterConsumerAPISecretKey",
3557
"YourTwitterAccessToken",
3658
"YourTwitterAccessTokenSecret")
3759
```
@@ -40,45 +62,81 @@ Currently, this must be done at the start of every session.
4062

4163
Next, the fun begins with <code>botscan</code>.
4264

43-
Its first argument takes any Twitter query, complete with boolean operators if desired, surrounded by quotation marks.
44-
45-
The next argument determines how long an open stream of tweets will be collected, with a default of 30 seconds. In order to gather a specific volume of tweets, it is suggested that the user run a small initial test to determine a rough rate of tweets for the given query. If the user prefers to use Twitter's Search API, the next argument allows the user to specify the number of tweets to extract.
46-
47-
The fourth argument determines whether retweets will be included if using the Search API and the fifth takes a number, less than one, that represents the desired threshold at which an account should be considered a bot. The default is .430, a reliable threshold as described by BotOMeter's creator [here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
48-
49-
The sixth argument allows the user to toggle between user-level and conversation-level summaries. The default is set to conversation-level data, understood as the proportion of the queried conversation that is bot-related. If <code>user\_level</code> is set to <code>TRUE</code>, <code>botscan</code> will return user-level data, understood to be the proportion of the queried conversation's authors that are estimated to be bots.
50-
51-
The seventh argument allows the user to toggle between Twitter's Search and Streaming APIs. The default is set to using the Streaming API, as it is unfiltered by Twitter and thus produces more accurate data. Search API data is filtered to eliminate low quality content, thus negatively impacting identification of bot accounts.
52-
53-
The eighth argument allows the user to opt out of auto-parsing of data, primarily useful when dealing with large volumes of data. The ninth and final argument defaults to keeping the user informed about the progress of the tool in gathering and processing data with the <code>verbose</code> package but can be toggled off.
65+
Its first argument takes any Twitter query, complete with boolean
66+
operators if desired, surrounded by quotation marks.
67+
68+
The second argument allows the user to provide external data in the form
69+
of any Twitter object with a column named “screen\_name”. The user can
70+
simply answer this argument with the name of any Twitter object in their
71+
environment and botscan will skip data collection and use the user’s
72+
data instead.
73+
74+
The next argument determines how long an open stream of tweets will be
75+
collected, with a default of 30 seconds. In order to gather a specific
76+
volume of tweets, it is suggested that the user run a small initial test
77+
to determine a rough rate of tweets for the given query. If the user
78+
prefers to use Twitter’s Search API, the next argument allows the user
79+
to specify the number of tweets to extract.
80+
81+
The fourth argument takes a number, less than one, that represents the
82+
desired threshold at which an account should be considered a bot. The
83+
default is .430, a reliable threshold as described by BotOMeter’s
84+
creator
85+
[here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
86+
87+
The fifth argument allows the user to toggle between Twitter’s Search
88+
and Streaming APIs. The default is set to using the Streaming API, as it
89+
is unfiltered by Twitter and thus produces more accurate data. Search
90+
API data is filtered to eliminate low quality content, thus negatively
91+
impacting identification of bot accounts.
92+
93+
The sixth argument allows the user to determine the volume of tweets
94+
desired when using the Search API. Note that this argument will be
95+
ignored when using the Streaming API.
96+
97+
The seventh argument determines whether retweets will be included if
98+
using the Search API. Likewise, this argument will be ignored when using
99+
the Streaming API.
100+
101+
The eighth argument allows the user to opt out of auto-parsing of data,
102+
primarily useful when dealing with large volumes of data. The ninth and
103+
final argument defaults to keeping the user informed about the progress
104+
of the tool in gathering and processing data with the
105+
<code>verbose</code> package but can be toggled off.
54106

55107
``` r
56108
## load botscan
57109
library(botscan)
58110

59111
## Enter query surrounded by quotation marks
60112
botscan("#rstats")
61-
#> [1] 0.1642276
62113

63-
## Result is percentage - in this case, 16.42276%.
114+
## Result is a list of three objects, described below
64115

65116
## If desired, choose the stream time and threshold
66117
botscan("#rstats", timeout = 60, threshold = .995)
67-
#> [1] 0.02398524
68118

69119
## Alternatively, choose to use Twitter's Search API and options associated with it.
70120
botscan("#rstats", n_tweets = 1500, retweets = TRUE, search = TRUE, threshold = .995)
71-
#> [1] 0.03270932
72-
73-
## Result is percentage - in this case, 2.398524%.
74121

75122
## If desired, scan only users rather than the conversation as a whole.
76123
botscan("#rstats", user_level = TRUE)
77-
#> [1] 0.1505155
78-
79-
## Result is percentage - in this case, 15.05155%.
80124
```
81125

82-
This process takes some time, as botscan is currently built on a loop of BotOMeter. Efforts to mainstream this process are set as future goals. A standard pull of tweets via <code>botscan</code> processes approximately 11 to 12 accounts per minute in addition to the initial tweet streaming.
83-
84-
Twitter rate limits cap the number of Search results returned to 18,000 every 15 minutes. Thus, excessive use of <code>botscan</code> in a short amount of time may result in a warning and inability to pull results. In this event, simply wait 15 minutes and try again. In an effort to avoid the Twitter rate limit cap, <code>botscan</code> defaults to returning 1000 results when <code>search = TRUE</code>.
126+
The output from botscan is a list of three objects. The first is a
127+
dataframe including all raw data from Twitter and BotOMeter. The second
128+
is a string with the percentage of users in the data set that are
129+
estimated to be bots as determined by the user’s provided threshold. The
130+
third is the percentage of tweets that are estimated to be bot-authored
131+
as determined by the user’s provided threshold.
132+
133+
This process takes some time, as botscan is currently built on a loop of
134+
BotOMeter. A standard pull of tweets via botscan processes approximately
135+
11 to 12 accounts per minute in addition to the initial tweet streaming.
136+
137+
Twitter rate limits cap the number of Search results returned to 18,000
138+
every 15 minutes. Thus, excessive use of botscan in a short amount of
139+
time may result in a warning and inability to pull results. In this
140+
event, simply wait 15 minutes and try again. In an effort to avoid the
141+
Twitter rate limit cap, botscan defaults to returning 1000 results when
142+
search = TRUE.
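
The figures quoted in the README (11 to 12 accounts per minute, 2,000 or 17,280 inquiries per day) allow a rough run-time estimate before starting a scan. This is a back-of-the-envelope sketch using only those numbers. The `minutes_needed()` helper is hypothetical, and actual throughput will vary.

```r
# Figures quoted in the README text above.
accounts_per_minute <- c(low = 11, high = 12)
daily_quota <- c(pro = 2000, ultra = 17280)

# Hypothetical helper: minutes needed to score n accounts at a given rate.
minutes_needed <- function(n_accounts, rate = accounts_per_minute) {
  n_accounts / rate
}

# Run time for 1,000 unique accounts: roughly 83 to 91 minutes.
round(minutes_needed(1000))

# Minutes of continuous scoring before the free daily quota is used up.
round(minutes_needed(daily_quota[["pro"]]))

# The Ultra quota spread over a full day works out to 12 inquiries a minute.
daily_quota[["ultra"]] / (24 * 60)
```

At the quoted rate, the free plan's 2,000 inquiries last about three hours of continuous scoring, while the Ultra quota matches the upper end of the quoted processing rate over a full day.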
