There are two functions currently live for botscan.
42
+
There are three functions currently live for botscan.
41
43
42
-
To begin, the user must first enter the following code, inserting their keys
43
-
where appropriate:
44
+
To begin, the user must first enter the following code:
44
45
45
-
```{setup instructions, eval = FALSE}
46
-
bom <- setup_botscan("YourMashapeKey",
47
-
"YourTwitterConsumerKey",
48
-
"YourTwitterConsumerSecret",
46
+
```{install, eval = FALSE}
47
+
install_botometer()
48
+
```
49
+
50
+
If your Python has been added to your machine's PATH as mentioned above, this
51
+
function will install BotOMeter via <code>pip</code>.
52
+
53
+
Next, a user must create a bom object in their environment with the following
54
+
code, inserting their keys where appropriate:
55
+
56
+
```{setup, eval = FALSE}
57
+
bom <- setup_botscan("YourRapidAPIKey",
58
+
"YourTwitterConsumerAPIKey",
59
+
"YourTwitterConsumerAPISecretKey",
49
60
"YourTwitterAccessToken",
50
61
"YourTwitterAccessTokenSecret")
51
62
```
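Since the keys have to be supplied at the start of every session, one option is to keep them out of scripts and read them from environment variables. The following is only a sketch under stated assumptions: the helper `read_botscan_keys()` and the environment variable names are hypothetical and not part of botscan. Only the five-argument `setup_botscan()` call order is taken from the README text above.

```r
# Hypothetical helper (not part of botscan): read the five keys from
# environment variables. The variable names are assumptions for this sketch.
read_botscan_keys <- function(vars = c("RAPIDAPI_KEY",
                                       "TWITTER_CONSUMER_KEY",
                                       "TWITTER_CONSUMER_SECRET",
                                       "TWITTER_ACCESS_TOKEN",
                                       "TWITTER_ACCESS_TOKEN_SECRET")) {
  keys <- Sys.getenv(vars, unset = NA)
  # Treat both unset and empty variables as absent
  absent <- vars[is.na(keys) | keys == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  unname(keys)
}

# Usage (not run): pass the keys in the order shown in the README.
# keys <- read_botscan_keys()
# bom <- setup_botscan(keys[1], keys[2], keys[3], keys[4], keys[5])
```

Failing early with a list of the missing variables is a design choice of this sketch; it avoids passing empty strings on to the setup call.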
@@ -57,31 +68,36 @@ Next, the fun begins with <code>botscan</code>.
57
68
Its first argument takes any Twitter query, complete with boolean operators if
58
69
desired, surrounded by quotation marks.
59
70
71
+
The second argument allows the user to provide external data in the form of any
72
+
Twitter object with a column named "screen_name". The user can simply supply
73
+
this argument with the name of any Twitter object in their environment and
74
+
botscan will skip data collection and use the user's data instead.
75
+
60
76
The next argument determines how long an open stream of tweets will be
61
77
collected, with a default of 30 seconds. In order to gather a specific volume
62
78
of tweets, it is suggested that the user run a small initial test to determine
63
79
a rough rate of tweets for the given query. If the user prefers to use Twitter's
64
80
Search API, the next argument allows the user to specify the number of tweets
65
81
to extract.
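The "small initial test" suggested above comes down to simple rate arithmetic. As a rough sketch, assuming the tweet rate observed in a short pilot stream stays about constant, the stream time needed for a target volume could be estimated as follows. The function name and its arguments are illustrative only and are not part of botscan.

```r
# Estimate how many seconds to stream in order to collect a target number
# of tweets, based on a short pilot run. All names here are illustrative.
estimate_stream_seconds <- function(pilot_tweets, pilot_seconds, target_tweets) {
  stopifnot(pilot_tweets > 0, pilot_seconds > 0, target_tweets > 0)
  rate <- pilot_tweets / pilot_seconds  # tweets per second seen in the pilot
  ceiling(target_tweets / rate)         # seconds needed at that rate
}

# Example: a 30-second pilot returned 45 tweets; the goal is 1,000 tweets.
estimate_stream_seconds(45, 30, 1000)
#> [1] 667
```

Real tweet rates fluctuate, so the result is best treated as a starting point rather than a guarantee.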
66
82
67
-
The fourth argument determines whether retweets will be included if using the
68
-
Search API and the fifth takes a number, less than one, that represents the
83
+
The fourth argument takes a number, less than one, that represents the
69
84
desired threshold at which an account should be considered a bot. The default
70
85
is .430, a reliable threshold as described by BotOMeter's creator [here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
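To illustrate what the threshold does, the sketch below flags accounts in a toy data frame. It is not botscan's implementation: the column name `score`, the helper `flag_bots()`, and the choice of `>=` (rather than `>`) for the comparison are all assumptions made for this example.

```r
# Toy scores; the column name `score` and the ">=" comparison are assumptions.
accounts <- data.frame(
  screen_name = c("a", "b", "c", "d"),
  score = c(0.12, 0.43, 0.80, 0.35),
  stringsAsFactors = FALSE
)

# Flag an account as a suspected bot when its score reaches the threshold.
flag_bots <- function(scores, threshold = 0.430) {
  stopifnot(threshold > 0, threshold < 1)
  scores >= threshold
}

accounts$is_bot <- flag_bots(accounts$score)
mean(accounts$is_bot)  # share of accounts flagged as suspected bots
#> [1] 0.5
```

Raising the threshold flags fewer accounts; lowering it flags more.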
71
86
72
-
The sixth argument allows the user to toggle between user-level and
73
-
conversation-level summaries. The default is set to conversation-level
74
-
data, understood as the proportion of the queried conversation that is
75
-
bot-related. If <code>user_level</code> is set to <code>TRUE</code>,
76
-
<code>botscan</code> will return user-level data, understood to be the
77
-
proportion of the queried conversation's authors that are estimated to be bots.
78
-
79
-
The seventh argument allows the user to toggle between Twitter's Search and
87
+
The fifth argument allows the user to toggle between Twitter's Search and
80
88
Streaming APIs. The default is set to using the Streaming API, as it is
81
89
unfiltered by Twitter and thus produces more accurate data. Search API data is
82
90
filtered to eliminate low quality content, thus negatively impacting
83
91
identification of bot accounts.
84
92
93
+
The sixth argument allows the user to determine the volume of tweets desired
94
+
when using the Search API. Note that this argument will be ignored when using
95
+
the Streaming API.
96
+
97
+
The seventh argument determines whether retweets will be included if using the
98
+
Search API. Likewise, this argument will be ignored when using
99
+
the Streaming API.
100
+
85
101
The eighth argument allows the user to opt out of auto-parsing of data,
86
102
primarily useful when dealing with large volumes of data. The ninth and final
87
103
argument defaults to keeping the user informed about the progress of the tool
@@ -94,35 +110,32 @@ library(botscan)
94
110
95
111
## Enter query surrounded by quotation marks
96
112
botscan("#rstats")
97
-
#> [1] 0.1642276
98
113
99
-
## Result is percentage - in this case, 16.42276%.
114
+
## Result is a list of three objects, described below
100
115
101
116
## If desired, choose the stream time and threshold
A package extending the capability of [botometer](https://github.com/IUNetSci/botometer-python) by measuring suspected bot activity in any given Twitter query. This README is derived from Matt Kearney's excellent [rtweet]((https://github.com/mkearney/rtweet)) documentation.
6
+
A package extending the capability of
7
+
[botometer](https://github.com/IUNetSci/botometer-python) by measuring
8
+
suspected bot activity in any given Twitter query. This README is
@@ -17,21 +20,40 @@ if (!requireNamespace("devtools", quietly = TRUE)) {
17
20
devtools::install_github("kurtawirth/botscan")
18
21
```
19
22
20
-
This package connects <code>botometer</code> to <code>rtweet</code>. As a result, each user must have previously acquired authentication from Twitter and instructions to do that [can be found here](http://rtweet.info/articles/auth.html).
23
+
This package connects <code>botometer</code> to <code>rtweet</code>. On
24
+
first load, rtweet will request authentication in your default browser.
25
+
Accept the connection, return to RStudio, and botscan will continue
26
+
automatically.
21
27
22
-
Users will also need to install the latest version of Python, as botscan accesses Python-based <code>botometer</code>, as well as acquiring a [Mashape key](https://market.mashape.com/OSoMe/botometer). Doing so requires a Mashape account. BotOMeter's default rate limit is 2,000 inquiries per day, so users are strongly encouraged to sign up for a (free) BotOMeter Pro key. Instructions to do that [can be found here](https://market.mashape.com/OSoMe/botometer-pro/pricing).
28
+
Users will also need to install the latest version of Python and add it
29
+
to the PATH upon installing when prompted, as botscan accesses
30
+
Python-based <code>botometer</code>, as well as acquiring a [RapidAPI
31
+
key](https://rapidapi.com/OSoMe/api/botometer-pro). Doing so requires a
32
+
RapidAPI account. BotOMeter Pro is free and has a rate limit of 2,000
33
+
inquiries per day. Alternatively, users can opt for the Ultra plan,
34
+
which enables 17,280 inquiries per day and costs $50/month. Plans can be
There are two functions currently live for botscan.
39
+
There are three functions currently live for botscan.
28
40
29
-
To begin, the user must first enter the following code, inserting their keys where appropriate:
41
+
To begin, the user must first enter the following code:
42
+
43
+
```install
44
+
install_botometer()
45
+
```
46
+
47
+
If your Python has been added to your machine’s PATH as mentioned above,
48
+
this function will install BotOMeter via <code>pip</code>.
49
+
50
+
Next, a user must create a bom object in their environment with the
51
+
following code, inserting their keys where appropriate:
30
52
31
53
```setup
32
-
bom <- setup_botscan("YourMashapeKey",
33
-
"YourTwitterConsumerKey",
34
-
"YourTwitterConsumerSecret",
54
+
bom <- setup_botscan("YourRapidAPIKey",
55
+
"YourTwitterConsumerAPIKey",
56
+
"YourTwitterConsumerAPISecretKey",
35
57
"YourTwitterAccessToken",
36
58
"YourTwitterAccessTokenSecret")
37
59
```
@@ -40,45 +62,81 @@ Currently, this must be done at the start of every session.
40
62
41
63
Next, the fun begins with <code>botscan</code>.
42
64
43
-
Its first argument takes any Twitter query, complete with boolean operators if desired, surrounded by quotation marks.
44
-
45
-
The next argument determines how long an open stream of tweets will be collected, with a default of 30 seconds. In order to gather a specific volume of tweets, it is suggested that the user run a small initial test to determine a rough rate of tweets for the given query. If the user prefers to use Twitter's Search API, the next argument allows the user to specify the number of tweets to extract.
46
-
47
-
The fourth argument determines whether retweets will be included if using the Search API and the fifth takes a number, less than one, that represents the desired threshold at which an account should be considered a bot. The default is .430, a reliable threshold as described by BotOMeter's creator [here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
48
-
49
-
The sixth argument allows the user to toggle between user-level and conversation-level summaries. The default is set to conversation-level data, understood as the proportion of the queried conversation that is bot-related. If <code>user\_level</code> is set to <code>TRUE</code>, <code>botscan</code> will return user-level data, understood to be the proportion of the queried conversation's authors that are estimated to be bots.
50
-
51
-
The seventh argument allows the user to toggle between Twitter's Search and Streaming APIs. The default is set to using the Streaming API, as it is unfiltered by Twitter and thus produces more accurate data. Search API data is filtered to eliminate low quality content, thus negatively impacting identification of bot accounts.
52
-
53
-
The eighth argument allows the user to opt out of auto-parsing of data, primarily useful when dealing with large volumes of data. The ninth and final argument defaults to keeping the user informed about the progress of the tool in gathering and processing data with the <code>verbose</code> package but can be toggled off.
65
+
Its first argument takes any Twitter query, complete with boolean
66
+
operators if desired, surrounded by quotation marks.
67
+
68
+
The second argument allows the user to provide external data in the form
69
+
of any Twitter object with a column named “screen\_name”. The user can
70
+
simply supply this argument with the name of any Twitter object in their
71
+
environment and botscan will skip data collection and use the user’s
72
+
data instead.
73
+
74
+
The next argument determines how long an open stream of tweets will be
75
+
collected, with a default of 30 seconds. In order to gather a specific
76
+
volume of tweets, it is suggested that the user run a small initial test
77
+
to determine a rough rate of tweets for the given query. If the user
78
+
prefers to use Twitter’s Search API, the next argument allows the user
79
+
to specify the number of tweets to extract.
80
+
81
+
The fourth argument takes a number, less than one, that represents the
82
+
desired threshold at which an account should be considered a bot. The
83
+
default is .430, a reliable threshold as described by BotOMeter’s
## Result is percentage - in this case, 2.398524%.
74
121
75
122
##If desired, scan only users rather than the conversation as a whole.
76
123
botscan("#rstats", user_level=TRUE)
77
-
#> [1] 0.1505155
78
-
79
-
## Result is percentage - in this case, 15.05155%.
80
124
```
81
125
82
-
This process takes some time, as botscan is currently built on a loop of BotOMeter. Efforts to mainstream this process are set as future goals. A standard pull of tweets via <code>botscan</code> processes approximately 11 to 12 accounts per minute in addition to the initial tweet streaming.
83
-
84
-
Twitter rate limits cap the number of Search results returned to 18,000 every 15 minutes. Thus, excessive use of <code>botscan</code> in a short amount of time may result in a warning and inability to pull results. In this event, simply wait 15 minutes and try again. In an effort to avoid the Twitter rate limit cap, <code>botscan</code> defaults to returning 1000 results when <code>search = TRUE</code>.
126
+
The output from botscan is a list of three objects. The first is a
127
+
dataframe including all raw data from Twitter and BotOMeter. The second
128
+
is a string with the percentage of users in the data set that are
129
+
estimated to be bots as determined by the user’s provided threshold. The
130
+
third is the percentage of tweets that are estimated to be bot-authored
131
+
as determined by the user’s provided threshold.
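The difference between the user-level and tweet-level percentages can be seen on a small example. This is a sketch on made-up data, not a reproduction of botscan's output format: the `score` column, the helper `summarise_bots()`, and the `>=` comparison are assumptions, and the values are returned as plain proportions rather than formatted strings.

```r
# Toy data: one row per tweet. Column names other than `screen_name`
# are assumptions for this sketch.
tweets <- data.frame(
  screen_name = c("a", "a", "a", "b", "c", "d"),
  score = c(0.9, 0.9, 0.9, 0.2, 0.1, 0.5),
  stringsAsFactors = FALSE
)

summarise_bots <- function(df, threshold = 0.430) {
  is_bot <- df$score >= threshold
  first_row <- !duplicated(df$screen_name)  # one row per distinct account
  list(
    user_level = mean(is_bot[first_row]),   # share of accounts flagged
    tweet_level = mean(is_bot)              # share of tweets from flagged accounts
  )
}

summarise_bots(tweets)
```

Here two of the four accounts are flagged, but because one flagged account posted three times, four of the six tweets come from flagged accounts. A few prolific suspected bots can therefore make the tweet-level figure much higher than the user-level one.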
132
+
133
+
This process takes some time, as botscan is currently built on a loop of
134
+
BotOMeter. A standard pull of tweets via botscan processes approximately
135
+
11 to 12 accounts per minute in addition to the initial tweet streaming.
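The stated rate gives a quick way to budget time before starting a scan. The helper below is a hypothetical back-of-the-envelope calculation that assumes the 11 to 12 accounts per minute figure holds throughout; it ignores the initial streaming time.

```r
# Rough processing time in minutes for a given number of accounts,
# assuming the 11 to 12 accounts-per-minute rate quoted above.
estimate_minutes <- function(n_accounts, per_minute = c(11, 12)) {
  stopifnot(n_accounts >= 0, all(per_minute > 0))
  range(n_accounts / per_minute)  # c(fastest, slowest) estimate
}

estimate_minutes(1000)  # roughly 83 to 91 minutes
```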
136
+
137
+
Twitter rate limits cap the number of Search results returned to 18,000
138
+
every 15 minutes. Thus, excessive use of botscan in a short amount of
139
+
time may result in a warning and inability to pull results. In this
140
+
event, simply wait 15 minutes and try again. In an effort to avoid the
141
+
Twitter rate limit cap, botscan defaults to returning 1000 results when