Enabled streaming toots (fix #84) #86
Merged
+399
−1
Changes from 5 commits
Commits (8):
cb16490 added stream function outlines (schochastics)
e5fc4ee added internal stream functions and a parser (schochastics)
d03171f added curl/jsonlite to Imports and fixed some doc (schochastics)
132e50d added handle for authorization (schochastics)
886771d moved file to stream_toots (schochastics)
7228b31 Make `stream_timeline_hashtag` accept the hash (chainsawriot)
02f463d echoing tmp file, added vignette, and extended README (schochastics)
3a4a2ce retify the `verbose` behavior of all `stream_*` functions (chainsawriot)
@@ -16,8 +16,10 @@ Depends:
     R (>= 3.6)
 Imports:
     clipr,
+    curl,
     dplyr,
     httr,
+    jsonlite,
     tibble
 Suggests:
     knitr,
@@ -0,0 +1,173 @@
#' Collect live streams of Mastodon data
#' @name stream_timeline
#' @param timeout integer, number of seconds to stream toots for. Stream indefinitely with timeout = Inf. The stream can be interrupted at any time, and file_name will still be a valid file.
#' @param local logical, show only local statuses (either statuses from your instance or the one provided in `instance`)?
#' @param file_name character, name of file. If not specified, will write to a temporary file stream_toots*.json.
#' @param append logical, if TRUE will append to the end of file_name; if FALSE, will overwrite.
#' @param verbose logical, whether to display messages
#' @param list_id character, id of list to stream
#' @param hashtag character, hashtag to stream
#' @inheritParams get_instance
#' @details
#' \describe{
#' \item{stream_timeline_public}{stream all statuses}
#' \item{stream_timeline_hashtag}{stream all statuses containing a specific hashtag}
#' \item{stream_timeline_list}{stream the statuses of a list}
#' }
#' @export
#' @examples
#' \dontrun{
#' # stream public timeline for 30 seconds
#' stream_timeline_public(timeout = 30, file_name = "public.json")
#' # stream timeline of mastodon.social for 30 seconds
#' stream_timeline_public(timeout = 30, local = TRUE,
#'   instance = "mastodon.social", file_name = "social.json")
#'
#' # stream hashtag timeline for 30 seconds
#' stream_timeline_hashtag("rstats", timeout = 30, file_name = "rstats_public.json")
#' # stream hashtag timeline of fosstodon.org for 30 seconds
#' stream_timeline_hashtag("rstats", timeout = 30, local = TRUE,
#'   instance = "fosstodon.org", file_name = "rstats_foss.json")
#' }
stream_timeline_public <- function(
    timeout = 30,
    local = FALSE,
    file_name = NULL,
    append = TRUE,
    instance = NULL,
    token = NULL,
    anonymous = FALSE,
    verbose = TRUE){

  path <- "/api/v1/streaming/public"
  if(isTRUE(local)){
    path <- paste0(path, "/local")
  }
  params <- list()

  # pass verbose through so the argument actually takes effect
  quiet_interrupt(
    stream_toots(timeout, file_name, append, token, path, params, instance, anonymous, verbose = verbose)
  )
  invisible(NULL)
}

#' @rdname stream_timeline
#' @export
stream_timeline_hashtag <- function(
    hashtag = "rstats",
    timeout = 30,
    local = FALSE,
    file_name = NULL,
    append = TRUE,
    instance = NULL,
    token = NULL,
    anonymous = FALSE,
    verbose = TRUE){

  path <- "/api/v1/streaming/hashtag"
  if(isTRUE(local)){
    path <- paste0(path, "/local")
  }
  params <- list(tag = hashtag)

  quiet_interrupt(
    stream_toots(timeout, file_name, append, token, path, params, instance, anonymous, verbose = verbose)
  )
  invisible(NULL)
}

#' @rdname stream_timeline
#' @export
stream_timeline_list <- function(
    list_id,
    timeout = 30,
    file_name = NULL,
    append = TRUE,
    instance = NULL,
    token = NULL,
    anonymous = FALSE,
    verbose = TRUE){

  # leading slash added for consistency with the other endpoints
  path <- "/api/v1/streaming/list"
  params <- list(list = list_id)

  quiet_interrupt(
    stream_toots(timeout, file_name, append, token, path, params, instance, anonymous, verbose = verbose)
  )
  invisible(NULL)
}

#' Parser of Mastodon stream
#'
#' Converts Mastodon stream data (JSON file) into a parsed tibble.
#' @param path character, name of JSON file with data collected by any [stream_timeline] function.
#' @export
#' @seealso `stream_timeline_public()`, `stream_timeline_hashtag()`, `stream_timeline_list()`
#' @examples
#' \dontrun{
#' stream_timeline_public(1, file_name = "stream.json")
#' parse_stream("stream.json")
#' }
parse_stream <- function(path){
  # one JSON-encoded status per line
  json <- readLines(path)
  tbl <- dplyr::bind_rows(lapply(json, function(x) parse_status(jsonlite::fromJSON(x))))
  tbl[order(tbl[["created_at"]]),]
}

stream_toots <- function(timeout, file_name = NULL, append, token, path, params,
                         instance = NULL, anonymous = FALSE, verbose = TRUE, ...){
  if (is.null(instance) && anonymous) {
    stop("provide either an instance or a token")
  }
  h <- curl::new_handle(verbose = FALSE)
  if (is.null(instance)) {
    token <- check_token_rtoot(token)
    url <- prepare_url(token$instance)
    curl::handle_setheaders(h, "Authorization" = paste0("Bearer ", token$bearer))
  } else {
    url <- prepare_url(instance)
  }

  if(is.null(file_name)){
    file_name <- tempfile(pattern = "stream_toots", fileext = ".json")
  }

  url <- httr::modify_url(url, path = path, query = params)

  stopifnot(is.numeric(timeout), timeout > 0)
  stop_time <- Sys.time() + timeout

  output <- file(file_name)
  con <- curl::curl(url, handle = h)
  # register cleanup before streaming so both connections are closed on interrupt too
  on.exit({
    close(con)
    close(output)
  }, add = TRUE)
  # "b" alone is not a valid open mode; use "wb" to overwrite
  open(output, open = if (append) "ab" else "wb")
  open(con, open = "rb", blocking = FALSE)
  sayif(verbose, "Streaming toots until ", stop_time)
  n_seen <- 0
  while(isIncomplete(con) && Sys.time() < stop_time){
    buf <- readLines(con, warn = FALSE)
    if(length(buf)){
      line <- buf[grepl("created_at", buf)] # This seems unstable but rtweet does something similar
      line <- gsub("^data:\\s+", "", line)
      line <- complete_line(line)
      writeLines(line, output)
      n_seen <- n_seen + length(line)
      if (verbose) cat("streamed toots: ", n_seen, "\r")
    }
  }
  invisible()
}

complete_line <- function(line){
  line <- line[grepl("\\}$", line)] # delete incomplete lines
  line <- line[line != ""] # delete empty lines
  line
}

quiet_interrupt <- function(code) {
  tryCatch(code, interrupt = function(e) NULL)
}
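The line filtering inside `stream_toots()` can be tried offline. The sketch below applies the same three steps (keep lines mentioning `created_at`, strip the `data:` prefix, drop lines that do not end in `}`) to a hand-written buffer. The sample lines in `buf` are an assumption about what a server-sent-events stream might deliver, not captured output from a Mastodon instance.

```r
# Assumed sample of what one non-blocking readLines() call might return.
buf <- c(
  "event: update",
  'data: {"id":"1","created_at":"2022-11-11T10:00:00.000Z","content":"hi"}',
  ":thump",
  'data: {"id":"2","created_at":"2022-11-11T10:00:01.000Z","content":"tru'
)

# Step 1: keep only lines that look like a status payload.
line <- buf[grepl("created_at", buf)]
# Step 2: strip the "data:" prefix.
line <- gsub("^data:\\s+", "", line)
# Step 3: drop payloads cut off mid-object, then empty strings.
line <- line[grepl("\\}$", line)]
line <- line[line != ""]

print(line)
```

Only the first payload survives, because the second one was cut off before its closing brace. A status truncated this way is dropped rather than completed on the next read, which may be part of what the "seems unstable" code comment refers to.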
@schochastics Would it be better to return an invisible `file_name` here (the same for all `stream_*` functions)? It would be quite useful when the input `path` (or `file_name` for `stream_*`) is `NULL` (the default). If `file_name` doesn't get returned, one can probably never get the path of that temp file (unless going through all temp files).

In cases like this:
One can never get the file back. (We might implement a `parse` parameter in the future, akin to `rtweet::stream_tweets`.)

Ah, I didn't quite understand what the parsing does in rtweet. Let me try to fix this. We shouldn't lose toots in tempfiles 😬

Their `parse` just returns a tibble if `TRUE` and NULL if `FALSE`; kind of similar to the `parse` parameter of our `get_*` functions. Maybe we can do better and return a tibble if `TRUE` and the (actual) `file_name` if `FALSE`.

How does rtweet not lose track of the tmp file? https://github.com/ropensci/rtweet/blob/master/R/stream.R#L83-L88

It gets echoed. If you prefer, that can also be a solution.
https://github.com/ropensci/rtweet/blob/14c89649e152f1c1609f9dc3d65234dbf2560d8e/R/stream.R#L71

OK, I think I prefer to echo the tmp file name and not return anything. The parsing should be done when reading the JSON. I will push something later as a suggestion.

@schochastics Sure thing.

@chainsawriot OK, done. The tmp file is now only echoed. I encourage users in the vignette to always set a file_name.
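A minimal sketch of the two options weighed in this thread: echoing the temp file path versus returning it invisibly. `stream_stub()` is a hypothetical stand-in that writes a fixed line instead of opening a network stream. It is not rtoot's implementation, and the message wording is an assumption.

```r
# Hypothetical stand-in for a stream_* function; no network access involved.
stream_stub <- function(file_name = NULL, verbose = TRUE) {
  if (is.null(file_name)) {
    file_name <- tempfile(pattern = "stream_toots", fileext = ".json")
    # Option chosen in the thread: echo the temp path so it is not lost.
    if (verbose) message("Writing to ", file_name)
  }
  # Placeholder payload standing in for streamed statuses.
  writeLines('{"id":"1"}', file_name)
  # Alternative raised in review: also hand the path back invisibly.
  invisible(file_name)
}

path <- stream_stub()   # the path is echoed and can also be captured
print(readLines(path))
```

Passing an explicit `file_name`, as the vignette is said to recommend, avoids the question altogether because the caller already knows where the data went.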