
Any way to throttle calls to get_status? #130

Open
RichardMN opened this issue May 10, 2023 · 10 comments

@RichardMN

It's more of a question than a comment...

I have a data frame with a number of toot ids generated elsewhere, and I want to pull additional information for them. This ends up looking something like...

icymibot_expanded <- icymibot_data |>
  top_n(500, created_at) |>
  rowwise() |>
  mutate(details = get_status(toot_id)) |>
  ...

But it crashes out after 300 rows (as far as I can tell). Is there a way to tell rtoot as a whole to back off, so that it chills a bit between requests rather than hammering the server in an asocial manner and getting told to stop (status code 429)?

@schochastics
Member

schochastics commented May 10, 2023

Hmm, interesting. 300 calls is the 5-minute rate limit, but we do have rate limit checks implemented (i.e. sleep until the rate limit resets). Can you share a reproducible example so I can check what's going on?
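
For context, "sleep until reset" means reading Mastodon's X-RateLimit-* response headers. A minimal sketch of what such a check could look like (illustrative only, not rtoot's actual internals; wait_for_rate_limit is a hypothetical helper):

library(httr)

# Sleep until the limit resets when the server reports no calls left.
wait_for_rate_limit <- function(response) {
  hdrs <- headers(response)  # httr lowercases header names
  remaining <- suppressWarnings(as.integer(hdrs[["x-ratelimit-remaining"]]))
  reset <- hdrs[["x-ratelimit-reset"]]  # ISO 8601 timestamp
  if (length(remaining) == 1 && !is.na(remaining) &&
      remaining == 0L && !is.null(reset)) {
    reset_time <- as.POSIXct(reset, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
    wait <- as.numeric(difftime(reset_time, Sys.time(), units = "secs"))
    if (!is.na(wait) && wait > 0) Sys.sleep(ceiling(wait))
  }
}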

@RichardMN
Author

It'll be about a week before I can put together a reproducible example, but I'll try to do so when I can.

@schochastics
Member

Don't worry, I am moving house at the moment, so take your time.

@RichardMN
Author

This isn't a reprex, but an outline of how I appear to be working around it for now. The following code managed to pull all 500 lines of what I was trying, or at least it managed to return a tibble with 500 rows, though some of them have NAs which I'll deal with later. I think the issue may have been a problematic id which made get_status fail with an error. Wrapping it with slowly means I'm less worried about hammering the server, and possibly lets it fail in a controlled manner.

library(dplyr)
library(purrr)

# possibly() returns NA on error; slowly() waits 2.5 s between calls
slow_get_status <- slowly(possibly(get_status, otherwise = NA), rate_delay(2.5), quiet = FALSE)

icymibot_rowwise <- icymibot_data |>
  top_n(500, created_at) |>
  rowwise()
icymibot_details <- icymibot_rowwise |>
  mutate(details = slow_get_status(toot_id))
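
For scale: at 2.5 seconds per call, 500 calls take a bit over 20 minutes, which stays comfortably under the 300-calls-per-5-minutes limit mentioned above.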

@schochastics
Member

Could you try to run this "unvectorized", something like:

icymibot_rowwise$details <- vector("list", nrow(icymibot_rowwise))
for (i in seq_len(nrow(icymibot_rowwise))) {
  icymibot_rowwise$details[[i]] <- get_status(icymibot_rowwise$toot_id[[i]])
}

If this bugs out after 300, then I think there might be an issue with our rate limit checking.

@fjnitsch

fjnitsch commented Feb 1, 2024

I can confirm that the "Status Code: 429" error persists if you use a for loop like the one above. When I wrapped the statement in a tryCatch block to skip a line upon error, all subsequent lines failed with the same error (suggesting that it is indeed the rate limit).
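
The wrapper looked roughly like this (a sketch of the approach rather than the exact code; variable names follow the loop above):

# Skip a row when get_status() errors, instead of aborting the whole loop.
for (i in seq_len(nrow(icymibot_rowwise))) {
  icymibot_rowwise$details[[i]] <- tryCatch(
    get_status(icymibot_rowwise$toot_id[[i]]),
    error = function(e) NA
  )
}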

@schochastics
Member

Did you use the current version, 0.3.4? We attempted to implement better 429 handling there, which apparently needs more work if you are already on the most recent version.

@fjnitsch

fjnitsch commented Feb 1, 2024

Sorry, my bad. I updated the package and now get the message "too many requests. Sleeping for 5 minutes". Thank you!

@RichardMN
Author

Sorry not to come back on this sooner. I ended up wrapping my calls in slowly and possibly, and this has been slow but steady and works:

slow_get_status <- slowly(possibly(get_status, otherwise = NA), rate_delay(2.5), quiet = FALSE)
slow_get_reblogged_by <- slowly(possibly(get_reblogged_by, otherwise = NA), rate_delay(2), quiet = FALSE)

I may look at trying to run it without this wrapping again, but I worry about hammering my server with repeated runs.

@leofontenelle

leofontenelle commented Oct 20, 2024

> Sorry, my bad. I updated the package and now get the message "too many requests. Sleeping for 5 minutes". Thank you!

I'm getting this 429 status multiple times within a single function call, get_account_following() or get_account_followers(), when the limit is larger than 300 * 40, that is, when trying to get all the followings/followers of highly connected accounts. I wonder whether this is because the server itself is overloaded, or if the functions should sleep between pages. (Honest question, I don't know about writing API clients.)

Update: I ended up doing something like the following, and I still get occasional rate limit messages (and 503 errors) even though I'm not using the API through other means, so I guess the problem is that the instance itself is struggling.

# `id`, `page.size`, and `verbose` are defined elsewhere; `...` stands for
# further arguments passed through to the API call.
max_id <- NULL
for (i in seq.int(ceiling(40L / page.size))) {
  api_response <- get_account_followers(id, max_id, ...)
  # do stuff with api_response
  Sys.sleep(1)  # pause between pages to go easy on the instance
  if (rtoot:::break_process_request(api_response, TRUE, verbose)) break
  max_id <- attr(api_response, "headers")$max_id
}
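
Since 503s are transient, retrying with backoff may help more than a fixed sleep. A possible variant using purrr's insistently() (a sketch, wrapping the same call as above):

library(purrr)

# Retry get_account_followers() with exponential backoff when it errors;
# tune pause_base and max_times to taste.
insistent_get_followers <- insistently(
  get_account_followers,
  rate = rate_backoff(pause_base = 2, max_times = 5),
  quiet = FALSE
)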
