Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate limiter headers should provide the name of the limit policy #2491

Open
Bossett opened this issue May 14, 2024 · 4 comments
Open

Rate limiter headers should provide the name of the limit policy #2491

Bossett opened this issue May 14, 2024 · 4 comments

Comments

@Bossett
Copy link

Bossett commented May 14, 2024

Is your feature request related to a problem? Please describe.

In putting together my profile labeller, I've run into a number of limits: pds query limits for getting profiles, create/delete limits for managing lists, public API concurrency limits, etc. - and it's tricky configuring limiters/back-offs/etc. for each one as the query process is slightly different for each.

Describe the solution you'd like

I would like each rate limit policy to have a unique identifier in the headers (i.e. at https://github.com/bluesky-social/atproto/blob/main/packages/xrpc-server/src/rate-limiter.ts#L146) so that I can track and manage each one programmatically without having to discover and manage each limit separately.

Describe alternatives you've considered

An index of all the different limits would be a reasonable alternative, but that would have to be kept up-to-date. Alternately this could be included in the typescript agent so that error messages can be more specific (i.e. RateLimitCreateRecordExceeded).

@bnewbold
Copy link
Collaborator

Sort of tangential to your request here, but are there any specific rate-limits that have caused pain? Both for your specific use-cases, or for what you would perceive as "normal and expected" uses of the API (in case you feel your usage is special)?

@Bossett
Copy link
Author

Bossett commented May 14, 2024

As I labelled profiles, I was adding those users to multiple lists - 'label', 'label AND other label', 'label AND other label AND other label' - I was batching these up per minute to debounce. I have labels for 'no displayname' and 'no avatar' - which puts a user on three lists.

Since they would often fix that soon after first post - I would get the patten 'createRecord x 3' -> 1m -> 'deleteRecord x 3'. (This is all to enable "I want to block when this label combination is on an account".) This ran under the deletion limit (so <3k/hr).

The limit I hit was 35k per day which I assume is the new record limit - hence this feature request, I'm not totally sure if that's the case, or if that's some kind of global request limit.

Short term - I'm going to try a 10m debounce, and drop some of the lists so that there are fewer actions per account. If that works, I think the limits are fine. If that doesn't work, I suppose there's a second feature request for some kind of bulk import (maybe an applyWrites endpoint allowing 100 records at a time, but a limit of ~8,640 per day = 10ops/s ?).

(more generally - I would love rate limits to be expressed with a consistent time window - understand that's not always tenable, but per 5 minutes/per 10 minutes/etc. is much easier to debug than per 24h.)

@bnewbold
Copy link
Collaborator

Ah, so you are putting users on mod lists in addition to labeling them? I assume to allow people to block the lists?

The entire list and blocking infrastructure does not scale nearly as well as the labeling system. Labels are designed to scale to millions of accounts; lists are not, they can stretch but really the whole system isn't happy beyond the design goal of tends of thousands of list-membership records in a single repo. Part of that is repo event churn rate (number of commits, regardless of what is in the commit), and total number of records in a repo (regardless of type or size of record).

I think we are unlikely to change the record creation limits, because those really are churn and work for both our services and firehose consumers. The current limits are sort of intended for peak uses (eg, an account having a very busy day, doing a backfill import, things like that). To be clear, your use isn't anywhere near causing overall scaling issues, but we don't want to encourage lots of folks to be pushing the limit. (these are my individual takes on this limit, we might change course or come to a different decision as a team)

There are some interesting questions about whether and how to enable things like "block by label" (there are several issues, requests, and conversations going on around this). We didn't originally intend to allow full block-style partitioning of the network, for example two groups of a million people each, all mutually blocking. I don't think we are stridently against something like that, but it raises a lot of questions around governance and the nature of the network. The original design goal for mod lists was to allow on the order of millions of accounts to subscribe to a list with on the order of thousands of accounts on it.

Specifically for the new account and "missing profile" type stuff: I think it is really great to have these lists now. It does feel like in the long run we could achieve some of these behaviors more efficiently if we did something like hydrate through an account creation timestamp, and then have a non-label-based config or behaviors around account age. But it is really great and creative to use labels to work around this gap in the APIs and app, and we don't have a specific timeline for new-account-specific behaviors (though it is on our anti-harassment planning list).

@Bossett
Copy link
Author

Bossett commented May 14, 2024

Yep - that's exactly it. And appreciate the list/label distinction (i.e. that I'm stretching what 'lists' really do), so wouldn't be advocating for changing the limits significantly - just a little restructuring so that it's easier to tell which limit you're hitting (i.e. a unique name I can track per limit) so I know which behaviour is an issue.

Things like peak and burst are an issue, and you start getting into multi-limit things - i.e. 'x endpoint has y limit AND x user has z tokens per day AND... etc.' to make that work right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants