
Enhance archives to handle temporary Internet Archive IP blocking more gracefully. #20

Open
maxachis opened this issue Jun 22, 2024 · 0 comments


Because the Internet Archive temporarily blocks IP addresses that make more than 15 archive requests in a single minute, the code has been throttled to reduce the risk of requests being dropped.
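A minimal sketch of such a throttle, assuming a sliding one-minute window. The class name `SlidingWindowThrottle` and its parameters are hypothetical and not taken from the project's code. It records the timestamps of recent requests and, once 15 have been made within the last 60 seconds, sleeps until the oldest one leaves the window. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Allow at most `max_calls` calls within any `period`-second window.

    Hypothetical helper; the project's actual throttling code may differ.
    """

    def __init__(
        self,
        max_calls: int = 15,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = collections.deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Block until one more request fits inside the rate limit, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have fallen out of the current window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest recorded call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `throttle.wait()` immediately before each archive request would keep the rate at or below 15 requests per minute.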

Unfortunately, despite the throttling, requests are still occasionally dropped because of the IP blocking.

Because the archives operation runs periodically on a schedule, this generally isn't a problem: those URLs will eventually be archived. But it isn't graceful. It would probably be better to have requests anticipate the IP blocking and wait a while before retrying, only giving up for good if the retry fails 3 times in a row.

Per the linked documentation, an IP block lasts 5 minutes at a time, so one solution would be to wait that long after a NewConnectionError before retrying. There may be other solutions as well.
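One possible shape for the wait-and-retry approach, sketched under a few assumptions: the helper `call_with_block_retry` is hypothetical, the exception types to retry on are passed in by the caller, and "fails 3 times in a row" is read as three consecutive waited retries failing after the initial attempt. If the project uses `requests`, urllib3's `NewConnectionError` usually reaches the caller wrapped in `requests.exceptions.ConnectionError`, so that would be the type to pass.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

BLOCK_SECONDS = 5 * 60  # temporary IP block length stated in the issue (5 minutes)


def call_with_block_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    wait_seconds: float = BLOCK_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); if it raises one of `retry_on`, wait out the block and retry.

    Re-raises the error once `max_retries` consecutive waited retries have failed.
    Hypothetical helper, not part of the existing codebase.
    """
    retries = 0
    while True:
        try:
            return func()
        except retry_on:
            if retries >= max_retries:
                raise  # still blocked after every wait: stop for good
            retries += 1
            sleep(wait_seconds)  # wait out the IP block before trying again
```

For example, `call_with_block_retry(lambda: archive_url(url), (requests.exceptions.ConnectionError,))`, where `archive_url` is a placeholder for whatever function submits a single archive request. Taking the exception types as a parameter keeps the helper independent of the HTTP client.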
