We are getting continuous problems with link checker errors, the solution to which seems to be adding more and more domains to the list that are excluded from being checked. As the number of excluded domains grows, the utility of the link checker diminishes.
As of Jan. 9, 2025, these are the domains excluded from our link checker:

doi.org
academic.oup.com/nar
gnu.org
anaconda.org
fonts.gstatic.com
www.microsoft.com/en-us/microsoft-365/onedrive/online-cloud-storage
I think some of these are justifiably excluded, like fonts.gstatic.com and the Microsoft one, but others are very wide-ranging, like doi.org.
We also run into errors with the cache, which has to be manually deleted here if some links threw errors in previous runs (at least I think that is the reason); these show up as cache errors.
This issue is for discussing possible solutions to these problems. Ultimately I think the link checker is a good thing to have, but it becomes tedious to deal with these same issues each time we build the page, and the growing list of excludes defeats the purpose.
It may be that the "too many requests" responses result from other GitHub Actions runners performing link checking on those same URLs (many runners sit behind a few public IPs...). If this is happening regularly, we might add 429 to the accepted HTTP status codes, e.g. --accept '200..=204, 429, 500', to effectively ignore it. Possibly a similar cause for the intermittent 403s? (Maybe doi.org added GitHub Actions runner IPs to a denylist, at least temporarily?)
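If the workflow runs lychee via lycheeverse/lychee-action (an assumption on my part; the action version and file glob below are placeholders, not copied from the actual workflow), the change would look roughly like this:

```yaml
# Sketch only, assuming lycheeverse/lychee-action is the checker in use.
- name: Check links
  uses: lycheeverse/lychee-action@v2
  with:
    # Treat 429 (Too Many Requests) as acceptable so rate-limited
    # responses don't fail the run.
    args: >-
      --accept '200..=204, 429, 500'
      './**/*.md'
```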
Or, if the broken links are few and not impacting the site, we could change the GitHub Actions workflow to run the link checker on demand instead of on every commit push and pull request.
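For example (a sketch; the weekly schedule is optional and just illustrative):

```yaml
# Trigger the link-check workflow manually (and optionally on a schedule)
# instead of on push / pull_request.
name: Check links
on:
  workflow_dispatch:      # "Run workflow" button in the Actions tab
  schedule:
    - cron: '0 6 * * 1'   # optional: once a week, Mondays 06:00 UTC
```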