Problem
- If my Sphinx documentation contains multiple links with anchors pointing at the same web page, the linkcheck builder downloads that page once per anchor it has to check
- This scales very badly. With many hundreds or thousands of anchored links (e.g. for automatically generated documentation), the download volume is roughly the page size × the number of links; a page of several megabytes referenced by a thousand anchored links can end up being multiple gigabytes of downloads
Procedure to reproduce the problem
- create a document containing several links to different anchors on the same web page
- run the link checker; it will fetch the page once per anchored link instead of once in total
Expected results
- I would suggest that the link checker cache the anchors found on each web page, so that it downloads each page only once while still checking each link exactly once. It could build a dictionary keyed by page URL and store that page's anchors in a list or set. Since we know up front which of our links carry anchors, we can skip collecting anchors for pages where it is unnecessary (see the sketch after this list).
- There may be other, better ways of doing this; I'm not familiar with the internals of the link checker.
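
As a rough illustration of the idea (this is not Sphinx's actual linkcheck code, and the names `AnchorCollector` and `check_anchor_links` are hypothetical), something along these lines would group links by page, fetch each page once, and validate every fragment against that single download. It assumes `requests` for the HTTP request and the standard-library `html.parser` for collecting anchors:

```python
# Sketch only: group anchor links by page, download each page once,
# collect its anchors, then validate every fragment against that one download.
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag

import requests


class AnchorCollector(HTMLParser):
    """Collects every id= value and <a name=> value seen in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def check_anchor_links(uris):
    """Return {original URI: anchor found?}, downloading each page only once."""
    by_page = defaultdict(list)
    for uri in uris:
        page, fragment = urldefrag(uri)
        by_page[page].append((uri, fragment))

    results = {}
    for page, links in by_page.items():
        response = requests.get(page, timeout=30)
        response.raise_for_status()
        collector = AnchorCollector()
        collector.feed(response.text)
        for uri, fragment in links:
            # Links without a fragment only need the page itself to exist.
            results[uri] = not fragment or fragment in collector.anchors
    return results
```

With a cache like this, all of the `ome_xsd.html#...` links in the project below would fall into a single bucket and trigger one GET instead of one per link.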
Reproducible project / your project
- https://github.com/openmicroscopy/bioformats/tree/develop/docs/sphinx
- contains lots of links to https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html
Environment info
- OS: Any
- Python version: Any
- Sphinx version: Any