-
-
Notifications
You must be signed in to change notification settings - Fork 598
feat: add persistent worker for sphinxdocs #2938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…into prototype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some changes that I think make this work as a basic incremental builder that relies on the builtin timestamp based detection. Not perfect, but progress!
The key change was to move the doctree directory out of the builder output directory. The builder output directory gets deleted every time the action has to re-run, which would delete the doctree files, too. Now that the doctree files are preserved between worker requests, it's able to reuse them.
I had an idea for registering an extension without modifying conf.py
Communicating from the worker loop to the extension is the tricky part. I see in the prototype, digest.json is being used by sticking it in the doctree directory. This is an appealing idea! A regular global would work, too, which would avoid any serialization overhead. Regardless, however it works, the content has to be specific to the particular src/out directory because a worker could service different requests for different doc roots. Alternatively, the extension could just hash everything itself. This would at least work in a more general sense, better than timestamps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned up the implementation heavily. Two things of note.
The worker installs an in-memory extension that implements the same basic get-env-outdated logic as in the pigweed extension. I'm not 100% sure that simple logic is 100% correct because the timestamp implementation does a lot more stuff. But, my testing didn't show errors.
There was a certain appeal to the digest.json method the original impl / pigweed extension were using, so I had the worker write a similar looking file. It passes the path to the file by using --define
to set a value in the config file. Extensions can find it via app.config and looking up that config value.
Another finding: adding no-sandbox will help incremental building. This comes back to the lack of a persistent doctree dir. The env object is pickled into the doctree dir; the env object is what persists information (such as what files already exist, their last modify times, etc). Even if --doctree-dir is specified, because it's not a declared input, when a sandbox'd execution occurs, the prior run's doctree directory isn't copied into the sandbox. It can't be because it's not a File input. And thus, Sphinx starts with an empty env, sees all files as new, and has to start from scratch. Setting no-sandbox allows the files from a previous run to become visible. This saved about half the execution time (8s -> 4s) when building rule_python's docs. Additionally, with the worker-based impl, which is able to fully persist the doctree dir and output dir between runs, incremental building is way faster: 8s -> 0.8 seconds |
FYI I'm on staycation this week. Will review if I have some free time otherwise can review next week |
Oh, to clarify, I'm not blocked on needing review. I can just merge it. |
This implements a simple, serialized persistent worker for Sphinxdocs with several optimizations. It is enabled by default.
is necessary.
to the expected Bazel output directory afterwards. This allows Sphinx to only
write output files that need to be updated.
This works by having the worker compute what files have changed and having a Sphinx
extension use the
get-env-outdated
event to tell Sphinx which files have changed.The extension is based on https://pwrev.dev/294057, but re-implemented to be
in-memory as part of the worker instead of a separate extension projects must configure.
For rules_python's doc building, this reduces incremental building from about 8 seconds
to about 0.8 seconds. From what I can tell, about half the time is spent generating
doctrees, and the other half generating the output files.
Worker mode is enabled by default and can be disabled on the target or by adjusting
the Bazel flags controlling execution strategy. Docs added to explain how.
Because
--doctree-dir
is now always specified and outside the output dir,non-worker invocations can benefit, too, if run without sandboxing. Docs added to
explain how to do this.
Along the way:
--write-all
and--fresh-env
from run args. This lets directinvocations benefit from the normal caching Sphinx does.
--foo=bar
so they are a single element; justa bit nicer to see when debugging.
Work towards #2878, #2879