feat: add persistent worker for sphinxdocs #2938

kaycebasques · 2025-05-28T23:44:17Z

This implements a simple, serialized persistent worker for Sphinxdocs with several optimizations. It is enabled by default.
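For readers unfamiliar with the term, a serialized persistent worker is a long-lived process that handles one build request at a time. The following is a minimal sketch, assuming Bazel's JSON worker protocol (one JSON object per line, with `arguments`, `inputs`, and `requestId` fields); `serve` and `run_build` are hypothetical names, not the functions in this PR.

```python
import io
import json


def serve(stdin, stdout, run_build):
    """Handle work requests one at a time until stdin is exhausted.

    run_build is a hypothetical callback: (arguments, inputs) -> (exit_code, output).
    """
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        try:
            exit_code, output = run_build(
                request.get("arguments", []), request.get("inputs", [])
            )
        except Exception as exc:
            # Report the failure to the build tool instead of killing the worker.
            exit_code, output = 1, f"worker error: {exc}"
        response = {
            "exitCode": exit_code,
            "output": output,
            "requestId": request.get("requestId", 0),
        }
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Because the loop only reads the next request after it has answered the previous one, any state kept in the process (such as the digests from the last build) is never touched concurrently.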

- The worker computes what inputs have changed, allowing Sphinx to only rebuild what
  is necessary.
- Doctrees are written to a separate directory so they are retained between builds.
- The worker tells Sphinx to write output to an internal directory, then copies it
  to the expected Bazel output directory afterwards. This allows Sphinx to only
  write output files that need to be updated.
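The copy step could look roughly like the sketch below. It assumes a plain recursive copy is enough; `publish_output` and both directory arguments are hypothetical names, not the PR's actual code.

```python
import shutil
from pathlib import Path


def publish_output(internal_dir, bazel_out_dir):
    """Copy everything Sphinx wrote into the directory Bazel expects.

    Sphinx keeps writing to internal_dir across requests, so it only has to
    rewrite the pages that changed; the full tree is then copied out.
    """
    # dirs_exist_ok (Python 3.8+) lets the copy succeed if the target exists.
    shutil.copytree(Path(internal_dir), Path(bazel_out_dir), dirs_exist_ok=True)
```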

This works by having the worker compute what files have changed and having a Sphinx
extension use the env-get-outdated event to tell Sphinx which files have changed.
The extension is based on https://pwrev.dev/294057, but re-implemented to be
in-memory as part of the worker instead of a separate extension that projects must configure.
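A minimal sketch of that flow, assuming change detection by comparing per-file digests between requests. `diff_digests`, `make_outdated_handler`, and `setup_extension` are hypothetical names; `env-get-outdated` and `env.path2doc` are the Sphinx event and method the handler relies on, and the sketch assumes the changed paths are in a form `path2doc` accepts.

```python
def diff_digests(previous, current):
    """Compare two {path: digest} maps; return (added_or_changed, removed)."""
    changed = {p for p, d in current.items() if previous.get(p) != d}
    removed = set(previous) - set(current)
    return changed, removed


def make_outdated_handler(changed_paths):
    """Build an env-get-outdated handler that reports the changed documents."""

    def on_env_get_outdated(app, env, added, changed, removed):
        docnames = []
        for path in sorted(changed_paths):
            # path2doc returns None for files that are not source documents.
            docname = env.path2doc(path)
            if docname is not None:
                docnames.append(docname)
        # Sphinx re-reads these docnames in addition to the ones it found.
        return docnames

    return on_env_get_outdated


def setup_extension(app, changed_paths):
    """Connect the handler; app is a Sphinx application (or a stand-in)."""
    app.connect("env-get-outdated", make_outdated_handler(changed_paths))
```

Changed files that are not documents (images, static assets) are dropped here; a real implementation would likely need to map them back to the documents that depend on them.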

For rules_python's doc building, this reduces incremental build time from about 8 seconds
to about 0.8 seconds. From what I can tell, about half the time is spent generating
doctrees, and the other half generating the output files.

Worker mode is enabled by default and can be disabled on the target or by adjusting
the Bazel flags controlling execution strategy. Docs added to explain how.

Because --doctree-dir is now always specified and outside the output dir,
non-worker invocations can benefit, too, if run without sandboxing. Docs added to
explain how to do this.

Along the way:

- Remove --write-all and --fresh-env from run args. This lets direct
  invocations benefit from the normal caching Sphinx does.
- Change the args formatting to --foo=bar so they are a single element; just
  a bit nicer to see when debugging.
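To illustrate the single-element form, here is a small sketch; `format_args` is a hypothetical helper and the option values are made up.

```python
def format_args(options):
    """Render each option as one '--name=value' element."""
    return [f"--{name}={value}" for name, value in options.items()]


# One element per option, instead of ["--doctree-dir", "/tmp/doctrees", ...].
args = format_args({"doctree-dir": "/tmp/doctrees", "foo": "bar"})
```

With this form, a logged argument list keeps each flag and its value together, which makes it easier to read in debug output.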

Work towards #2878, #2879

rickeylev

I pushed some changes that I think make this work as a basic incremental builder that relies on the built-in timestamp-based detection. Not perfect, but progress!

The key change was to move the doctree directory out of the builder output directory. The builder output directory gets deleted every time the action has to re-run, which would delete the doctree files, too. Now that the doctree files are preserved between worker requests, it's able to reuse them.

rickeylev · 2025-06-01T01:00:19Z

I had an idea for registering an extension without modifying conf.py

sphinx.application.builtin_extensions is a list of extensions that Sphinx always loads. We can implement an extension and add it to that list.
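A rough sketch of that idea, with assumptions labeled: the extension is published as a synthetic module in `sys.modules` so it can be imported by name, and Sphinx is assumed to read `builtin_extensions` from the module when the application is created. `install_builtin_extension` is a hypothetical helper; the module is passed in as a parameter so the sketch does not depend on Sphinx being installed.

```python
import sys
import types


def install_builtin_extension(sphinx_application, name, setup):
    """Make `name` importable and add it to the always-loaded extensions.

    sphinx_application is expected to be the sphinx.application module
    (any object with a builtin_extensions attribute works for the sketch).
    """
    module = types.ModuleType(name)
    module.setup = setup  # Sphinx calls setup(app) when loading an extension.
    sys.modules[name] = module

    existing = tuple(sphinx_application.builtin_extensions)
    if name not in existing:
        # Rebind rather than mutate, so this works for tuples and lists alike.
        sphinx_application.builtin_extensions = existing + (name,)
```

In real use the first argument would be the imported `sphinx.application` module, and the call would have to happen before the Sphinx application object is constructed.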

Communicating from the worker loop to the extension is the tricky part. I see that the prototype uses digest.json, stored in the doctree directory. This is an appealing idea! A regular global would work, too, and would avoid any serialization overhead. However it works, the content has to be specific to the particular src/out directory, because a worker could service requests for different doc roots.

Alternatively, the extension could just hash everything itself. This would at least work in a more general sense, better than timestamps.

rickeylev

I cleaned up the implementation heavily. Two things of note.

The worker installs an in-memory extension that implements the same basic env-get-outdated logic as the Pigweed extension. I'm not 100% sure that simple logic is fully correct, because the timestamp implementation does a lot more. But my testing didn't show errors.

There was a certain appeal to the digest.json method the original impl / Pigweed extension were using, so I had the worker write a similar-looking file. It passes the path to the file by using --define to set a config value. Extensions can find it via app.config by looking up that config value.

rickeylev · 2025-06-03T20:59:45Z

Another finding: adding no-sandbox will help incremental building. This comes back to the lack of a persistent doctree dir. The env object is pickled into the doctree dir; the env object is what persists information (such as what files already exist, their last-modified times, etc.).

Even if --doctree-dir is specified, it's not a declared input, so when a sandboxed execution occurs, the prior run's doctree directory isn't copied into the sandbox. It can't be, because it's not a File input. Thus, Sphinx starts with an empty env, sees all files as new, and has to start from scratch.

Setting no-sandbox allows the files from a previous run to become visible. This saved about half the execution time (8s -> 4s) when building rules_python's docs.

Additionally, with the worker-based impl, which is able to fully persist the doctree dir and output dir between runs, incremental building is way faster: 8s -> 0.8s.

kaycebasques · 2025-06-04T00:59:45Z

FYI I'm on staycation this week. Will review if I have some free time; otherwise I can review next week.

rickeylev · 2025-06-05T17:53:42Z

Oh, to clarify, I'm not blocked on needing review. I can just merge it.

feat: add persistent worker for sphinxdocs

b655782

kaycebasques requested review from rickeylev and aignas as code owners May 28, 2025 23:44


rickeylev added 6 commits May 31, 2025 10:38

Merge branch 'main' of https://github.com/bazel-contrib/rules_python …

ccfadf8

…into prototype

rename tmp to changed paths

fb6279b

rename use_cache to use_persistent_worker

782b0d4

cleanup logic to run action

4cf2ca8

fix use_persistent_workers type, enable for basic test

bfa6cfb

basic incremental worker

b497e62

rickeylev reviewed May 31, 2025


rickeylev mentioned this pull request Jun 1, 2025

sphinxdocs: implement content-based change detection plugin #2879

Open

rickeylev added 2 commits June 2, 2025 14:52

generate info file for extensions to use. also cleanup

3420416

cleanup

34e1cde

rickeylev reviewed Jun 2, 2025


rickeylev added 5 commits June 2, 2025 15:34

cleanup

bed4556

doc attr

c487060

trying to debug rbe

d968327

support non-worker invocation when rules try to use a worker invocation

4cea20a

fix bad arg

2912daf

rickeylev added 4 commits June 3, 2025 14:08

always set doctreedir

78651ea

enable worker by default so its used when available

283e565

fix doc typo

1e7fa2a

rm old use_persistent_Worker arg name

ceefdce

format files

f6e1c8b

rickeylev removed the request for review from aignas June 3, 2025 23:04

register config key so its available

2520a2d

rickeylev approved these changes Jun 5, 2025


rickeylev added this pull request to the merge queue Jun 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025

rickeylev added this pull request to the merge queue Jun 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025

rickeylev added this pull request to the merge queue Jun 5, 2025

Merged via the queue into bazel-contrib:main with commit 0498664 Jun 5, 2025
3 checks passed
