Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: elide invalidations if no tiles sent ... #11254

Open
mmeeks opened this issue Feb 28, 2025 · 13 comments
Open

perf: elide invalidations if no tiles sent ... #11254

mmeeks opened this issue Feb 28, 2025 · 13 comments
Assignees
Labels
24.04 enhancement New feature or request performance Improving COOL performance

Comments

@mmeeks
Copy link
Contributor

mmeeks commented Feb 28, 2025

If we have not sent any tiles - then we should not need to send invalidations. That can be implemented a boolean - but it's a bit disappointing too.

We should generalize that - and this should be per-view (since co-ordinate systems differ between views). So:

  • when there is a clean-start, or an EMPTY invalidation - we should reset a bbox of all tiles we've sent.
  • whenever we send a new tile we should grow that bbox
  • whenever we get an invalidate we should crop it against that bbox.

This way as we idly layout a very long document we don't create a ton of websocket / coolwsd protocol thrash per-view for all the writer frames as we lay them out idly.

Thanks =)

@mmeeks mmeeks added 24.04 enhancement New feature or request unconfirmed labels Feb 28, 2025
@mmeeks mmeeks added performance Improving COOL performance and removed unconfirmed labels Feb 28, 2025
@mmeeks
Copy link
Contributor Author

mmeeks commented Feb 28, 2025

This got slightly worse recently, with this patch:

diff --git a/wsd/ClientSession.cpp b/wsd/ClientSession.cpp
index e3a62070a5..827abb448a 100644
--- a/wsd/ClientSession.cpp
+++ b/wsd/ClientSession.cpp
@@ -1595,7 +1595,7 @@ bool ClientSession::loadDocument(const char* /*buffer*/, int /*length*/,
 
         if (!getInitialClientVisibleArea().empty())
         {
-            oss << " clientvisiblearea=" << getInitialClientVisibleArea();
+//            oss << " clientvisiblearea=" << getInitialClientVisibleArea();
         }
 
         if (ConfigUtil::getConfigValue<bool>("accessibility.enable", false))

I get a different behavior - but it still looks very much as if we have lost much of our event de-duplication; we have:

14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=193776 width=11905 height=2491 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.964 global.js:575 1740754715964 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0
14:58:35.965 global.js:575 1740754715965 INCOMING: invalidatetiles: part=0 mode=0 x=284 y=196268 width=11905 height=2155 wid=0

for example in the logs. Without the above patch I get:

14:49:06.270 global.js:575 1740754146270 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.270 global.js:575 1740754146270 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.271 global.js:575 1740754146271 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.271 global.js:575 1740754146271 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.271 global.js:575 1740754146271 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.272 global.js:575 1740754146272 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.273 global.js:575 1740754146273 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.274 global.js:575 1740754146274 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.275 global.js:575 1740754146275 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.275 global.js:575 1740754146275 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0
14:49:06.277 global.js:575 1740754146277 INCOMING: invalidatetiles: EMPTY, 0, 0 wid=0

Quite possibly this causes a lot of excess tile rendering too (?) so may be of interest to @caolanm

I assume this is a result of our mainloop changes. [ all of the above from starting writer-edit.fodt in Chrome ]

Can you take a look Miklos ?

@mmeeks
Copy link
Contributor Author

mmeeks commented Feb 28, 2025

do you think you could take a detour to look at that? - IIRC we do event de-duplication and then consolidated emission at some stage - though not entirely clear how that intersects with how the loop spins; perhaps invalidation consolidation & emission from core -> kit needs to be as a lower priority idle handler [ or insert explanation ;-] - I'm not sure my bbox trick will fix all issues there.

@vmiklos
Copy link
Contributor

vmiklos commented Feb 28, 2025

One problematic part is that I'm not sure what's our most important goal. One goal is to have interactivity: so as soon as we have an invalidate, that should result in a paintTile() and should be sent to the client. An other goal is to minimize websocket traffic by waiting a bit, deduplicate events and then have fewer invalidations. I guess it's tricky to have both at the same time.

I guess I can look at two parts:

  1. Implement the bbox trick to not send events which are obviously pointless.
  2. We can say that the interruption that I added in 500df64 is making things worse (and not better) till we can consider the priority of core.git tasks. If so, I can make the anyInput part of that commit conditional and only do it in the COOL_ANYINPUT_CONSIDER_PRIORITY case.

@mmeeks
Copy link
Contributor Author

mmeeks commented Feb 28, 2025

I suspect that treating invalidations outside all visible areas as being lower priority would give us a chance to aggregate many more of them, and not sacrifice interactivity in any way :-) It is odd though that we get so many whole document invalidations back-to-back - that is strange - I wonder why we'd be doing those at idle? is it possible we're doing too little work in a chunk initially somehow ?

Anyhow - the fix above will hide lots of the worst of this in any case I think :-)

@vmiklos
Copy link
Contributor

vmiklos commented Mar 3, 2025

this should be per-view (since co-ordinate systems differ between views)

This should work, but one catch is that doc_paintPartTile() may not paint the tile on the view where we requested the paint, due to the getAlternativeViewForPaint() optimization (in the Calc/Impress case). So need to make sure we work with the original view, at the start of doc_paintPartTile().

And then perhaps what would work correctly is to implement this in the kit process, on the core side, before we insert the callbacks to the CallbackFlushHandler.

And I imagine we need to also reset the bbox on switching parts (sheet/slide).

@mmeeks
Copy link
Contributor Author

mmeeks commented Mar 3, 2025

Great point on the view confusion; not totally clear what to do about that I guess. Quite probably we should maintain a list of bboxes based on the canonical-view-id and whatever view a tile is rendered against - update the canonical-id-related bbox. Clearly we need to respect parts too there - quite a good idea though - to crop invalidates caused during eg. re-calculation from parts we have not rendered tiles against yet though - makes it a nicer optimization :-)

@vmiklos
Copy link
Contributor

vmiklos commented Mar 5, 2025

https://gerrit.libreoffice.org/c/core/+/182524 is now pending CI to do this. I still want to check locally if it breaks some tests on the COOL side.

@vmiklos
Copy link
Contributor

vmiklos commented Mar 5, 2025

#11272 is the COOL side, I should probably go through those 4 dubious tests and fix them instead of the current "just disable" state, still.

@vmiklos
Copy link
Contributor

vmiklos commented Mar 6, 2025

#11283 restores those 4 lost tests, at the end all of them just expected we get invalidations for non-painted areas, no actual code change here.

@mmeeks
Copy link
Contributor Author

mmeeks commented Mar 6, 2025

I wonder if this can create a chicken & egg problem; if we are not requesting new tiles when the clientvisiblearea changes but instead waiting for invalidations we may have a problem. Perhaps this is the calc typing over the edge of the screen issue. I guess the pre-fetching in DocumentBroker should force rendering of every invalid / non-present tile in the clientvisiblearea as it changes - and push those (if it does not already).

@mmeeks
Copy link
Contributor Author

mmeeks commented Mar 6, 2025

Hmm this is trickier than Ithought =)

https://github.com/CollaboraOnline/online/pull/new/private/mmeeks/wip-tilefetch

might go some way to helping here - but we need to consider mobile with no tile-cache I guess.

Miklos - this is prolly worth reverting in master - at least until 25.04. Thanks !

@vmiklos
Copy link
Contributor

vmiklos commented Mar 7, 2025

My thinking is that I assumed all views invalidate the same areas, but this is not the case in Calc with the scenario of #11289

The other problem I had in mind (that we discussed offline yesterday) is when happens if there are 2 Writer views: Alice types, Bob joins, then Alice leaves -- now we switch from painting on the Alice view to painting to the Bob view, but the paint bbox of Alice is not transferred to Bob when Alice quits.

I would say the solution to both is to update the bbox of all views with the same canonical view id. https://gerrit.libreoffice.org/c/core/+/182611 does this. It also requires no changes on the COOL side, which makes me somewhat optimistic about the new state.

Perhaps have this in and if we see any other problems, then -- as you say -- revert on online.git master & on core.git co-24.04, and only do this on core.git 25.04.

@vmiklos
Copy link
Contributor

vmiklos commented Mar 7, 2025

One more thing, I measured the number of 'invalidatetiles:' messages we send on loading test/samples/writer-large-edit.fodt with an --enable-symbols core.git in a 'make run COOL_WRITER_LARGE=y' session. This would be ideally 0, since no editing happens.

Without the above 2 core changes: 477

With them: 8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
24.04 enhancement New feature or request performance Improving COOL performance
Projects
Status: No status
Development

No branches or pull requests

2 participants