Architectural Refactoring #142
Replies: 3 comments
-
As we discussed over Slack, it would be good to abstract away the CDP event loop in a separate package, so that we can start cleaning up I don't think we can move to a single event loop, as we need an internal one for navigation and other events, but extracting the CDP one should be straightforward, conceptually at least if not practically 😅 I'll let Robin assign the priority for this :) I think we should fix the currently open major issues and tackle this ASAP after that and before implementing new features, since it will help us troubleshoot any other issues more easily. |
Beta Was this translation helpful? Give feedback.
-
I support any effort to make the code more easily debuggable and long-term maintainable 🙂 Priority wise I'd put this somewhere in the second half of the prio 1 issues...but I still have a bunch of prio 1 issues to create in regards to tests and docs and I'd like to see that we implement some of those before embarking on bigger changes to the internals (and it should help having more integration and end-to-end test coverage before making these kinds of changes as well). On a related note, I've also many times wished there was a way to separate out a "trace" of log messages belonging to a particular JS API call (eg. |
Beta Was this translation helpful? Give feedback.
-
To reopen this discussion, after working on #281 it became clear that the current way we emit and process internal events is inherently racy with CDP events. Sometimes we don't process events quickly enough (due to internal locking or unreliable timing on CI machines), or fail to setup waiters for events, leading to timeouts. The system is also very difficult to reason about, as events could be emitted from anywhere in any order, and they sometimes cause complex internal state changes, which coupled with CDP events makes it difficult to troubleshoot issues. So I'd like to propose removing this internal event-based system in favor of an event-driven finite-state machine. The events driving the FSM will be the CDP events we have to handle, but the internal state of frames and other entities would be managed by FSMs. This would have at least a couple benefits:
If we break it down into packages, it could look something like this: graph LR
script([script]) --> js --> state <--> cdp <--> browser([browser])
state <--> frame
subgraph lib
frame
page
...
end
(The If we can avoid any circular imports, this would break up the current The idea is that state changes will come from either the script or CDP, but internally it should always be clear what the current state of the system is.
The FSM could either be built from scratch, or there are several libs we could choose from, e.g. https://github.com/looplab/fsm. Additionally we might want to drop our |
Beta Was this translation helpful? Give feedback.
-
I've been thinking about this today:
So we can easily manage the chain of events in a single place and fix the problems faster and robustly. I believe, doing so will simplify the codebase and allow us to become more productive in time.
My reasoning is the current design makes debugging challenging and creates hard to grok synchronization issues. I know we follow Playwright code structure but we might want a different approach since we are using Go instead of Javascript. It's because each has its own style when architecting code.
You can't easily follow the chain of events:
Frame
manages its own lifecycle events and deals with its mainframe's and child frames' events.For example, when you want to debug something:
Here's an experimental project. It's only for experimenting with CDP, and learning about how it works. I put it here only for reference.
Beta Was this translation helpful? Give feedback.
All reactions