feat(webserver): customize shutdown with new kill option #33379

Open
wants to merge 19 commits into
base: main

Member

@Skn0tt Skn0tt commented Oct 31, 2024

Closes #33377, #18209

There's quite some backstory on this that we should keep in mind:

We tried this in the past, then reverted it in favour of a PR that spawned a child process instead of a separate process group. That PR was also reverted, because it changed the signal from SIGKILL to SIGTERM and that broke Svelte.

This PR proposes a kill option that allows users to customize this shutdown behaviour.


The text below describes a previous version of this PR. I'll keep it here for future reference, so we don't lose this research.

This PR proposes to keep things mostly as they are. The addition is to try sending a SIGINT first and, if the process hasn't exited after half a second (configurable with the shutdownTimeout setting), to continue as usual with a SIGKILL.

Putting 500ms as a default timeout is based on two assumptions about common webserver usage:

  1. it's mostly used to start web framework dev servers and docker compose setups, which we can expect to understand SIGINT
  2. most local development environments can shut down in 500ms

Users that fulfil both assumptions will now get proper graceful shutdown. Users who don't fulfil the second one won't see a benefit until they configure shutdownTimeout. Most other users will see a slowdown of 500ms, but no other breaking changes. Only folks whose server handles SIGINT in an unusual way will see a breaking change, which should be fine.

Let me know if you think we should use SIGTERM instead of SIGINT. I'm unsure which one is more idiomatic, since Playwright is a user-controlled tool, but we also initiate the signal programmatically.

We could also solve some of the docker compose cleanup issues by killing only the top process, and not the entire process group (see here). This would help this specific docker compose thing, but not docker run --rm or any other processes that depend on SIGINT for cleanup.
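For reference, the "SIGINT first, SIGKILL after a timeout" sequence described above could be sketched roughly like this. It is only an illustration under stated assumptions, not the code in this PR: `ProcessLike` and `shutdownGracefully` are hypothetical names, and the child process is abstracted behind a small interface so the control flow is easy to follow.

```typescript
// Hypothetical sketch of "SIGINT first, SIGKILL after a timeout".
// ProcessLike and shutdownGracefully are illustrative names, not Playwright APIs.
type ShutdownSignal = 'SIGINT' | 'SIGTERM' | 'SIGKILL';

interface ProcessLike {
  kill(signal: ShutdownSignal): void;
  // Resolves once the process has fully exited.
  exited: Promise<void>;
}

async function shutdownGracefully(
  proc: ProcessLike,
  timeoutMs = 500,
): Promise<'graceful' | 'forced'> {
  proc.kill('SIGINT');

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<'timeout'>(resolve => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  const winner = await Promise.race([
    proc.exited.then(() => 'exited' as const),
    timedOut,
  ]);
  // Clear the timer so it cannot keep the event loop alive after an early exit.
  clearTimeout(timer);

  if (winner === 'exited')
    return 'graceful';

  // The process ignored SIGINT (or was too slow): fall back to SIGKILL.
  proc.kill('SIGKILL');
  await proc.exited;
  return 'forced';
}
```

A server that handles SIGINT within the timeout resolves as `'graceful'`; anything slower is force-killed, which matches the "continue as usual with a SIGKILL" fallback.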

@Skn0tt Skn0tt changed the title from "feat(webserver): send graceful SIGINT before killing" to "feat(webserver): customize shutdown with new kill option" on Nov 7, 2024
@Skn0tt Skn0tt requested a review from dgozman November 7, 2024 08:59

@@ -619,6 +619,7 @@ export default defineConfig({
- `stdout` ?<["pipe"|"ignore"]> If `"pipe"`, it will pipe the stdout of the command to the process stdout. If `"ignore"`, it will ignore the stdout of the command. Default to `"ignore"`.
- `stderr` ?<["pipe"|"ignore"]> Whether to pipe the stderr of the command to the process stderr or ignore it. Defaults to `"pipe"`.
- `timeout` ?<[int]> How long to wait for the process to start up and be available in milliseconds. Defaults to 60000.
- `kill` ?<{ SIGINT: number } | { SIGTERM: number }> How to shut down the process gracefully. If unspecified, the process group is forcefully `SIGKILL`ed. If set to `{ SIGINT: 500 }`, the top process is sent a `SIGINT` signal, followed by `SIGKILL` if it doesn't exit within 500ms. You can also use `SIGTERM` instead. A `0` timeout means no `SIGKILL` will be sent. Windows doesn't support `SIGINT` and `SIGTERM` signals, so this option is ignored.
Contributor

I am afraid this TS notation will not be parsed. Instead, make two separate properties below, like any location or position options we have.

Member Author

the types.d.ts generator understands it, but I hadn't considered that the docs site also needs to parse this. updated it.
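To make the semantics of the documented `kill` shape concrete, here is a small sketch of how it could be normalized into a signal plus an optional escalation delay. The `KillOption` type mirrors the doc line under review; `normalizeKillOption` and `ShutdownPlan` are hypothetical names invented for this illustration and are not part of Playwright.

```typescript
// Hypothetical helper: the `kill` shape comes from the doc line under review,
// but normalizeKillOption and ShutdownPlan are illustrative only.
type KillOption = { SIGINT: number } | { SIGTERM: number };

interface ShutdownPlan {
  signal: 'SIGINT' | 'SIGTERM' | 'SIGKILL';
  // Milliseconds to wait before escalating to SIGKILL; undefined means never escalate.
  escalateAfterMs?: number;
}

function normalizeKillOption(kill?: KillOption): ShutdownPlan {
  // No option: keep the existing behaviour of an immediate SIGKILL.
  if (!kill)
    return { signal: 'SIGKILL' };
  const signal = 'SIGINT' in kill ? 'SIGINT' : 'SIGTERM';
  const timeout = 'SIGINT' in kill ? kill.SIGINT : kill.SIGTERM;
  // A timeout of 0 means "send the signal and never follow up with SIGKILL".
  return timeout === 0 ? { signal } : { signal, escalateAfterMs: timeout };
}
```

With this reading, `{ SIGINT: 500 }` becomes "send SIGINT, escalate after 500ms" and `{ SIGTERM: 0 }` becomes "send SIGTERM and wait indefinitely".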

return line.substring(options.prefix.length);
return line;
})
.filter(line => line.startsWith('%%'))
Contributor

Hmm... I'd think prefix would replace %% in this function, but turns out it does not. I am worried this version will be hard to use in the future.

Member Author

agree. i've replaced this with a test-local implementation, since outputLines() isn't even available on the result of runInlineTest.

const testProcess = await interactWithTestRunner(files(), { workers: 1 });

await testProcess.waitForOutput('webserver started');
process.kill(testProcess.process.pid!, 'SIGINT');
Contributor

I wonder why you have to interact with the test runner. Shouldn't it send SIGINT/SIGTERM/SIGKILL during normal testing operation, without a user interrupt? If so, I'd think that's more important to test.

Member Author

not sure why I did this. replaced it with bog-standard runInlineTest

await processExit;
} else {
await Promise.race([
processExit,
Contributor

If process exits before the timer runs out, shouldn't we clear the timer? If we don't, it would probably block Node.js from exiting for the full duration.

Member Author

good catch, yeah! I've refactored this a bit to make it easier on the eye. Noticed that node:timers has some useful tools around this though, in particular ref and signal: https://nodejs.org/api/timers.html#timerspromisessettimeoutdelay-value-options
Might be handy somewhere else.
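As a rough illustration of the `signal` option mentioned above, the race between "process exited" and "timeout elapsed" could be written with the promise-based timer so that the timer is cancelled as soon as the other promise settles. `settleWithin` is a hypothetical helper name; only `setTimeout` from `node:timers/promises` and `AbortController` are real Node.js APIs here.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical helper name; only the timers/promises call is a real Node.js API.
async function settleWithin<T>(work: Promise<T>, ms: number): Promise<T | 'timeout'> {
  const controller = new AbortController();
  try {
    return await Promise.race([
      work,
      // The signal lets us cancel the pending timer once `work` settles first.
      sleep(ms, 'timeout' as const, { signal: controller.signal }),
    ]);
  } finally {
    // Aborting clears the timer, so it cannot delay Node.js from exiting.
    // If the timer already fired, this is a no-op.
    controller.abort();
  }
}
```

The `ref: false` option from the same API is the other way to avoid blocking exit: it leaves the timer pending but stops it from keeping the event loop alive.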

packages/playwright/src/plugins/webServerPlugin.ts (outdated, resolved)

Member Author

@Skn0tt Skn0tt left a comment

addressed the feedback

docs/src/test-api/class-testconfig.md (outdated, resolved)
const timer = timeout !== 0
? setTimeout(() => {
// @ts-expect-error. SIGINT didn't kill the process, but `processLauncher` will only attempt killing it if this is false
launchedProcess.killed = false;
Member

Seems pretty hacky to override the process killed property - maybe we can refactor it to avoid that?
Like we don't know what other implications it might end up with.

Member Author

Yeah agree. It might be fine today, but who knows how node:child_process changes. I've refactored it in 220ed29, how do you like that?

Co-authored-by: Dmitry Gozman <[email protected]>
Signed-off-by: Simon Knott <[email protected]>

Member Author

Skn0tt commented Nov 7, 2024

It appears that the tests fail on linux because the webserver process isn't properly cleaned up there, and a following test has a port clash. I'm checking what we can do.

Member Author

Skn0tt commented Nov 7, 2024

Uh-oh. It appears that because we set detached: true in processLauncher.ts, we suddenly don't have the PID available anymore. In my testing, the PID in Node was one smaller than the PID I saw in ps. That means we need to send the signal to the entire group, but even that doesn't work well. I don't think this'll make this release.

Member Author

Skn0tt commented Nov 7, 2024

Alright, the bug was in listening for childProcess#exit, when it should've listened for childProcess#close. I've fixed that.

We still need to send the signal to the entire process group now, which feels off. The alternative would be to disable detached for the webserver, but that would be a breaking change.
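For context on signalling the whole group: on POSIX systems a negative PID passed to `kill` addresses a process group, and a child spawned with `detached: true` becomes the leader of its own group. The sketch below shows that targeting logic in isolation. `signalProcess` and `KillFn` are hypothetical names, and the kill function is injected so the logic can be exercised without real processes; this is not the code in `processLauncher.ts`.

```typescript
// Hypothetical helper; `KillFn` mirrors the shape of Node's process.kill(pid, signal).
type KillFn = (pid: number, signal: string) => unknown;

function signalProcess(
  pid: number,
  signal: 'SIGINT' | 'SIGTERM' | 'SIGKILL',
  options: { wholeGroup: boolean; kill: KillFn },
): boolean {
  // On POSIX, a negative PID addresses the process group whose ID is |pid|.
  // A child spawned with `detached: true` leads its own group, so -pid reaches
  // the child and everything it started.
  const target = options.wholeGroup ? -pid : pid;
  try {
    options.kill(target, signal);
    return true;
  } catch {
    // Typically ESRCH: the process or group is already gone.
    return false;
  }
}
```

In real use the injected function would be something like `process.kill.bind(process)`. Group signalling via negative PIDs does not apply on Windows.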

@@ -226,7 +226,7 @@ export async function launchProcess(options: LaunchProcessOptions): Promise<Laun
killSet.delete(killProcessAndCleanup);
removeProcessHandlersIfNeeded();
options.log(`[pid=${spawnedProcess.pid}] <kill>`);
if (spawnedProcess.pid && !spawnedProcess.killed && !processClosed) {
Member Author

@Skn0tt Skn0tt Nov 7, 2024

.killed means that the process was sent a signal before, not that it was killed. processClosed is better for that, especially now that it's set by the close event and not the exit event. exit is emitted when the process ends, while close is emitted only after the process has ended and its stdio streams have been closed. Some confusing terminology there.
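The distinction can be sketched as a small guard that decides whether a forced kill is still needed based on the `close` event rather than on `.killed`. This is an assumption-laden illustration, not the change in this PR: `ChildLike` is a hypothetical minimal slice of a child process object, and `createKillGuard` is an invented name.

```typescript
// Hypothetical sketch: ChildLike is a minimal stand-in for a child process object.
interface ChildLike {
  pid?: number;
  // true once a signal was *sent* via kill(), even if the process survived it
  killed: boolean;
  once(event: 'close', listener: () => void): unknown;
}

function createKillGuard(child: ChildLike): () => boolean {
  let processClosed = false;
  child.once('close', () => { processClosed = true; });
  // Deliberately ignore child.killed: a process that shrugged off SIGINT
  // still reports killed === true but must be force-killed.
  return () => child.pid !== undefined && !processClosed;
}
```

The returned function keeps answering `true` after a signal has been sent, and only flips to `false` once `close` has been observed.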

Contributor

github-actions bot commented Nov 7, 2024

Test results for "tests 1"

3 failed
❌ [playwright-test] › playwright.trace.spec.ts:137:5 › should not throw with trace: on-first-retry and two retries in the same worker @macos-latest-node18-1
❌ [playwright-test] › runner.spec.ts:118:5 › should ignore subprocess creation error because of SIGINT @macos-latest-node18-1
❌ [webkit-page] › page/page-leaks.spec.ts:107:5 › fill should not leak @webkit-ubuntu-22.04-node18

1 flaky ⚠️ [installation tests] › playwright-electron-should-work.spec.ts:44:5 › should work when wrapped inside @playwright/test and trace is enabled @package-installations-macos-latest

36865 passed, 682 skipped
✔️✔️✔️

Merge workflow run.

Successfully merging this pull request may close these issues.

[Feature]: Graceful shutdown of webservers
3 participants