Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[single-grpc-server][rfc] introduce a heartbeat to proxy server #26777

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented Dec 31, 2024

Summary & Motivation

Pre-emptive work before switching the grpc entrypoint of the workspace context from dagster api grpc to dagster code-server start.

Previously, if the calling process of dagster code-server start never explicitly terminated the process, the proxy server could become orphaned. This fixes by introducing a heartbeat mechanism which will automatically shut down the proxy server if a heartbeat hasn't happened in X amount of time.

How I Tested These Changes

Added a test for the termination behavior - terminate the underlying grpc server, and then after the requisite amount of time has elapsed, the proxy server should automatically shut down as well.

@dpeng817 dpeng817 changed the title [single-grpc-server] introduce a heartbeat to proxy server [single-grpc-server][rfc] introduce a heartbeat to proxy server Dec 31, 2024
@dpeng817 dpeng817 marked this pull request as ready for review December 31, 2024 21:07
Copy link
Contributor Author

dpeng817 commented Dec 31, 2024

@dpeng817 dpeng817 force-pushed the dpeng817/proxy_servicer_heartbeat branch 2 times, most recently from 872d6a0 to 55bc33f Compare January 2, 2025 20:23
@dpeng817 dpeng817 requested a review from gibsondan January 2, 2025 20:24
@@ -121,22 +131,22 @@ def _reload_location(self):
self._logger.exception("Failure while loading code")

if self._client:
self._heartbeat_shutdown_event = threading.Event()
self._heartbeat_thread = threading.Thread(
self._client_heartbeat_shutdown_event = threading.Event()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename these properties because now it's confusing that there's a client and server heartbeat thread

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense but there is one important change to the default behavior here that I think explains the test failures

Comment on lines 149 to 163
@click.option(
"--server-heartbeat-timeout",
type=click.INT,
required=False,
default=DEFAULT_SERVER_HEARTBEAT_TIMEOUT,
help="How long to wait for a heartbeat from the caller before timing out. Defaults to 30 seconds.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should default to off (like dagster api grpc does) - we still want to support the existing use case of users running their own grpc servers, not dagster dev managing it, in which case there will not be a heartbeat - changing the default behavior here would break that use case

name="grpc-server-heartbeat",
daemon=True,
)
self.__server_heartbeat_thread.start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should still join() this thread on cleanup / shutdown (even though its a daemon thread, still good to do cleanup)

@@ -156,13 +166,13 @@ def cleanup(self):
# In case ShutdownServer was not called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up the server one here too

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to keep the separate boolean argument (it will typically only be set by us anyway in dagster dev) or make the default 0 and have that mean no heartbeat

@dpeng817 dpeng817 force-pushed the dpeng817/proxy_servicer_heartbeat branch from 55bc33f to 923951e Compare January 3, 2025 19:59
@gibsondan gibsondan self-requested a review January 3, 2025 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants