-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[single-grpc-server][rfc] introduce a heartbeat to proxy server #26777
base: master
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
872d6a0
to
55bc33f
Compare
@@ -121,22 +131,22 @@ def _reload_location(self): | |||
self._logger.exception("Failure while loading code") | |||
|
|||
if self._client: | |||
self._heartbeat_shutdown_event = threading.Event() | |||
self._heartbeat_thread = threading.Thread( | |||
self._client_heartbeat_shutdown_event = threading.Event() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename these properties because now it's confusing that there's a client and server heartbeat thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense but there is one important change to the default behavior here that I think explains the test failures
@click.option( | ||
"--server-heartbeat-timeout", | ||
type=click.INT, | ||
required=False, | ||
default=DEFAULT_SERVER_HEARTBEAT_TIMEOUT, | ||
help="How long to wait for a heartbeat from the caller before timing out. Defaults to 30 seconds.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should default to off (like dagster api grpc does) - we still want to support the existing use case of users running their own grpc servers, not dagster dev managing it, in which case there will not be a heartbeat - changing the default behavior here would break that use case
name="grpc-server-heartbeat", | ||
daemon=True, | ||
) | ||
self.__server_heartbeat_thread.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should still join() this thread on cleanup / shutdown (even though its a daemon thread, still good to do cleanup)
@@ -156,13 +166,13 @@ def cleanup(self): | |||
# In case ShutdownServer was not called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up the server one here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may want to keep the separate boolean argument (it will typically only be set by us anyway in dagster dev) or make the default 0 and have that mean no heartbeat
55bc33f
to
923951e
Compare
Summary & Motivation
Pre-emptive work before switching the grpc entrypoint of the workspace context from
dagster api grpc
todagster code-server start
.Previously, if the calling process of
dagster code-server start
never explicitly terminated the process, the proxy server could become orphaned. This fixes by introducing a heartbeat mechanism which will automatically shut down the proxy server if a heartbeat hasn't happened in X amount of time.How I Tested These Changes
Added a test for the termination behavior - terminate the underlying grpc server, and then after the requisite amount of time has elapsed, the proxy server should automatically shut down as well.