fix: flyteadmin doesn't shutdown servers gracefully #6289

Open
wants to merge 8 commits into
base: master
8 changes: 5 additions & 3 deletions flyteadmin/pkg/config/config.go
@@ -25,9 +25,10 @@ type ServerConfig struct {
 	// Deprecated: please use auth.AppAuth.ThirdPartyConfig instead.
 	DeprecatedThirdPartyConfig authConfig.ThirdPartyConfigOptions `json:"thirdPartyConfig" pflag:",Deprecated please use auth.appAuth.thirdPartyConfig instead."`

-	DataProxy DataProxyConfig `json:"dataProxy" pflag:",Defines data proxy configuration."`
-	ReadHeaderTimeoutSeconds int `json:"readHeaderTimeoutSeconds" pflag:",The amount of time allowed to read request headers."`
-	KubeClientConfig KubeClientConfig `json:"kubeClientConfig" pflag:",Configuration to control the Kubernetes client"`
+	DataProxy DataProxyConfig `json:"dataProxy" pflag:",Defines data proxy configuration."`
+	ReadHeaderTimeoutSeconds int `json:"readHeaderTimeoutSeconds" pflag:",The amount of time allowed to read request headers."`
+	KubeClientConfig KubeClientConfig `json:"kubeClientConfig" pflag:",Configuration to control the Kubernetes client"`
+	GracefulShutdownTimeoutSeconds int `json:"gracefulShutdownTimeoutSeconds" pflag:",Number of seconds to wait for graceful shutdown before forcefully terminating the server"`
 }

 type DataProxyConfig struct {
@@ -119,6 +120,7 @@ var defaultServerConfig = &ServerConfig{
 		Burst: 25,
 		Timeout: config.Duration{Duration: 30 * time.Second},
 	},
+	GracefulShutdownTimeoutSeconds: 10,
 }
 var serverConfig = config.MustRegisterSection(SectionKey, defaultServerConfig)

1 change: 1 addition & 0 deletions flyteadmin/pkg/config/serverconfig_flags.go

Some generated files are not rendered by default.

14 changes: 14 additions & 0 deletions flyteadmin/pkg/config/serverconfig_flags_test.go

Some generated files are not rendered by default.

60 changes: 52 additions & 8 deletions flyteadmin/pkg/server/service.go
@@ -6,7 +6,10 @@
 	"fmt"
 	"net"
 	"net/http"
+	"os"
+	"os/signal"
 	"strings"
+	"syscall"
 	"time"

 	"github.com/gorilla/handlers"
@@ -386,8 +389,9 @@
 	}

 	go func() {
-		err := grpcServer.Serve(lis)
-		logger.Fatalf(ctx, "Failed to create GRPC Server, Err: ", err)
+		if err := grpcServer.Serve(lis); err != nil {
+			logger.Fatalf(ctx, "Failed to create GRPC Server, Err: %v", err)
+		}

Codecov / codecov/patch warning: added lines #L392-L394 in flyteadmin/pkg/server/service.go were not covered by tests.

 	}()

 	logger.Infof(ctx, "Starting HTTP/1 Gateway server on %s", cfg.GetHostAddress())
@@ -422,11 +426,35 @@
 		ReadHeaderTimeout: time.Duration(cfg.ReadHeaderTimeoutSeconds) * time.Second,
 	}

-	err = server.ListenAndServe()
-	if err != nil {
-		return errors.Wrapf(err, "failed to Start HTTP Server")
+	go func() {
+		err = server.ListenAndServe()
+		if err != nil && err != http.ErrServerClosed {
+			logger.Fatalf(ctx, "Failed to start HTTP Server: %v", err)
+		}

Codecov / codecov/patch warning: added lines #L429-L433 in flyteadmin/pkg/server/service.go were not covered by tests.

+	}()
+
+	// Gracefully shut down the servers
+	sigCh := make(chan os.Signal, 1)
Review comment from @Sovietaced (Contributor), Mar 14, 2025:

Do we need to have two signal listeners? We should be able to use just one.

Reply from the Contributor Author:

Do you mean just SIGINT or SIGTERM?

Reply from @Sovietaced (Contributor), Mar 14, 2025:

I mean that in this diff you have two signal channels (one for gRPC and one for HTTP). You only need one to know whether or not the app is shutting down.

+	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
+	<-sigCh
+
+	// force to shut down servers after 10 seconds
+	logger.Infof(ctx, "Shutting down server... timeout: %d seconds", cfg.GracefulShutdownTimeoutSeconds)
+	shutdownTimeout := cfg.GracefulShutdownTimeoutSeconds
+	timer := time.AfterFunc(time.Duration(shutdownTimeout)*time.Second, func() {
+		logger.Infof(ctx, "Server couldn't stop gracefully in time. Doing force stop.")
+		server.Close()
+		grpcServer.Stop()
+	})
+	defer timer.Stop()
+
+	grpcServer.GracefulStop()
+
+	if err := server.Shutdown(ctx); err != nil {
+		logger.Errorf(ctx, "Failed to gracefully shutdown HTTP server: %v", err)

Codecov / codecov/patch warning: added lines #L437-L454 in flyteadmin/pkg/server/service.go were not covered by tests.

 	}

+	logger.Infof(ctx, "Servers gracefully stopped")

Codecov / codecov/patch warning: added line #L457 in flyteadmin/pkg/server/service.go was not covered by tests.

 	return nil
 }

@@ -534,10 +562,26 @@
 		ReadHeaderTimeout: time.Duration(cfg.ReadHeaderTimeoutSeconds) * time.Second,
 	}

-	err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
+	go func() {
+		err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
+		if err != nil && err != http.ErrServerClosed {
+			logger.Errorf(ctx, "Failed to start HTTP/2 Server: %v", err)
+		}

Codecov / codecov/patch warning: added lines #L565-L569 in flyteadmin/pkg/server/service.go were not covered by tests.

+	}()
Comment on lines +565 to +570

Review comment from a Collaborator (Bito automated code review):

Potential race condition in server startup

The HTTP/2 server is now started in a goroutine, but the function immediately proceeds to wait for shutdown signals. This could lead to a race condition where the shutdown sequence begins before the server is fully initialized. Consider adding a small delay or a readiness check before proceeding to the shutdown logic.

Code suggestion (AI-generated; check the fix before applying):
 @@ -558,6 +558,13 @@
  	go func() {
  		err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
  		if err != nil && err != http.ErrServerClosed {
  			logger.Errorf(ctx, "Failed to start HTTP/2 Server: %v", err)
  		}
  	}()
 +
 +	// Give the server a moment to start before proceeding to shutdown logic
 +	time.Sleep(100 * time.Millisecond)
 +
 +	// Log that the server has started
 +	logger.Infof(ctx, "HTTP/2 Server started successfully on %s", cfg.GetHostAddress())

Code Review Run #55c786

+	// Gracefully shutdown the servers
+	sigCh := make(chan os.Signal, 1)
+	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
+	<-sigCh

Codecov / codecov/patch warning: added lines #L573-L575 in flyteadmin/pkg/server/service.go were not covered by tests.

-	if err != nil {
-		return errors.Wrapf(err, "failed to Start HTTP/2 Server")
+	// Create a context with timeout for the shutdown process
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), time.Duration(cfg.GracefulShutdownTimeoutSeconds)*time.Second)
+	defer cancel()
+
+	if err := srv.Shutdown(shutdownCtx); err != nil {
+		logger.Errorf(ctx, "Failed to shutdown HTTP server: %v", err)

Codecov / codecov/patch warning: added lines #L577-L582 in flyteadmin/pkg/server/service.go were not covered by tests.

 	}

+	logger.Infof(ctx, "Servers gracefully stopped")

Codecov / codecov/patch warning: added line #L585 in flyteadmin/pkg/server/service.go was not covered by tests.

 	return nil
 }