fix(cli): Prevent Get & Sync from Hanging on Invalid Application Spec #21702
base: master
…ore printFinalStatus on sync command Signed-off-by: Almo Elda <[email protected]>
…rgoproj#21433) Signed-off-by: Jagpreet Singh Tamber <[email protected]> Signed-off-by: Alexandre Gaudreault <[email protected]> Co-authored-by: Alexandre Gaudreault <[email protected]> Signed-off-by: Almo Elda <[email protected]>
…rgoproj#21698) Signed-off-by: Keith Chong <[email protected]> Signed-off-by: Almo Elda <[email protected]>
argoproj#21677) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Almo Elda <[email protected]>
…rgoproj#21676) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Almo Elda <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]> Signed-off-by: Almo Elda <[email protected]>
Signed-off-by: Almo Elda <[email protected]>
Signed-off-by: Almo Elda <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21702      +/-   ##
==========================================
+ Coverage   55.51%   55.60%   +0.08%
==========================================
  Files         339      340       +1
  Lines       57273    57424     +151
==========================================
+ Hits        31797    31929     +132
- Misses      22790    22813      +23
+ Partials     2686     2682       -4
…lid-spec Signed-off-by: Almo Elda <[email protected]>
cmd/argocd/commands/app.go
cancel()
printFinalStatus(app)
- printFinalStatus(app)
+ printFinalStatus(app)
Why are you printing after cancel() has been called?
Good point, thanks
This is correct; otherwise the ctx never gets called in the Get function. It is better to give a cancellation signal when the timeout has occurred.
cmd/argocd/commands/app.go

appName, appNs := argo.ParseFromQualifiedName(args[0], appNamespace)

if timeout != 0 {
	time.AfterFunc(time.Duration(timeout)*time.Second, func() {
		fmt.Println("Context cancelled due to timeout")
- fmt.Println("Context cancelled due to timeout")
I don't think it's necessary.
cmd/argocd/commands/app.go
@@ -337,6 +337,7 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
 	refresh     bool
 	hardRefresh bool
 	output      string
+	timeout     int
- timeout int
+ timeout uint
This should be uint.
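To illustrate why an unsigned type suits a timeout flag, here is a minimal sketch. It uses the Go standard library `flag` package rather than the cobra/pflag flag set the PR itself uses, and `parseTimeout` and `defaultTimeoutSeconds` are hypothetical names chosen for this example (the default of 15 mirrors the value in the original diff). With a `uint` flag, a negative value is rejected while parsing, so the command body never has to check for it.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// defaultTimeoutSeconds is a hypothetical default for this sketch.
const defaultTimeoutSeconds uint = 15

// parseTimeout parses a --timeout flag as an unsigned integer.
// Negative or non-numeric values produce an error from Parse.
func parseTimeout(args []string) (uint, error) {
	fs := flag.NewFlagSet("get", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output

	var timeout uint
	fs.UintVar(&timeout, "timeout", defaultTimeoutSeconds, "Time out after this many seconds")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return timeout, nil
}

func main() {
	for _, args := range [][]string{{}, {"--timeout", "30"}, {"--timeout=-5"}} {
		timeout, err := parseTimeout(args)
		fmt.Println(timeout, err)
	}
}
```

The third case returns an error instead of a timeout, which is the behavior the reviewer's `uint` suggestion is after.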
cmd/argocd/commands/app.go
@@ -462,6 +473,7 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
 		},
 	}
 	command.Flags().StringVarP(&output, "output", "o", "wide", "Output format. One of: json|yaml|wide|tree")
+	command.Flags().IntVar(&timeout, "timeout", 15, "Specifies the maximum duration for the operation to complete. The command will terminate if the timeout is exceeded.")
- command.Flags().IntVar(&timeout, "timeout", 15, "Specifies the maximum duration for the operation to complete. The command will terminate if the timeout is exceeded.")
+ command.Flags().UintVar(&timeout, "timeout", defaultCheckTimeoutSeconds, "Time out after this many seconds")
Works perfectly. I've left a few comments before we merge this PR. The changes work well for the sync command too, even though you only added the timeout flag to the get command. I wonder how and why?
Signed-off-by: Almo Elda <[email protected]>
…n also Signed-off-by: Almo Elda <[email protected]>
…lid-spec Signed-off-by: Almo Elda <[email protected]>
Thanks @nitishfy
argo-cd/cmd/argocd/commands/app.go Line 2588 in 8449bab
Why can't you call waitOnApplicationStatus() in the get command too, like the way we are calling for
if timeout != 0 {
	time.AfterFunc(time.Duration(timeout)*time.Second, func() {
		cancel()
	})
}
- if timeout != 0 {
- 	time.AfterFunc(time.Duration(timeout)*time.Second, func() {
- 		cancel()
- 	})
- }
+ app, _, err := waitOnApplicationStatus(ctx, acdClient, appName, uint(timeout), watchOpts{operation: true}, nil, output)
+ errors.CheckError(err)
You don't need all of this. Instead, call waitOnApplicationStatus before making the first GET call.
Yeah, sounds good. I wanted to ask about hardRefresh, since waitOnApplicationStatus only does a normal refresh. Please correct me if I got it wrong.
I guess (correct me if I'm wrong) that if we proceed with waitOnApplicationStatus, we would have to flag a hardRefresh somehow, maybe via a context key-value or by adding a parameter, which would mean adjusting all callers of this function. Let me know what you think and I'll get on it.
PTAL now.
@@ -382,17 +383,26 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
 `),

 	Run: func(c *cobra.Command, args []string) {
-		ctx := c.Context()
+		ctx, cancel := context.WithCancel(c.Context())
should we set up the context after we validate args on line 387?
I don't think we need to change the ctx here anymore if we use the waitOnApplicationStatus func.
Generally I agree with you @nitishfy, invoking the wait function looks like the best way.
But it will not utilize the hardRefresh.
@@ -2577,7 +2588,7 @@ func waitOnApplicationStatus(ctx context.Context, acdClient argocdclient.Client,
 	if timeout != 0 {
 		time.AfterFunc(time.Duration(timeout)*time.Second, func() {
 			_, appClient := acdClient.NewApplicationClientOrDie()
 			app, err := appClient.Get(ctx, &application.ApplicationQuery{
I think when it times out, this will call cancel() on the context, which will in turn close the appEventCh, causing the for-range loop below to break. The last call will be _ = printFinalStatus(app).
This would mean that the AfterFunc should make sure that:
- It is not also calling printFinalStatus ✅
- It should set refresh = false to make sure that the last call to printFinalStatus will not refresh the app.
- It should call app, err = appClient.Get(ctx, &application.ApplicationQuery to update the app (without refresh) so it is used by printFinalStatus.
I haven't debugged whether that is really what the execution does, but it should be testable in a unit test similar to TestWaitOnApplicationStatus_JSON_YAML_WideOutput.
There are also a few other problems with the code, like the connection not being closed in the AfterFunc, and potential race conditions with refresh and app that might now require a lock. TBD
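The locking concern raised above can be sketched roughly as follows. This is not the Argo CD code: `waitState`, `setRefresh`, `shouldRefresh`, and `refreshAfterTimeout` are hypothetical names, and the blocking receive on `ctx.Done()` merely stands in for the for-range loop over the event channel. The point is only that a value written by the `time.AfterFunc` callback and read by the waiting goroutine is guarded by a mutex.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitState is a hypothetical holder for the value shared between the
// timeout callback and the goroutine that prints the final status.
type waitState struct {
	mu      sync.Mutex
	refresh bool
}

func (s *waitState) setRefresh(v bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refresh = v
}

func (s *waitState) shouldRefresh() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.refresh
}

// refreshAfterTimeout reports whether the final status print would still
// request a refresh once the timeout (in milliseconds) has fired.
func refreshAfterTimeout(timeoutMs int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	state := &waitState{refresh: true}
	timer := time.AfterFunc(time.Duration(timeoutMs)*time.Millisecond, func() {
		state.setRefresh(false) // the final print must not refresh
		cancel()                // then unblock the waiting goroutine
	})
	defer timer.Stop()

	<-ctx.Done() // stand-in for the loop that ends when the context is cancelled
	return state.shouldRefresh()
}

func main() {
	fmt.Println(refreshAfterTimeout(10)) // prints "false"
}
```

In this reduced form the write happens before `cancel()`, so the ordering alone would already be safe; the mutex matters once other goroutines also read or write the shared values while the timer may fire.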
@@ -382,17 +383,26 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
 `),

 	Run: func(c *cobra.Command, args []string) {
-		ctx := c.Context()
+		ctx, cancel := context.WithCancel(c.Context())
immediately call defer cancel()
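As a rough illustration of that advice, the sketch below pairs `context.WithCancel` with an immediate `defer cancel()` and an optional `time.AfterFunc` timer. The helper names (`runWithTimeout`, `slowCall`, `outcome`) are hypothetical and the blocking call is simulated; none of this is taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runWithTimeout runs work under a cancellable context. The deferred
// cancel releases the context on every return path, and the timer is
// stopped if work finishes before the timeout fires.
func runWithTimeout(parent context.Context, timeout time.Duration, work func(context.Context) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel() // called immediately after creating the context

	if timeout != 0 {
		timer := time.AfterFunc(timeout, cancel)
		defer timer.Stop()
	}
	return work(ctx)
}

// slowCall is a hypothetical stand-in for a blocking API call that
// takes d to complete unless the context is cancelled first.
func slowCall(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// outcome runs slowCall with the given timeout and work duration
// (both in milliseconds) and describes the result.
func outcome(timeoutMs, workMs int) string {
	err := runWithTimeout(
		context.Background(),
		time.Duration(timeoutMs)*time.Millisecond,
		slowCall(time.Duration(workMs)*time.Millisecond),
	)
	if err != nil {
		return err.Error()
	}
	return "ok"
}

func main() {
	fmt.Println(outcome(20, 500)) // timeout fires first: "context canceled"
	fmt.Println(outcome(500, 20)) // work finishes first: "ok"
}
```

A timeout of 0 skips the timer entirely, matching the `if timeout != 0` guard in the diff.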

appName, appNs := argo.ParseFromQualifiedName(args[0], appNamespace)

if timeout != 0 {
	time.AfterFunc(time.Duration(timeout)*time.Second, func() {
		cancel()
I think the logic of an app get --refresh --timeout 10 should be to try the get with the refresh for 10 seconds and, after 10 seconds, print a message, then fall back to the normal get and return that.
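One possible shape for that fallback, as a sketch only: `getFunc`, `getWithFallback`, `fakeGet`, and `demo` are hypothetical names, and the server is simulated rather than using the Argo CD client. The refreshing get runs under `context.WithTimeout`; if the deadline passes, a message is printed and a plain get is issued on the parent context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"time"
)

// getFunc is a hypothetical stand-in for an application Get call.
type getFunc func(ctx context.Context, refresh bool) (string, error)

// getWithFallback tries a refreshing get for at most timeout, then
// falls back to a plain get if the deadline was exceeded.
func getWithFallback(parent context.Context, timeout time.Duration, get getFunc) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	app, err := get(ctx, true)
	if err == nil {
		return app, nil
	}
	if !errors.Is(err, context.DeadlineExceeded) {
		return "", err // a real failure, not a timeout
	}
	fmt.Fprintln(os.Stderr, "refresh timed out; falling back to a plain get")
	return get(parent, false)
}

// fakeGet simulates a server whose refresh takes refreshDelay, while a
// plain get returns immediately.
func fakeGet(refreshDelay time.Duration) getFunc {
	return func(ctx context.Context, refresh bool) (string, error) {
		if !refresh {
			return "cached", nil
		}
		select {
		case <-time.After(refreshDelay):
			return "refreshed", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// demo takes the timeout and the simulated refresh delay in milliseconds.
func demo(timeoutMs, refreshDelayMs int) string {
	app, err := getWithFallback(
		context.Background(),
		time.Duration(timeoutMs)*time.Millisecond,
		fakeGet(time.Duration(refreshDelayMs)*time.Millisecond),
	)
	if err != nil {
		return "error: " + err.Error()
	}
	return app
}

func main() {
	fmt.Println(demo(20, 500)) // refresh outlasts the timeout: "cached"
	fmt.Println(demo(500, 20)) // refresh finishes in time: "refreshed"
}
```

Issuing the fallback on the parent context rather than the expired one is what lets the second call succeed.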
Co-authored-by: Alexandre Gaudreault <[email protected]> Signed-off-by: almoelda <[email protected]>
Closes #21613
Adding context cancellation to get cmd and cancelling the context before printFinalStatus on sync command
Checklist: