fix(cli): Prevent Get & Sync from Hanging on Invalid Application Spec #21702

Open · wants to merge 13 commits into master

@almoelda (Contributor) commented Jan 29, 2025

Closes #21613

Adds context cancellation to the get command, and cancels the context before printFinalStatus in the sync command.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@almoelda requested a review from a team as a code owner January 29, 2025 11:03

bunnyshell bot commented Jan 29, 2025

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@almoelda force-pushed the cli-hangs-on-invalid-spec branch from 3057641 to 753d4f8 on January 29, 2025 11:04
@almoelda changed the title from "Prevent Get & Sync from Hanging on Invalid Application Spec" to "fix(cli): Prevent Get & Sync from Hanging on Invalid Application Spec" on Jan 29, 2025
@almoelda requested a review from a team as a code owner January 29, 2025 11:25
almoelda and others added 8 commits January 29, 2025 13:28
…ore printFinalStatus on sync command

Signed-off-by: Almo Elda <[email protected]>
…rgoproj#21433)

Signed-off-by: Jagpreet Singh Tamber <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Co-authored-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Almo Elda <[email protected]>
argoproj#21677)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Almo Elda <[email protected]>
…rgoproj#21676)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Almo Elda <[email protected]>
Signed-off-by: Almo Elda <[email protected]>
@almoelda force-pushed the cli-hangs-on-invalid-spec branch from 58ef4e8 to ac12366 on January 29, 2025 11:28

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 55.60%. Comparing base (e147247) to head (6f13f63).
Report is 7 commits behind head on master.

Files with missing lines: cmd/argocd/commands/app.go (patch coverage 0.00%, 8 lines missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21702      +/-   ##
==========================================
+ Coverage   55.51%   55.60%   +0.08%     
==========================================
  Files         339      340       +1     
  Lines       57273    57424     +151     
==========================================
+ Hits        31797    31929     +132     
- Misses      22790    22813      +23     
+ Partials     2686     2682       -4     


cancel()
printFinalStatus(app)
Member

Suggested change
printFinalStatus(app)
printFinalStatus(app)

Why are you printing after cancel() has been called?

Contributor Author

Good point, thanks

Member

This is correct; otherwise, the ctx passed to the Get function is never cancelled. It is better to send a cancellation signal when the timeout has occurred.


appName, appNs := argo.ParseFromQualifiedName(args[0], appNamespace)

if timeout != 0 {
time.AfterFunc(time.Duration(timeout)*time.Second, func() {
fmt.Println("Context cancelled due to timeout")
Member

Suggested change
fmt.Println("Context cancelled due to timeout")

I don't think it's necessary.

@@ -337,6 +337,7 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
refresh bool
hardRefresh bool
output string
timeout int
Member

Suggested change
timeout int
timeout uint

This should be uint.

@@ -462,6 +473,7 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
},
}
command.Flags().StringVarP(&output, "output", "o", "wide", "Output format. One of: json|yaml|wide|tree")
command.Flags().IntVar(&timeout, "timeout", 15, "Specifies the maximum duration for the operation to complete. The command will terminate if the timeout is exceeded.")
Member

Suggested change
command.Flags().IntVar(&timeout, "timeout", 15, "Specifies the maximum duration for the operation to complete. The command will terminate if the timeout is exceeded.")
command.Flags().UintVar(&timeout, "timeout", defaultCheckTimeoutSeconds, "Time out after this many seconds")

Member

@nitishfy left a comment

Works perfectly. I've left a few comments before we merge this PR. The changes also work well for the sync command, even though you only added the timeout flag to the get command. I wonder how and why.

@almoelda
Contributor Author

Thanks @nitishfy.
I've committed the suggested changes.
Regarding the sync command, it already has a timeout flag that is propagated into waitOnApplicationStatus; see below:

if timeout != 0 {

@nitishfy
Member

Why can't you call waitOnApplicationStatus() in the get command too, the way we do for the argocd app wait command?

@agaudreault self-assigned this Jan 29, 2025
Comment on lines +400 to +404
if timeout != 0 {
time.AfterFunc(time.Duration(timeout)*time.Second, func() {
cancel()
})
}
Member

Suggested change
if timeout != 0 {
time.AfterFunc(time.Duration(timeout)*time.Second, func() {
cancel()
})
}
app, _, err := waitOnApplicationStatus(ctx, acdClient, appName, uint(timeout), watchOpts{operation: true}, nil, output)
errors.CheckError(err)

You don't need all of this. Instead, call waitOnApplicationStatus before making the first GET call.

Contributor Author

Yeah, sounds good. I wanted to ask about hardRefresh, since waitOnApplicationStatus only does a normal refresh.
Please correct me if I got it wrong.

Contributor Author

I guess (correct me if I'm wrong) that if we proceed with waitOnApplicationStatus, we would have to signal a hardRefresh somehow: maybe via a context key-value, or by adding a parameter, which would require adjusting all callers of this function. Let me know what you think and I'll get on it.

Member

@nitishfy left a comment

PTAL now.

@@ -382,17 +383,26 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
`),

Run: func(c *cobra.Command, args []string) {
ctx := c.Context()
ctx, cancel := context.WithCancel(c.Context())
Member

Should we set up the context after we validate the args on line 387?

Member

I don't think we need to change the ctx here anymore if we use the waitOnApplicationStatus func.

Contributor Author

Generally I agree with you, @nitishfy: invoking the wait function looks like the best way.
But it will not use hardRefresh.

#21702 (comment)

@@ -2577,7 +2588,7 @@ func waitOnApplicationStatus(ctx context.Context, acdClient argocdclient.Client,
if timeout != 0 {
time.AfterFunc(time.Duration(timeout)*time.Second, func() {
_, appClient := acdClient.NewApplicationClientOrDie()
app, err := appClient.Get(ctx, &application.ApplicationQuery{
Member

I think when it times out, this will call cancel() on the context, which will in turn close appEventCh, causing the for-range loop below to break. The last call will be _ = printFinalStatus(app).

This would mean that the AfterFunc should make sure that:

  1. It does not also call printFinalStatus.
  2. It sets refresh = false, so that the last call to printFinalStatus does not refresh the app.
  3. It calls app, err = appClient.Get(ctx, &application.ApplicationQuery to update the app (without refresh), so that the updated app is used by printFinalStatus.

I haven't debugged whether this is really what the execution does, but it should be testable in a unit test similar to TestWaitOnApplicationStatus_JSON_YAML_WideOutput.

There are also a few other problems with the code, such as the connection not being closed in the AfterFunc, and potential race conditions on refresh and app that might now require a lock. TBD.

cmd/argocd/commands/app.go (outdated, resolved)
@@ -382,17 +383,26 @@ func NewApplicationGetCommand(clientOpts *argocdclient.ClientOptions) *cobra.Com
`),

Run: func(c *cobra.Command, args []string) {
ctx := c.Context()
ctx, cancel := context.WithCancel(c.Context())
Member

Immediately call defer cancel().


appName, appNs := argo.ParseFromQualifiedName(args[0], appNamespace)

if timeout != 0 {
time.AfterFunc(time.Duration(timeout)*time.Second, func() {
cancel()
Member

I think the logic of app get --refresh --timeout 10 should be to try the get with refresh for 10 seconds and, once the 10 seconds are up, print a message, then fall back to a normal get and return that.

Co-authored-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: almoelda <[email protected]>
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

app sync --timeout x and app get --refresh hangs forever if application spec is invalid
7 participants