Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubectl ray cluster log command #2296

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

chiayi
Copy link
Contributor

@chiayi chiayi commented Aug 9, 2024

Why are these changes needed?

This PR adds the kubectl ray cluster logs command to retrieve the head node logs for a ray cluster.

Screenshot 2024-08-09 at 10 11 29 AM Screenshot 2024-08-09 at 10 11 42 AM

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@chiayi chiayi force-pushed the kubectl-plugin branch 5 times, most recently from 88d5ad2 to dc38dc8 Compare August 9, 2024 17:38
@chiayi chiayi force-pushed the kubectl-plugin branch 2 times, most recently from b47dd3a to d056060 Compare August 30, 2024 22:04
@chiayi chiayi marked this pull request as ready for review August 30, 2024 22:27
kubectl-plugin/README.md Outdated Show resolved Hide resolved
kubectl-plugin/README.md Outdated Show resolved Hide resolved
kubectl-plugin/pkg/cmd/cluster/log.go Outdated Show resolved Hide resolved
}

// we validate the directory
info, err := os.Stat(options.outputDir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forget to handle the case that --out-dir is not provided by the user. It should be handled in the Complete function above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the --out-dir flag is not set by the user, it will fail the proceeding validation giving a no such file or directory error. I don't think thats a good error msg for the user so I will be adding another check that returns something along the lines of "No output directory, please set --out-dir flag". As for handling this case in the Complete function, I think we should keep all the validation should be kept in the Validation function. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I borrow the idea from kubectl source code.

https://github.com/kubernetes/kubectl/blob/b2a7d2626dc56b2edf952ab4031fcc20a7f441e4/pkg/cmd/logs/logs.go#L233-L236

Here the Pod name is required and it is handled in the Complete function instead of Validate function.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, --out-dir flag will not be required and will default to current directory with the head name being the destination directory name.

With that, I think it make sense to do the validation check in the Validation method. It seems like the check in the kubectl source code is checking for completion? WDYT?

@andrewsykim
Copy link
Collaborator

@chiayi can you share the output of running the kubectl ray cluster logs command?

@chiayi
Copy link
Contributor Author

chiayi commented Sep 3, 2024

Screenshot 2024-09-03 at 11 01 43 AM

This is what it looks like when it downloads the file. Please let me know if I should add more logs.

Result:
Screenshot 2024-09-03 at 11 37 01 AM

@andrewsykim
Copy link
Collaborator

@chiayi in your last screenshot, why is there two directories named <pod-name> and <pod-name>-logs?

@chiayi
Copy link
Contributor Author

chiayi commented Sep 3, 2024

One of the them (stable-diffusion-cluster-head-hqcxt) isn't a directory and is the pod logs. and another one is the directory that contains all the files from /tmp/ray/session_latest/logs/.

@chiayi
Copy link
Contributor Author

chiayi commented Sep 3, 2024

I'm going to rename the pod log one to include pod-logs. Let me know if there is something else I should as well. @andrewsykim

@andrewsykim
Copy link
Collaborator

I suggest having just one directory per pod named after the pod name and just add a stdout.log file in there

@chiayi
Copy link
Contributor Author

chiayi commented Sep 3, 2024

New output directory format:

Screenshot 2024-09-03 at 2 38 50 PM

kubectl-plugin/README.md Show resolved Hide resolved
kubectl-plugin/pkg/cmd/cluster/log.go Outdated Show resolved Hide resolved
kubectl-plugin/pkg/cmd/cluster/log.go Outdated Show resolved Hide resolved
kubectl-plugin/pkg/cmd/cluster/log.go Outdated Show resolved Hide resolved
@chiayi
Copy link
Contributor Author

chiayi commented Sep 6, 2024

Current flow:
Screenshot 2024-09-06 at 2 43 09 PM
Screenshot 2024-09-06 at 2 43 49 PM

Out-dir is no longer required and will download to current dir if not specified.
Syntax changed to kubectl ray logs CLUSTERNAME

@andrewsykim
Copy link
Collaborator

andrewsykim commented Sep 9, 2024

Out-dir is no longer required and will download to current dir if not specified.

Instead of defaulting to current-dir, can we default to ./<cluster-name> ?

Otherwise LGTM!

@chiayi
Copy link
Contributor Author

chiayi commented Sep 9, 2024

Screenshot 2024-09-09 at 12 01 39 PM

@chiayi chiayi force-pushed the kubectl-plugin branch 4 times, most recently from 3387eeb to 258413e Compare September 9, 2024 19:57
@andrewsykim
Copy link
Collaborator

FYI, the lint failure is fixed in #2366

Copy link
Collaborator

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chiayi chiayi force-pushed the kubectl-plugin branch 2 times, most recently from fea4171 to 781e3e4 Compare September 9, 2024 21:22
@kevin85421
Copy link
Member

@chiayi would you mind fixing the linter issue? Thanks!

@kevin85421 kevin85421 self-assigned this Sep 10, 2024
@chiayi chiayi force-pushed the kubectl-plugin branch 2 times, most recently from dc2033a to ba53021 Compare September 10, 2024 17:36
@chiayi
Copy link
Contributor Author

chiayi commented Sep 10, 2024

@kevin85421 I merged in the pervious linter fix, the current linter issues are being fixed in #2368.

Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge the PR after CI passes because Andrew has already approved the PR.

@kevin85421 kevin85421 merged commit 3e68606 into ray-project:master Sep 10, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants