CML Kubernetes self-hosted runner is registered to GitHub but the workflow never continues #1415
Comments
@ludelafo I think the issue is that the executing command The two solutions I see are:
Hi @dacbd, thank you for your input. I'll investigate more on my side to check if I can fix the issue. What puzzles me is that I remember having the same setup previously, and it worked out of the box. I'll get back to you if I find something.
Hello @dacbd,

After a few months working on other projects, I'm back on CML/MLOps principles. I updated all packages to check whether this issue was resolved, but my team and I are still having trouble using CML with Kubernetes and GitHub Actions.

To help identify the problem, I created a minimal reproducible example that you can find here: https://github.com/swiss-ai-center/cml-kubernetes-github-actions-runner-minimal-reproducible-example. It contains all the steps to reproduce the issue, along with open questions for further investigation.

Three of us have looked into this issue and we weren't able to find a solution. I'll tag the others (@rmarquis, @leonardcser) so they can join the conversation if necessary. We are highly motivated to help Iterative fix this issue, so please let us know how we can help!

Thanks in advance,
@ludelafo I'm sorry, I don't have much capacity to help you, and I'm not sure how busy @0x2b3bfa0 is. A few things I would recommend:

- Inspect the cluster to make sure the pod is even being created.
- Go into your GCP Logs Explorer and inspect the API calls/activity to make sure nothing is being denied or missing.

CML generates an SSH key that is used for the instance. You can run the command locally using your own SSH key (there should be a few examples in the docs), then try to SSH into it yourself and inspect the contents for errors. (CML does its readiness check via SSH.)
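As a rough illustration of the first recommendation, the sketch below summarizes the output of `kubectl get pods -o json` so that a pod that was never created, or one stuck in `Pending`, stands out. The helper names `summarize_pods` and `fetch_pods` are made up for this example, and the `default` namespace is an assumption: the namespace the runner pod actually lands in depends on the cluster setup.

```python
import json
import subprocess


def summarize_pods(pod_list):
    """Return (name, phase, waiting_reasons) for every pod in a parsed
    `kubectl get pods -o json` document."""
    summary = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Containers that have not started report a `waiting` state with a
        # reason such as ImagePullBackOff or ContainerCreating.
        reasons = [
            cs["state"]["waiting"].get("reason", "")
            for cs in status.get("containerStatuses", [])
            if "waiting" in cs.get("state", {})
        ]
        summary.append(
            (pod["metadata"]["name"], status.get("phase", "Unknown"), reasons)
        )
    return summary


def fetch_pods(namespace="default"):
    """Run kubectl and parse its JSON output.

    Assumes kubectl is installed and already authenticated against the
    cluster; the namespace default is a guess, not something CML documents.
    """
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `summarize_pods(fetch_pods())` while the workflow is hanging returns one tuple per pod. An empty list would suggest the pod was never scheduled, while a `Pending` phase with a waiting reason points at image pull or resource problems.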
Hi CML team,
I'm facing an issue with CML when creating a self-hosted runner for GitHub on a Google Cloud Kubernetes cluster.
The runner is created and seems to register to GitHub. However, the workflow never continues and hangs on
I'm using the following steps to create the runner:

- `repo` scope
- `CML_PAT`
- `GCP_SERVICE_ACCOUNT_KEY`

Here are some logs that might help you:
Logs of the runner just after the start
Logs of the runner after some time
Logs of the GitHub workflow
I was able to confirm that the runner had successfully registered with GitHub by running the following command (from the GitHub API documentation):
Output of the cURL command
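A runner can show up as registered and still never pick up a job, for example when it is offline or when its labels do not cover every label in the workflow's `runs-on`. The sketch below checks this against the JSON returned by GitHub's "list self-hosted runners for a repository" endpoint (`GET /repos/{owner}/{repo}/actions/runners`). The function name and the sample labels are made up for illustration, and the comparison is an exact string match, which is an assumption of this sketch.

```python
def runners_able_to_take_job(payload, required_labels):
    """Return the names of runners that are online and carry every label
    the job asks for.

    `payload` is the parsed JSON body of the runners listing; each runner
    entry has `name`, `status` ("online"/"offline"), `busy` and `labels`.
    """
    required = set(required_labels)
    eligible = []
    for runner in payload.get("runners", []):
        labels = {label["name"] for label in runner.get("labels", [])}
        # A job is only routed to a runner whose label set covers all
        # labels requested by the job.
        if runner.get("status") == "online" and required <= labels:
            eligible.append(runner["name"])
    return eligible


# Hypothetical response, trimmed to the fields used above.
example_payload = {
    "total_count": 2,
    "runners": [
        {
            "id": 1,
            "name": "cml-runner-a",
            "status": "online",
            "busy": False,
            "labels": [{"name": "self-hosted"}, {"name": "cml"}],
        },
        {
            "id": 2,
            "name": "cml-runner-b",
            "status": "offline",
            "busy": False,
            "labels": [{"name": "self-hosted"}, {"name": "cml"}],
        },
    ],
}
```

If this returns an empty list for the labels used in the workflow, the job stays queued even though the runner appears in the listing.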
You can find a repository with the code used to reproduce this issue here.
I created two workflows to test the runner:

- `workflow-from-actions.yml`, using CML official GitHub Actions
- `workflow-from-sources.yml`, using CML and TPI from sources

You can find the execution of the two workflows here and here.
I tried all sorts of things to make it work, but I was not able to find a solution. I tried to:

- `repo` scope
- `permissions` to the GitHub workflow file
- (`0.18.x`)
- `--cloud-image="iterativeai/cml:0-dvc3-base1-gpu"`, `--tpi-version="= 0.11.18"` and `--cml-version="0.19.0"` arguments to set older versions of CML and TPI

Please let me know if I can be of any help and thank you!