[2 days] Manually create a cluster (no terraform) #5355

Open
Tracked by #5351
GeorgianaElena opened this issue Jan 8, 2025 · 14 comments
@GeorgianaElena
Member

GeorgianaElena commented Jan 8, 2025

Resources to get us started:

@GeorgianaElena changed the title from "3. [2d] Manually create a cluster (no terraform) - Est. 5" to "[2 days] Manually create a cluster (no terraform)" on Jan 8, 2025
@GeorgianaElena
Member Author

@jmunroe, I believe you mentioned that there were some docs about this from the Project Pythia side 🤔? Could you please share those here so we can have them as a reference when we start working on this on Monday?

@sgibson91
Member

I interpreted James' comment as "Project Pythia followed JetStream's docs and they got through it fine", just to give a different interpretation. May be wrong though!

@GeorgianaElena
Member Author

@sgibson91 makes sense. My understanding was that they experimented with it and documented the whole process 🤷‍♀
I'm putting the JetStream getting started docs in the top comment so we have them as a reference.

@jmunroe
Contributor

jmunroe commented Jan 10, 2025

My assumption is that these are the correct starting points:

There have been previous uses of kubernetes on JetStream2 such as kubespray (docs) but I had understood those to be fairly manual, non-scalable ways of deploying a cluster.

My ask of the JetStream2 team has been for a 'managed kubernetes service' that we can build on top of. I think OpenStack Magnum and ClusterAPI are some of the enabling technologies used by the JetStream2 team, but I am not entirely up to speed on the details.

My primary contact at JetStream has been Julian Pistorius ([email protected]). Julian is already on our 2i2c slack.

@GeorgianaElena
Member Author

Current state

@sgibson91 and I started deploying a cluster today to the new allocation that @jmunroe created for us.

The process currently fails with CREATE_FAILED:

status_reason  | Failed to create trustee or trust for Cluster
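For reference, the status and failure reason can be read back from Magnum directly rather than from the full table. A minimal sketch, assuming the OpenStack CLI with the Magnum plugin is installed and credentials are already loaded in the shell; `cluster_status` and `cluster_status_reason` are hypothetical helper names, not part of any tool:

```shell
# Hypothetical helpers: print a single field of a Magnum cluster.
# "-f value -c <column>" are the generic OpenStack CLI output options that
# print one column without table formatting.
cluster_status() {
  openstack coe cluster show "$1" -f value -c status
}

cluster_status_reason() {
  openstack coe cluster show "$1" -f value -c status_reason
}

# Usage (cluster name is whatever was passed at creation time):
#   cluster_status k8s
#   cluster_status_reason k8s
```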

While investigating this, we realized that we lack the credentials to run commands such as:

  • openstack user show trustee_domain_admin

  • openstack service list

These fail with:

  • ForbiddenException: 403: Client Error for url: https://js2.jetstream-cloud.org:5000/v3/users?name=trustee_domain_admin

  • ForbiddenException: 403: Client Error for url: https://js2.jetstream-cloud.org:5000/v3/services, You are not authorized to perform the requested action: identity:list_services.
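To see quickly which calls a given set of credentials allows, something like the following could be used. This is only a sketch: `probe` is a hypothetical helper that reports whether a command exits successfully, so it cannot tell a 403 apart from any other failure.

```shell
# Hypothetical helper: run a command quietly and report whether it succeeded.
# A non-zero exit may be a permission error (403) or any other failure.
probe() {
  if "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "denied or failed: $*"
  fi
}

# Usage with the commands from this thread:
#   probe openstack user show trustee_domain_admin
#   probe openstack service list
#   probe openstack coe cluster list
```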

What we've tried

We tried creating new, more permissive application credentials with the "Unrestricted (dangerous)" option enabled, hoping that would give us the right to run the create cluster command, but it didn't work.
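The same kind of credential can also be created from the CLI instead of the web UI. A sketch, assuming the existing login is allowed to create application credentials; the helper name and the description text are made up for illustration. The `--unrestricted` flag lets the credential create other application credentials and trusts, which appears to be what Magnum needs when it sets up its trustee.

```shell
# Hypothetical helper: create an unrestricted application credential.
# --unrestricted allows the credential to create further application
# credentials and trusts, which is why the UI labels it as dangerous.
create_unrestricted_credential() {
  openstack application credential create --unrestricted \
    --description "Magnum cluster creation test" "$1"
}

# Usage (the credential name is arbitrary):
#   create_unrestricted_credential magnum-test
```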

The only available roles for us are the ones in the screenshot below:

Image

From the blog post we are following, it looks like the loadbalancer role should fix it, but that role isn't available to us. Also, given that we can't list resources such as services either, we might need more than that, such as access to the identity API endpoint.

@jmunroe
Contributor

jmunroe commented Jan 15, 2025

I've emailed Julian to seek additional guidance (See https://2i2c.freshdesk.com/a/tickets/2690).

@jmunroe
Contributor

jmunroe commented Jan 15, 2025

Potentially relevant Jetstream2 issues:

@GeorgianaElena
Member Author

GeorgianaElena commented Jan 16, 2025

Thank you @jmunroe! I also found some interesting docs about:
- the openstack identity service https://docs.openstack.org/mitaka/install-guide-obs/common/get_started_identity.html that might help us
- magnum specific identity service (keystone): https://docs.openstack.org/magnum/latest/user/#keystone-authn-and-authz

Nvm, we still need permissions to the identity endpoint!

@GeorgianaElena
Member Author

Update:

@sgibson91 and I opened a ticket on JetStream2 at https://jetstream-cloud.org/contact/index.html asking for guidance about this permission error. Confirmation email: https://2i2c.freshdesk.com/a/tickets/2691

@jmunroe
Contributor

jmunroe commented Jan 16, 2025

I've had success creating a Kubernetes cluster using Magnum following Andrea Zonca's blog post. When I created the application credentials I did select the 'unrestricted (dangerous)' option. I still can't run all openstack commands (like openstack service list), but those do not seem to be actual blockers to deploying a scalable kubernetes cluster.

I did need to be patient though. It took 117 minutes for the cluster to be created, while in zonca's post he timed it at 9 minutes. Perhaps it will run faster now that the images have been copied over?

I think we should increase the quota on the number of Volumes that openstack allows. I'll submit that support ticket to the Jetstream2 team.

@GeorgianaElena Please let me know if you'd like to meet tomorrow so we can make sure you are able to do what appears to work for me. I'll grab an early morning slot on your calendar.

@jmunroe
Contributor

jmunroe commented Jan 16, 2025

I requested an increase from 10 Volumes to 30 Volumes through the JS2 help desk.
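To check how much headroom is left against a Volume quota, the existing volumes can simply be counted. A sketch only: `volumes_remaining` is a hypothetical helper, the limit is passed in by hand (10 or 30 in this thread) rather than read from the API, and it counts only the volumes visible to the current credentials.

```shell
# Hypothetical helper: volumes still available under a given quota.
# $1 is the quota limit, supplied by the caller.
volumes_remaining() {
  limit="$1"
  # One volume ID per line; tr strips the padding some wc versions add.
  used=$(openstack volume list -f value -c ID | wc -l | tr -d ' ')
  echo $(( limit - used ))
}

# Usage:
#   volumes_remaining 30
```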

@jmunroe
Contributor

jmunroe commented Jan 16, 2025

Quota for Volumes now set to 30.

I deleted my test k8s_jmunroe cluster and I am trying again to create a cluster called k8s.

My hope is that it will be faster than 117 minutes this time but it is currently at 25 minutes and still pending. I'll report back how long it actually takes.
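Rather than checking by hand, the creation time can be measured with a small polling loop. A minimal sketch, assuming the OpenStack CLI with the Magnum plugin is available; `wait_for_cluster` is a hypothetical helper and the 60-second default interval is arbitrary:

```shell
# Hypothetical helper: poll a Magnum cluster until it leaves
# CREATE_IN_PROGRESS, printing elapsed minutes and status on each check.
# $1 = cluster name, $2 = seconds between checks (default 60).
wait_for_cluster() {
  name="$1"
  interval="${2:-60}"
  start=$(date +%s)
  while :; do
    status=$(openstack coe cluster show "$name" -f value -c status)
    elapsed=$(( ($(date +%s) - start) / 60 ))
    echo "${elapsed} min: ${status}"
    case "$status" in
      CREATE_COMPLETE) return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *) return 1 ;;  # CREATE_FAILED, empty output, or anything unexpected
    esac
  done
}

# Usage:
#   wait_for_cluster k8s 120
```

Note that a cluster that keeps recreating its nodes stays in CREATE_IN_PROGRESS, so this loop would need an external timeout in that case.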

@jmunroe
Contributor

jmunroe commented Jan 16, 2025

I'm happy to see that the JS2 support desk was able to respond and take action on my request in <60 minutes!

@jmunroe
Contributor

jmunroe commented Jan 17, 2025

Unfortunately, attempt 2 was not successful. The cluster creation was stuck in a 'CREATE_IN_PROGRESS' state. It appears that a control plane node gets created, persists for about 60 minutes, is killed, and is then recreated.

Behind the scenes, my understanding of openstack coe cluster create is that Magnum uses the orchestration service Heat to build the cluster.

Using openstack stack list is supposed to let us see the Heat orchestration doing its thing, but Magnum first creates its own application credentials, and I am guessing the stack is actually owned by those credentials and not mine.
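One way to test that guess is to ask Magnum which stack it recorded for the cluster and query that stack by ID. This is a sketch under the assumption that the deployment uses the Heat driver and that the Heat CLI plugin is installed; `cluster_stack_events` is a hypothetical helper, and the lookup may still be refused if the stack belongs to other credentials.

```shell
# Hypothetical helper: list Heat events for the stack behind a Magnum cluster.
cluster_stack_events() {
  name="$1"
  stack_id=$(openstack coe cluster show "$name" -f value -c stack_id)
  # An unset stack_id may come back empty or as the literal string "None".
  if [ -z "$stack_id" ] || [ "$stack_id" = "None" ]; then
    echo "no Heat stack recorded for cluster $name" >&2
    return 1
  fi
  openstack stack event list "$stack_id"
}

# Usage:
#   cluster_stack_events k8s
```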

I made a few attempts at adding a private ssh keypair and creating security groups in order to log in to the control plane node, poke around, and find some logs, but I don't think I was setting up the openstack networking correctly. I had a floating IP assigned and the SSH port was open, but no luck.
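For anyone retrying this, the networking side can be sketched as below. The helper names and all arguments (security group, server, floating IP, source CIDR) are placeholders, not values from this allocation. One caveat that may explain the failed login: as far as I know, a keypair is only injected when a server boots, so it would need to be passed at cluster creation time (the `--keypair` option of openstack coe cluster create) rather than added afterwards.

```shell
# Hypothetical helper: allow inbound SSH (tcp/22) to a security group
# from one source CIDR.  $1 = security group, $2 = CIDR.
allow_ssh_from() {
  openstack security group rule create --ingress --protocol tcp \
    --dst-port 22 --remote-ip "$2" "$1"
}

# Hypothetical helper: attach the security group and an already allocated
# floating IP to a server.  $1 = server, $2 = security group, $3 = floating IP.
expose_server() {
  openstack server add security group "$1" "$2"
  openstack server add floating ip "$1" "$3"
}

# Usage (all names and addresses are placeholders):
#   allow_ssh_from ssh-debug 203.0.113.7/32
#   expose_server <control-plane-server> ssh-debug <floating-ip>
```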
