Custom node image for utilizing pre-cached image layer #252

Open
HakjunMIN opened this issue Apr 4, 2024 · 8 comments
Labels
area/vm-images Issues or PRs related to VM images or image galleries kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@HakjunMIN

Tell us about your request

Currently, it looks like only predefined images are supported in the NodeClaim CRD. Could custom images with pre-cached container image layers be supported there?

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

We use Karpenter to run AI/ML workloads built on NVIDIA images. When Karpenter scales out, we need faster image pulls, which nodes with a local image cache would provide.

Are you currently working around this issue?

We use artifact streaming in AKS, but it is much slower than a local cache. A local in-cluster registry is another option, but maintaining it is a burden.

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@HakjunMIN HakjunMIN changed the title Custom node image with pre-loaded image cache. Custom node image for utilizing pre-cached image layer Apr 4, 2024
@tallaxes
Collaborator

tallaxes commented Apr 4, 2024

Could you describe the process you use to build the custom node image? (I'm especially interested in whether it is based on an AKS node image, since that affects bootstrap.)

@tallaxes tallaxes added kind/feature Categorizes issue or PR as related to a new feature. area/vm-images Issues or PRs related to VM images or image galleries triage/needs-information Indicates an issue needs more information in order to work on it. labels Apr 4, 2024
@HakjunMIN
Author

HakjunMIN commented Apr 5, 2024

As far as I know, the AKS node image cannot be customized, but I would like to do it the way AWS does, such as

I don't yet know how to build a custom node image, but if there is a Packer template or another tool for building AKS node images, I would like to use it to bake in pre-cached ML images.

My goal is to reduce the time it takes to pull large AI/ML images by using a local cache in a Karpenter environment.

@Bryce-Soghigian
Contributor

Bryce-Soghigian commented Apr 5, 2024

The templates and scripts used to build AKS node images with Packer are all open source: https://github.com/Azure/AgentBaker.

That being said, step 1 is to enable artifact streaming on Karpenter nodes. I have a POC for this (#121), but haven't had time to set up the e2e test, as it's a bit more involved.

Custom node images aren't in our immediate plans, since we are first focused on making things reliable and stable, but artifact streaming may be a start.

One older project that may be worth mentioning is kamino: https://github.com/jackfrancis/kamino?tab=readme-ov-file
IIRC, the idea is to follow a prototype pattern built around a conceptual "golden node". The golden node has your images cached; we snapshot that node and use the snapshot as the node image for all of your nodes, so every node starts with the things you need already cached.

When we do tackle something like this, I imagine we will go in a similar direction, so that the node image you use still has everything AKS needs without doing too much, while you still get the caching performance improvement.

@HakjunMIN
Author

@Bryce-Soghigian Thank you very much. As you suggested, I will try artifact streaming first and then move on to kamino. I believe kamino can work well with Karpenter too. I'll definitely test it.

@Bryce-Soghigian
Contributor

Kamino will not work with Karpenter in the project's current state, for a few reasons:

  1. Karpenter currently pulls from Community Image Galleries and doesn't use SIG (Shared Image Gallery) images.
  2. Kamino operates on the VMSS data model, whereas Karpenter provisions single-instance VMs rather than using a scale set.
  3. Karpenter currently has no mechanism to query custom images.

I will get started on adding artifact streaming support. There is a fair bit of work to do before we can support a kamino-style node image cache layer in Karpenter.

@HakjunMIN
Copy link
Author

@Bryce-Soghigian Oh, understood. So artifact streaming doesn't support Karpenter yet? What is the approximate ETA for adding artifact streaming to Karpenter?

@Bryce-Soghigian Bryce-Soghigian added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Apr 15, 2024
@Bryce-Soghigian
Contributor

@HakjunMIN I created a separate issue to track the artifact streaming work #266.

Long term, it would be great to do something similar to what you are describing here in Karpenter. However, we still need to work through many other things first. Please subscribe to the artifact streaming issue for further updates.

@HakjunMIN
Author

HakjunMIN commented Apr 25, 2024

@Bryce-Soghigian

The AWS link below shows a perfect way to implement this. Besides artifact streaming, it would be great if a custom snapshot could be used as the node class image. Could you add this to your backlog?

https://github.com/aws-samples/bottlerocket-images-cache?tab=readme-ov-file#with-karpenter

Projects
None yet
Development

No branches or pull requests

3 participants