builder OOM Killed #457

Open
herunyu opened this issue Mar 3, 2023 · 7 comments

Comments

@herunyu

herunyu commented Mar 3, 2023

Hi there, I am trying to deploy a bento through Yatai. However, the image builder pod keeps getting OOMKilled.

I have set the BentoML Configuration as the following:
image

But the resources of the image builder pod are not what I configured:
image

It leads to the following error. Please take a look. Thank you!

image
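
For anyone reading without the screenshots, here is a rough sketch of how the same information could be pulled from the cluster: the resources the builder pod was actually admitted with, and the reason its containers terminated. The function name `inspect_builder_pod`, the namespace, and the pod name are placeholders I made up for illustration, and it assumes the container has already terminated (so the reason sits under `state.terminated`).

```shell
#!/usr/bin/env bash
# Sketch only: inspect_builder_pod is a hypothetical helper, not part of Yatai.
# Usage: inspect_builder_pod NAMESPACE POD_NAME
inspect_builder_pod() {
  local ns="$1" pod="$2"

  # Requests/limits per container as stored by the API server
  # (this includes any defaults injected by a LimitRange).
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

  # Termination reason per container. "OOMKilled" means the kernel OOM killer
  # stopped the container, typically because it exceeded its memory limit.
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.reason}{"\n"}{end}'
}

# Example (needs cluster access; both arguments are placeholders):
# inspect_builder_pod <builder-namespace> <builder-pod-name>
```

If the limits printed here differ from what was entered in the BentoML Configuration, the values are probably coming from somewhere else, such as a namespace default.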

@yetone
Member

yetone commented Mar 6, 2023

You can use the JSON editor to add resource limits to the image builder containers. If this is not possible, consider switching to a different image build engine, such as buildkit.

image

image

How to switch image build engine:

helm -n yatai-image-builder get values yatai-image-builder > ./values.yaml
helm -n yatai-image-builder upgrade yatai-image-builder bentoml/yatai-image-builder --values ./values.yaml --set bentoImageBuildEngine=buildkit

@herunyu
Author

herunyu commented Mar 7, 2023

Somehow the JSON editor is not showing anything. Is it because our environment has no external internet access?
Anyway, we have switched the image build engine to buildkit, but we still hit the same OOM issue.

Besides, if we delete the deployment through the Web UI and then try to deploy the same model again, the image build fails immediately. I checked the log, and it showed the error from the previous deployment. Not sure if this is a bug. We have to delete the BentoRequest for the bento before we can create a new deployment for the same model version.
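
In case it helps others hitting the same stale-failure behavior, the workaround described above might look roughly like this from the command line. This is a sketch under assumptions: `delete_bento_request` is a hypothetical helper name, it assumes the BentoRequest custom resource is reachable through kubectl as `bentorequests`, and the namespace and resource name are placeholders to fill in.

```shell
#!/usr/bin/env bash
# Sketch only: delete_bento_request is a hypothetical helper, not part of Yatai.
# Assumes the BentoRequest CRD is exposed to kubectl as "bentorequests".
# Usage: delete_bento_request NAMESPACE BENTO_REQUEST_NAME
delete_bento_request() {
  local ns="$1" name="$2"

  # List existing BentoRequests first to confirm the exact name.
  kubectl -n "$ns" get bentorequests

  # Remove the stale BentoRequest so a new deployment triggers a fresh build.
  kubectl -n "$ns" delete bentorequest "$name"
}

# Example (needs cluster access; both arguments are placeholders):
# delete_bento_request <deployment-namespace> <bento-request-name>
```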

@herunyu
Author

herunyu commented Mar 13, 2023

After changing the default memory LimitRange, we can control the resources of the image building container. However, the next step was blocked by an image named "quay.io/bentoml/bentoml-proxy:0.0.1". As we do not have external internet access in the development environment, we pulled the image from outside it. But it seems the repository of this bentoml-proxy image is hard-coded, and we are not sure how to point it at our internal repository. It would be helpful if you could show us how. Thank you!
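
For reference, a minimal sketch of the kind of LimitRange change mentioned above. The namespace `yatai-builders`, the object name, and the memory sizes are all assumptions for illustration and not values taken from this thread. Note that LimitRange defaults only apply to containers that do not declare their own requests or limits.

```shell
#!/usr/bin/env bash
# Sketch only: writes a LimitRange manifest that sets default memory
# requests/limits for containers in the builder namespace.
# NAMESPACE, the object name, and the sizes below are assumed values.
NAMESPACE="${NAMESPACE:-yatai-builders}"
OUT="${OUT:-builder-limitrange.yaml}"

cat > "$OUT" <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: builder-memory-defaults
  namespace: ${NAMESPACE}
spec:
  limits:
    - type: Container
      default:            # limit applied when a container sets none
        memory: 4Gi
      defaultRequest:     # request applied when a container sets none
        memory: 2Gi
EOF

# Apply only when explicitly requested, since this needs cluster access.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl apply -f "$OUT"
fi
```

Run it as is to generate the manifest for review, or with `APPLY=1` to apply it to the cluster.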

@yetone
Member

yetone commented Mar 13, 2023

@herunyu Thanks for your feedback. I just updated and released yatai-deployment and its helm chart, so you can now specify a custom proxy image with the value linked below. Update the helm repo, then upgrade the yatai-deployment helm release to set this image:

https://github.com/bentoml/yatai-deployment/blob/6cdba8c036e1ff4a33086efe913485593d7bf2a0/helm/yatai-deployment/values.yaml#L114

@yetone
Member

yetone commented Mar 13, 2023

@herunyu I can demonstrate how to do this update.

First, update the helm repo:

helm repo update bentoml

Then save the previous values:

helm get values yatai-deployment -n yatai-deployment > /tmp/yatai-deployment-values.yaml

Finally, upgrade the release:

helm -n yatai-deployment upgrade yatai-deployment bentoml/yatai-deployment --values /tmp/yatai-deployment-values.yaml --set internalImages.proxy=${your proxy image here}

@herunyu
Author

herunyu commented Mar 13, 2023

@yetone Thank you! We will try this update and see if the problem is solved.

@wang-haoxian

wang-haoxian commented Aug 1, 2023

You can use the JSON editor to add resource limits to the image builder containers. If this is not possible, consider switching to a different image build engine, such as buildkit.

image image

How to switch image build engine:

helm -n yatai-image-builder get values yatai-image-builder > ./values.yaml
helm -n yatai-image-builder upgrade yatai-image-builder bentoml/yatai-image-builder --values ./values.yaml --set bentoImageBuildEngine=buildkit

Hello!
I am running into a similar issue, and I couldn't find the JSON editor in the Web UI.
Could you please point me to it? Thank you very much.

It's actually quite easy to hit OOM with this setup, especially with Transformers models.
I spent several hours in the documentation but didn't find anything about builder memory except for this page.


3 participants