New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 #3603
feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 #3603
Conversation
Maybe we could migrate the formatting changes to another PR? |
The changes that get reformatted are introduced in this PR |
even the jupyter notebook ones? |
Oh didn't notice those. I'll remove them. |
@spolti Actually these notebook changes will have to be part of this PR to pass the CI build. |
pkg/controller/v1beta1/inferenceservice/reconcilers/deployment/deployment_reconciler.go
Show resolved
Hide resolved
/lgtm |
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
6171f02
to
ef42393
Compare
/lgtm |
Signed-off-by: Yuan Tang <[email protected]>
|
||
// The deployment strategy to use to replace existing pods with new ones. | ||
// +optional | ||
DeploymentStrategy *appsv1.DeploymentStrategy `json:"deploymentStrategy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when this field is set for the Knative mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in NewDeploymentReconciler.createRawDeployment
so it won't affect knative mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the field is still exposed on the isvc which user can set for the knative mode, but I guess knative probably validate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add a validation that this field is not supported for knative otherwise user will expect this to work but knative performs blue/green deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
@terrytangyuan Thank you for making these changes. I see these deployment strategy targets inference services in raw deployment mode. Starting with kserve v12 we support raw deployment for inference graphs as well. Could you please extend this change to inference graph raw deployment as well? |
Signed-off-by: Yuan Tang <[email protected]>
I am trying to narrow down the scope of this PR. Extending this to IG would be a separate PR. |
Signed-off-by: Yuan Tang <[email protected]>
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: spolti, terrytangyuan, yuzisun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…Fixes kserve#3452 (kserve#3603) * feat: Support customizable deployment strategy for RawDeployment mode Signed-off-by: Yuan Tang <[email protected]> * regen Signed-off-by: Yuan Tang <[email protected]> * lint Signed-off-by: Yuan Tang <[email protected]> * Correctly apply rollingupdate Signed-off-by: Yuan Tang <[email protected]> * address comments Signed-off-by: Yuan Tang <[email protected]> * Add validation Signed-off-by: Yuan Tang <[email protected]> --------- Signed-off-by: Yuan Tang <[email protected]>
[RHOAIENG-3375][Cherry-pick] feat: Support customizable deployment strategy for RawDeployment mode. Fixes kserve#3452 (kserve#3603)
…Fixes kserve#3452 (kserve#3603) * feat: Support customizable deployment strategy for RawDeployment mode Signed-off-by: Yuan Tang <[email protected]> * regen Signed-off-by: Yuan Tang <[email protected]> * lint Signed-off-by: Yuan Tang <[email protected]> * Correctly apply rollingupdate Signed-off-by: Yuan Tang <[email protected]> * address comments Signed-off-by: Yuan Tang <[email protected]> * Add validation Signed-off-by: Yuan Tang <[email protected]> --------- Signed-off-by: Yuan Tang <[email protected]> Signed-off-by: asd981256 <[email protected]>
* upgrade vllm/transformers version (#3671) upgrade vllm version Signed-off-by: Johnu George <[email protected]> * Add openai models endpoint (#3666) Signed-off-by: Curtis Maddalozzo <[email protected]> * feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603) * feat: Support customizable deployment strategy for RawDeployment mode Signed-off-by: Yuan Tang <[email protected]> * regen Signed-off-by: Yuan Tang <[email protected]> * lint Signed-off-by: Yuan Tang <[email protected]> * Correctly apply rollingupdate Signed-off-by: Yuan Tang <[email protected]> * address comments Signed-off-by: Yuan Tang <[email protected]> * Add validation Signed-off-by: Yuan Tang <[email protected]> --------- Signed-off-by: Yuan Tang <[email protected]> * Enable dtype support for huggingface server (#3613) * Enable dtype for huggingface server Signed-off-by: Dattu Sharma <[email protected]> * Set float16 as default. Fixup linter Signed-off-by: Dattu Sharma <[email protected]> * Add small comment to make the changes understandable Signed-off-by: Dattu Sharma <[email protected]> * Fixup linter Signed-off-by: Dattu Sharma <[email protected]> * Adapt to new huggingfacemodel Signed-off-by: Dattu Sharma <[email protected]> * Fixup merge :) Signed-off-by: Dattu Sharma <[email protected]> * Explicitly mention the behaviour of dtype flag on auto. Signed-off-by: Dattu Sharma <[email protected]> * Default to FP32 for encoder models Signed-off-by: Dattu Sharma <[email protected]> * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU Signed-off-by: Dattu Sharma <[email protected]> * Fixup linter Signed-off-by: Dattu Sharma <[email protected]> * Update poetry Signed-off-by: Dattu Sharma <[email protected]> * Use torch.float32 forr tests explicitly Signed-off-by: Dattu Sharma <[email protected]> --------- Signed-off-by: Dattu Sharma <[email protected]> * Add method for checking model health/readiness (#3673) Signed-off-by: Curtis Maddalozzo <[email protected]> * fix for extract zip from gcs (#3510) * fix for extract zip from gcs Signed-off-by: Andrews Arokiam <[email protected]> * initial commit for gcs model download unittests Signed-off-by: Andrews Arokiam <[email protected]> * unittests for model download from gcs Signed-off-by: Andrews Arokiam <[email protected]> * black format fix Signed-off-by: Andrews Arokiam <[email protected]> * code verification Signed-off-by: Andrews Arokiam <[email protected]> --------- Signed-off-by: Andrews Arokiam <[email protected]> * Update Dockerfile and Readme (#3676) Signed-off-by: Gavrish Prabhu <[email protected]> * Update huggingface readme (#3678) * update wording for huggingface README small update to make readme easier to understand Signed-off-by: Alexa Griffith <[email protected]> * Update README.md Signed-off-by: Alexa Griffith [email protected] * Update python/huggingfaceserver/README.md Co-authored-by: Filippe Spolti <[email protected]> Signed-off-by: Alexa Griffith <[email protected]> * update vllm Signed-off-by: alexagriffith <[email protected]> * Update README.md --------- Signed-off-by: Alexa Griffith <[email protected]> Signed-off-by: Alexa Griffith [email protected] Signed-off-by: alexagriffith <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Filippe Spolti <[email protected]> Co-authored-by: Dan Sun <[email protected]> * fix: HPA equality check should include annotations (#3650) * fix: HPA equality check should include annotations Signed-off-by: Yuan Tang <[email protected]> * Only watch related autoscalerclass annotation Signed-off-by: Yuan Tang <[email protected]> * simplify Signed-off-by: Yuan Tang <[email protected]> * Add missing delete action Signed-off-by: Yuan Tang <[email protected]> * fix logic Signed-off-by: Yuan Tang <[email protected]> --------- Signed-off-by: Yuan Tang <[email protected]> * Fix: huggingface runtime in helm chart (#3679) fix huggingface runtime in chart Signed-off-by: Dan Sun <[email protected]> * Fix: model id and model dir check order (#3680) * fix huggingface runtime in chart Signed-off-by: Dan Sun <[email protected]> * Allow model_dir to be specified on template Signed-off-by: Dan Sun <[email protected]> * Default model_dir to /mnt/models for HF Signed-off-by: Dan Sun <[email protected]> * Lint format Signed-off-by: Dan Sun <[email protected]> --------- Signed-off-by: Dan Sun <[email protected]> * Fix:vLLM Model Supported check throwing circular dependency (#3688) * Fix:vLLM Model Supported check throwing circular dependency Signed-off-by: Gavrish Prabhu <[email protected]> * remove unwanted comments Signed-off-by: Gavrish Prabhu <[email protected]> * remove unwanted comments Signed-off-by: Gavrish Prabhu <[email protected]> * fix return case Signed-off-by: Gavrish Prabhu <[email protected]> * fix to check all arch in model config forr vllm support Signed-off-by: Gavrish Prabhu <[email protected]> * fixlint Signed-off-by: Gavrish Prabhu <[email protected]> --------- Signed-off-by: Gavrish Prabhu <[email protected]> * Fix: Allow null in Finish reason streaming response in vLLM (#3684) Fix: allow null in Finish reason Signed-off-by: Gavrish Prabhu <[email protected]> --------- Signed-off-by: Johnu George <[email protected]> Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: Yuan Tang <[email protected]> Signed-off-by: Dattu Sharma <[email protected]> Signed-off-by: Andrews Arokiam <[email protected]> Signed-off-by: Gavrish Prabhu <[email protected]> Signed-off-by: Alexa Griffith <[email protected]> Signed-off-by: Alexa Griffith [email protected] Signed-off-by: alexagriffith <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Curtis Maddalozzo <[email protected]> Co-authored-by: Yuan Tang <[email protected]> Co-authored-by: Datta Nimmaturi <[email protected]> Co-authored-by: Andrews Arokiam <[email protected]> Co-authored-by: Gavrish Prabhu <[email protected]> Co-authored-by: Alexa Griffith <[email protected]> Co-authored-by: Filippe Spolti <[email protected]> Co-authored-by: Dan Sun <[email protected]>
Fixes #3452. Supersedes #3479