Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 #3603

Merged
merged 10 commits into from May 9, 2024

Conversation

terrytangyuan
Copy link
Member

Fixes #3452. Supersedes #3479

@spolti
Copy link
Contributor

spolti commented Apr 15, 2024

Maybe we could migrate the formatting changes to another PR?

@terrytangyuan
Copy link
Member Author

Maybe we could migrate the formatting changes to another PR?

The changes that get reformatted are introduced in this PR

@spolti
Copy link
Contributor

spolti commented Apr 15, 2024

Maybe we could migrate the formatting changes to another PR?

The changes that get reformatted are introduced in this PR

even the jupyter notebook ones?

@terrytangyuan
Copy link
Member Author

Maybe we could migrate the formatting changes to another PR?

The changes that get reformatted are introduced in this PR

even the jupyter notebook ones?

Oh didn't notice those. I'll remove them.

@terrytangyuan
Copy link
Member Author

@spolti Actually these notebook changes will have to be part of this PR to pass the CI build.

@spolti
Copy link
Contributor

spolti commented Apr 16, 2024

/lgtm

@spolti
Copy link
Contributor

spolti commented Apr 22, 2024

/lgtm

@oss-prow-bot oss-prow-bot bot added the lgtm label Apr 22, 2024
Signed-off-by: Yuan Tang <[email protected]>
@oss-prow-bot oss-prow-bot bot removed the lgtm label Apr 22, 2024

// The deployment strategy to use to replace existing pods with new ones.
// +optional
DeploymentStrategy *appsv1.DeploymentStrategy `json:"deploymentStrategy,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when this field is set for the Knative mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used in NewDeploymentReconciler.createRawDeployment so it won't affect knative mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the field is still exposed on the isvc which user can set for the knative mode, but I guess knative probably validate this.

Copy link
Member

@yuzisun yuzisun May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a validation that this field is not supported for knative otherwise user will expect this to work but knative performs blue/green deployment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Signed-off-by: Yuan Tang <[email protected]>
@oss-prow-bot oss-prow-bot bot removed the lgtm label May 3, 2024
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
@bmopuri
Copy link
Contributor

bmopuri commented May 5, 2024

@terrytangyuan Thank you for making these changes. I see these deployment strategy targets inference services in raw deployment mode. Starting with kserve v12 we support raw deployment for inference graphs as well.

Could you please extend this change to inference graph raw deployment as well?

Signed-off-by: Yuan Tang <[email protected]>
@terrytangyuan
Copy link
Member Author

Could you please extend this change to inference graph raw deployment as well?

I am trying to narrow down the scope of this PR. Extending this to IG would be a separate PR.

Signed-off-by: Yuan Tang <[email protected]>
@yuzisun
Copy link
Member

yuzisun commented May 9, 2024

/lgtm
/approve

Copy link

oss-prow-bot bot commented May 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spolti, terrytangyuan, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot oss-prow-bot bot added the approved label May 9, 2024
@yuzisun yuzisun merged commit 629e4ae into kserve:master May 9, 2024
57 of 58 checks passed
@terrytangyuan terrytangyuan deleted the customize-deploy-strategy branch May 9, 2024 14:42
terrytangyuan added a commit to terrytangyuan/kserve that referenced this pull request May 9, 2024
…Fixes kserve#3452 (kserve#3603)

* feat: Support customizable deployment strategy for RawDeployment mode

Signed-off-by: Yuan Tang <[email protected]>

* regen

Signed-off-by: Yuan Tang <[email protected]>

* lint

Signed-off-by: Yuan Tang <[email protected]>

* Correctly apply rollingupdate

Signed-off-by: Yuan Tang <[email protected]>

* address comments

Signed-off-by: Yuan Tang <[email protected]>

* Add validation

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
openshift-merge-bot bot added a commit to opendatahub-io/kserve that referenced this pull request May 9, 2024
[RHOAIENG-3375][Cherry-pick] feat: Support customizable deployment strategy for RawDeployment mode. Fixes kserve#3452 (kserve#3603)
asd981256 pushed a commit to asd981256/kserve that referenced this pull request May 14, 2024
…Fixes kserve#3452 (kserve#3603)

* feat: Support customizable deployment strategy for RawDeployment mode

Signed-off-by: Yuan Tang <[email protected]>

* regen

Signed-off-by: Yuan Tang <[email protected]>

* lint

Signed-off-by: Yuan Tang <[email protected]>

* Correctly apply rollingupdate

Signed-off-by: Yuan Tang <[email protected]>

* address comments

Signed-off-by: Yuan Tang <[email protected]>

* Add validation

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: asd981256 <[email protected]>
oss-prow-bot bot pushed a commit that referenced this pull request May 18, 2024
* upgrade vllm/transformers version (#3671)

upgrade vllm version

Signed-off-by: Johnu George <[email protected]>

* Add openai models endpoint (#3666)

Signed-off-by: Curtis Maddalozzo <[email protected]>

* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)

* feat: Support customizable deployment strategy for RawDeployment mode

Signed-off-by: Yuan Tang <[email protected]>

* regen

Signed-off-by: Yuan Tang <[email protected]>

* lint

Signed-off-by: Yuan Tang <[email protected]>

* Correctly apply rollingupdate

Signed-off-by: Yuan Tang <[email protected]>

* address comments

Signed-off-by: Yuan Tang <[email protected]>

* Add validation

Signed-off-by: Yuan Tang <[email protected]>

---------

Signed-off-by: Yuan Tang <[email protected]>

* Enable dtype support for huggingface server (#3613)

* Enable dtype for huggingface server

Signed-off-by: Dattu Sharma <[email protected]>

* Set float16 as default. Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Add small comment to make the changes understandable

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Adapt to new huggingfacemodel

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup merge :)

Signed-off-by: Dattu Sharma <[email protected]>

* Explicitly mention the behaviour of dtype flag on auto.

Signed-off-by: Dattu Sharma <[email protected]>

* Default to FP32 for encoder models

Signed-off-by: Dattu Sharma <[email protected]>

* Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU

Signed-off-by: Dattu Sharma <[email protected]>

* Fixup linter

Signed-off-by: Dattu Sharma <[email protected]>

* Update poetry

Signed-off-by: Dattu Sharma <[email protected]>

* Use torch.float32 forr tests explicitly

Signed-off-by: Dattu Sharma <[email protected]>

---------

Signed-off-by: Dattu Sharma <[email protected]>

* Add method for checking model health/readiness (#3673)

Signed-off-by: Curtis Maddalozzo <[email protected]>

* fix for extract zip from gcs (#3510)

* fix for extract zip from gcs

Signed-off-by: Andrews Arokiam <[email protected]>

* initial commit for gcs model download unittests

Signed-off-by: Andrews Arokiam <[email protected]>

* unittests for model download from gcs

Signed-off-by: Andrews Arokiam <[email protected]>

* black format fix

Signed-off-by: Andrews Arokiam <[email protected]>

* code verification

Signed-off-by: Andrews Arokiam <[email protected]>

---------

Signed-off-by: Andrews Arokiam <[email protected]>

* Update Dockerfile and Readme (#3676)

Signed-off-by: Gavrish Prabhu <[email protected]>

* Update huggingface readme (#3678)

* update wording for huggingface README

small update to make readme easier to understand

Signed-off-by: Alexa Griffith  <[email protected]>

* Update README.md

Signed-off-by: Alexa Griffith [email protected]

* Update python/huggingfaceserver/README.md

Co-authored-by: Filippe Spolti <[email protected]>
Signed-off-by: Alexa Griffith  <[email protected]>

* update vllm

Signed-off-by: alexagriffith <[email protected]>

* Update README.md

---------

Signed-off-by: Alexa Griffith  <[email protected]>
Signed-off-by: Alexa Griffith [email protected]
Signed-off-by: alexagriffith <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Filippe Spolti <[email protected]>
Co-authored-by: Dan Sun <[email protected]>

* fix: HPA equality check should include annotations (#3650)

* fix: HPA equality check should include annotations

Signed-off-by: Yuan Tang <[email protected]>

* Only watch related autoscalerclass annotation

Signed-off-by: Yuan Tang <[email protected]>

* simplify

Signed-off-by: Yuan Tang <[email protected]>

* Add missing delete action

Signed-off-by: Yuan Tang <[email protected]>

* fix logic

Signed-off-by: Yuan Tang <[email protected]>
---------

Signed-off-by: Yuan Tang <[email protected]>

* Fix:  huggingface runtime in helm chart (#3679)

fix huggingface runtime in chart

Signed-off-by: Dan Sun <[email protected]>

* Fix: model id and model dir check order (#3680)

* fix huggingface runtime in chart

Signed-off-by: Dan Sun <[email protected]>

* Allow model_dir to be specified on template

Signed-off-by: Dan Sun <[email protected]>

* Default model_dir to /mnt/models for HF

Signed-off-by: Dan Sun <[email protected]>

* Lint format

Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Dan Sun <[email protected]>

* Fix:vLLM Model Supported check throwing circular dependency (#3688)

* Fix:vLLM Model Supported check throwing circular dependency

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <[email protected]>

* remove unwanted comments

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix return case

Signed-off-by: Gavrish Prabhu <[email protected]>

* fix to check all arch in model config forr vllm support

Signed-off-by: Gavrish Prabhu <[email protected]>

* fixlint

Signed-off-by: Gavrish Prabhu <[email protected]>

---------

Signed-off-by: Gavrish Prabhu <[email protected]>

* Fix: Allow null in Finish reason streaming response in vLLM (#3684)

Fix: allow null in Finish reason

Signed-off-by: Gavrish Prabhu <[email protected]>

---------

Signed-off-by: Johnu George <[email protected]>
Signed-off-by: Curtis Maddalozzo <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Dattu Sharma <[email protected]>
Signed-off-by: Andrews Arokiam <[email protected]>
Signed-off-by: Gavrish Prabhu <[email protected]>
Signed-off-by: Alexa Griffith  <[email protected]>
Signed-off-by: Alexa Griffith [email protected]
Signed-off-by: alexagriffith <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Curtis Maddalozzo <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: Datta Nimmaturi <[email protected]>
Co-authored-by: Andrews Arokiam <[email protected]>
Co-authored-by: Gavrish Prabhu <[email protected]>
Co-authored-by: Alexa Griffith <[email protected]>
Co-authored-by: Filippe Spolti <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to change parameters of of the default Deployment created from ServingRuntime and InferenceService
5 participants