[Serving] Support Gemma for serving #1806

MasterJH5574 · 2024-02-21T22:59:17Z

Following #1805, this PR supports Gemma model in MLC Serve.

Tests and examples are still a work in progress.

This PR brings Gemma model support. It currently supports the `q0f16`, `q0f32`, and `q4f16_1` quantization modes for both the 7B and 2B variants in MLC Chat. We are testing unquantized Gemma in MLC Serve and will submit any changes that turn out to be needed.

---

Co-authored-by: Rick Zhou <[email protected]>
Co-authored-by: Charlie Ruan <[email protected]>
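The quantization mode names above follow MLC LLM's `qAfB(_id)` naming convention: A is the number of bits used to store weights, B is the bit width of the floating-point activations, and the optional `_id` distinguishes quantization algorithms. `q0` means the weights are not quantized. Below is a minimal Python sketch that assumes this convention. It parses the three listed modes and gives a rough weight-memory estimate for a given parameter count. `QuantMode`, `parse_quant_mode`, and `approx_weight_bytes` are hypothetical helpers, not part of MLC's API. The estimate also ignores per-group scales and all non-weight memory, such as the KV cache.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed naming convention: "qAfB" or "qAfB_id" (e.g. "q0f16", "q4f16_1").
_QUANT_NAME = re.compile(r"q(\d+)f(\d+)(?:_(\d+))?")


@dataclass(frozen=True)
class QuantMode:
    """Hypothetical container for the fields encoded in a mode name."""
    weight_bits: int        # A: bits per stored weight (0 = not quantized)
    activation_bits: int    # B: bit width of the float activation dtype
    variant: Optional[int]  # _id: which quantization algorithm, if given

    @property
    def is_quantized(self) -> bool:
        return self.weight_bits != 0


def parse_quant_mode(name: str) -> QuantMode:
    """Parse a qAfB[_id] name; raise ValueError on anything else."""
    match = _QUANT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a qAfB[_id] quantization name: {name!r}")
    a, b, variant = match.groups()
    return QuantMode(int(a), int(b), int(variant) if variant is not None else None)


def approx_weight_bytes(mode: QuantMode, num_params: int) -> int:
    """Rough weight storage: unquantized weights use the float width B,
    quantized weights use A bits each (scales/zero points are ignored)."""
    bits = mode.weight_bits if mode.is_quantized else mode.activation_bits
    return num_params * bits // 8


if __name__ == "__main__":
    for name in ("q0f16", "q0f32", "q4f16_1"):
        print(name, parse_quant_mode(name))
```

Under these assumptions, moving from `q0f16` to `q4f16_1` cuts raw weight storage by roughly 4x. The real saving is somewhat smaller because of the quantization scales this sketch leaves out.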

MasterJH5574 and others added 2 commits February 21, 2024 17:57

[Serving] Support Gemma for serving

cade9fc

This PR supports Gemma model in MLC Serve.

MasterJH5574 marked this pull request as draft February 21, 2024 22:59

Neet-Nestor force-pushed the main branch from 9905667 to 14bec5a May 27, 2024 06:42
