Support GGUF models #98

vladfaust · 2024-08-12T07:38:09Z

See vllm-project/vllm#1002, vllm-project/vllm#5191.

It should be possible to set `gguf` as the `QUANTIZATION` env var, but we also need a way to specify the exact quant. I'm thinking of a `MODEL_FILENAME` env var containing the exact filename in the model's repository. The model download logic will need to change accordingly; see https://github.com/Isotr0py/vllm/blob/main/examples/gguf_inference.py.
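A minimal sketch of how the proposed variables could be validated and used. It assumes a `MODEL_NAME` variable holding the Hugging Face repo id, which the issue does not specify. `resolve_model_source` and `download_model` are hypothetical helpers, not existing worker code. For GGUF it fetches only the selected file with `huggingface_hub.hf_hub_download` instead of downloading the whole repository. It also assumes the returned local `.gguf` path can be passed to vLLM as the model.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSource:
    repo_id: str                 # Hugging Face repository id
    filename: Optional[str]      # exact file inside the repo (required for GGUF)
    quantization: Optional[str]  # e.g. "gguf"; None when unset


def resolve_model_source(env: Mapping[str, str]) -> ModelSource:
    """Read model settings from environment-style variables.

    MODEL_NAME is an assumed variable for the repo id; QUANTIZATION and
    MODEL_FILENAME are the variables proposed in this issue.
    """
    repo_id = env.get("MODEL_NAME")
    if not repo_id:
        raise ValueError("MODEL_NAME must be set")
    quantization = (env.get("QUANTIZATION") or "").lower() or None
    filename = env.get("MODEL_FILENAME") or None

    if quantization == "gguf":
        # A GGUF repo typically holds one file per quant, so the exact file is required.
        if filename is None:
            raise ValueError("QUANTIZATION=gguf requires MODEL_FILENAME")
        if not filename.endswith(".gguf"):
            raise ValueError(f"MODEL_FILENAME must be a .gguf file, got {filename!r}")
    return ModelSource(repo_id=repo_id, filename=filename, quantization=quantization)


def download_model(source: ModelSource) -> str:
    """Return what to pass to vLLM as the model: a local GGUF path or a repo id."""
    if source.quantization == "gguf":
        # Imported lazily so the validation logic works without huggingface_hub installed.
        from huggingface_hub import hf_hub_download

        # Download just the selected quant file and return its local path.
        return hf_hub_download(repo_id=source.repo_id, filename=source.filename)
    # Non-GGUF models keep the existing download path (not shown here).
    return source.repo_id
```

For example, `MODEL_NAME=org/model-GGUF QUANTIZATION=gguf MODEL_FILENAME=model.Q4_K_M.gguf` would download only that one file (the repo and file names here are placeholders). Failing fast when `MODEL_FILENAME` is missing avoids silently picking an arbitrary quant from a multi-file repo.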
