Allow conversion of Llama / Mistral HF models #6144

Merged — 8 commits, Mar 29, 2024
8 changes: 8 additions & 0 deletions convert-hf-to-gguf.py
Collaborator

Why not just use:

@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")

on the existing MixtralModel (renamed to LlamaModel)? I don't see a point in supporting Mistral in this script without also supporting Llama, and these classes are identical, so they can be merged.
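The suggestion above relies on the script's class-registration pattern, where one converter class can be registered under several Hugging Face architecture names. A minimal, self-contained sketch of that pattern follows; the names `Model.register` and `from_model_architecture` mirror convert-hf-to-gguf.py, but the dict-based registry internals and the string stand-in for `gguf.MODEL_ARCH.LLAMA` are assumptions for illustration:

```python
class Model:
    """Sketch of the converter base class with a name -> class registry."""
    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, *names):
        # Decorator: map each HF architecture name to the decorated class.
        def decorator(model_cls):
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return decorator

    @classmethod
    def from_model_architecture(cls, arch: str) -> type:
        # Look up the converter class for an architecture string
        # (as read from the HF model's config.json).
        return cls._registry[arch]


@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")
class LlamaModel(Model):
    # Stand-in for gguf.MODEL_ARCH.LLAMA; all three architectures
    # share the same GGUF tensor layout, so one class suffices.
    model_arch = "llama"

    def set_vocab(self):
        pass  # the real script calls self._set_vocab_sentencepiece()
```

With this, `Model.from_model_architecture("MistralForCausalLM")` and the Llama and Mixtral lookups all resolve to the same `LlamaModel` class, which is the merge being proposed.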

Contributor Author

Sure, we can do that. I tried to mimic what I saw in the existing entries.

In my opinion, it could be interesting to use distinct architecture names for Mistral (and Mixtral) for informative purposes: if you look at the GGUF metadata for any Mistral file on the Hugging Face Hub, nothing refers to Mistral except the filename. But that would also require additional changes, so I'm happy to use the same entry for all three!

@@ -1051,6 +1051,14 @@ def set_vocab(self):
self._set_vocab_sentencepiece()


@Model.register("MistralForCausalLM")
class MistralModel(Model):
model_arch = gguf.MODEL_ARCH.LLAMA

def set_vocab(self):
self._set_vocab_sentencepiece()


@Model.register("MiniCPMForCausalLM")
class MiniCPMModel(Model):
model_arch = gguf.MODEL_ARCH.MINICPM