
Allow conversion of Llama / Mistral HF models #6144

Merged
8 commits merged into ggerganov:master on Mar 29, 2024

Conversation

@pcuenca (Contributor) commented Mar 18, 2024

This allows converting fine-tuned models with convert-hf-to-gguf.py. The base architecture is set to llama, as in the models converted by @TheBloke. If necessary, we can add a new entry to constants.py.

@cebtenzzre (Collaborator) commented:

Why not just use:

```python
@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")
```

on the existing MixtralModel (call it LlamaModel)? I don't see a point in supporting Mistral in this script without supporting Llama, and these classes are identical so they can be merged.
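For illustration, a sketch of what that merged entry could look like, following the @Model.register pattern the script already uses (the class body is abbreviated here; the real class keeps the shared vocab and tensor-handling code):

```python
# Hypothetical merged entry: one class registered for all three HF architectures.
@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")
class LlamaModel(Model):
    model_arch = gguf.MODEL_ARCH.LLAMA
    # ... shared set_vocab() / tensor handling elided ...
```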

@pcuenca (Contributor, Author) replied:

Sure, we can do that. I tried to mimic what I saw in the existing entries.

In my opinion, it could be interesting to use different architectures for Mistral (and Mixtral) for informative purposes. If you click on the GGUF information for any Mistral file on the Hugging Face Hub, there's nothing that refers to mistral except the filename. But that would also require additional changes, so happy to use the same entry for the three of them!

@pcuenca requested a review from @cebtenzzre on March 18, 2024 at 20:03
@cebtenzzre (Collaborator) commented:

I tried to convert this model: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca

I got this exception:

```text
$ TMPDIR=/var/tmp ./convert-hf-to-gguf.py ~/dirs/text-ai-models/dl/Mistral-7B-OpenOrca --outfile /dev/null
Loading model: Mistral-7B-OpenOrca
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
  File "/home/jared/src/forks/llama.cpp/./convert-hf-to-gguf.py", line 2073, in <module>
    main()
  File "/home/jared/src/forks/llama.cpp/./convert-hf-to-gguf.py", line 2060, in main
    model_instance.set_vocab()
  File "/home/jared/src/forks/llama.cpp/./convert-hf-to-gguf.py", line 1051, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/jared/src/forks/llama.cpp/./convert-hf-to-gguf.py", line 324, in _set_vocab_sentencepiece
    piece = tokenizer.id_to_piece(token_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jared/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 1179, in _batched_func
    return _func(self, arg)
           ^^^^^^^^^^^^^^^^
  File "/home/jared/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 1172, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
```

This model converts fine with convert.py.

On this part of the diff:

```diff
 model_arch = gguf.MODEL_ARCH.LLAMA

 def set_vocab(self):
-    self._set_vocab_sentencepiece()
+    self._set_vocab_hf()
```
@pcuenca (Contributor, Author) replied:

While testing conversion of fine-tuned models I ran into this problem: #6320. If that PR is merged, we can also use _set_vocab_sentencepiece() here.
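For context, the failure above happens because fine-tunes often declare a vocab_size in config.json that is larger than the sentencepiece model itself, with the extra ids defined in added_tokens.json; asking sentencepiece for those ids raises the IndexError. A minimal sketch of the situation (hypothetical paths and sizes, not the exact #6320 patch):

```python
from sentencepiece import SentencePieceProcessor

tokenizer = SentencePieceProcessor(model_file="tokenizer.model")  # hypothetical path

vocab_size = 32002  # hypothetical: vocab_size from a fine-tune's config.json
for token_id in range(vocab_size):
    if token_id < tokenizer.vocab_size():
        piece = tokenizer.id_to_piece(token_id)  # safe: id is inside the sp model
    else:
        # Ids past the sp model must come from added_tokens.json (or be written
        # as placeholder tokens); indexing them directly is what raised above.
        piece = f"[PAD{token_id}]"
```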

@pcuenca (Contributor, Author) commented Mar 26, 2024

Sorry for the delay, @cebtenzzre, I could only get back to this today. Testing the model you mentioned (https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca), I ran into the sentencepiece vocab inconsistency reported in #6320. In addition, my initial PR had not taken care of tensor permutation.
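For reference, the permutation in question is the reshuffling of the Q/K projection weights that convert.py applies to llama-family checkpoints so RoPE matches GGUF's expected row layout. A sketch along the lines of convert.py's permute helper (not necessarily the exact code added in this PR):

```python
import numpy as np

def permute(weights: np.ndarray, n_head: int, n_head_kv: int | None = None) -> np.ndarray:
    # GQA models use n_head_kv heads for the K/V projections instead of n_head.
    if n_head_kv is not None and n_head != n_head_kv:
        n_head = n_head_kv
    # Split each head's rows into two halves and interleave them.
    return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
                   .swapaxes(1, 2)
                   .reshape(weights.shape))
```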

I tested conversion of https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca and verified that, at a temperature of 0, generation matches the model produced by convert.py.

@pcuenca (Contributor, Author) commented Mar 27, 2024

It looks like the CI timed out after 6 hours.

@cebtenzzre (Collaborator) commented:

This PR conflicts with #6355 which renames HfVocab to LlamaHfVocab and makes it specific to models with tokenizer.json - Mistral-7B-OpenOrca only has a tokenizer.model.

To be consistent with convert.py's default --vocab-type, after that PR is merged you would want to do something like:

```python
try:
    self._set_vocab_sentencepiece()
except FileNotFoundError:
    self._set_vocab_llama_hf()
```
This keeps the dependency on transformers conditional (sentencepiece is a required dependency at the moment) and gives accurate token scores when tokenizer.model is available. Does that seem reasonable?

@pcuenca (Contributor, Author) commented Mar 28, 2024

@cebtenzzre yes, makes total sense! I merged and applied those changes, then tested with Mistral-7B-OpenOrca and Mistral-7B-v0.1.

@cebtenzzre (Collaborator) commented:

With the changes I added, the set of metadata keys is now exactly the same as the one written by convert.py (checked with gguf-dump). The only difference is the value of general.name: convert.py writes "dl" (it uses the parent directory name, which supposedly has meaning for the original Llama checkpoints), while convert-hf-to-gguf.py writes "Mistral-7B-OpenOrca", which is much more useful.
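As a quick way to reproduce that comparison without gguf-dump, a sketch using the gguf-py reader (assuming the GGUFReader API and hypothetical file names):

```python
from gguf import GGUFReader

def metadata_keys(path: str) -> set[str]:
    # GGUFReader.fields maps each metadata key to its parsed field.
    return set(GGUFReader(path).fields.keys())

# Hypothetical output files from the two conversion scripts.
a = metadata_keys("openorca-convert.gguf")
b = metadata_keys("openorca-convert-hf.gguf")
print(sorted(a ^ b))  # keys present in only one file; expected empty here
```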

@pcuenca (Contributor, Author) commented Mar 28, 2024

Thank you! Much appreciated! 🙌

@ggerganov merged commit b75c381 into ggerganov:master on Mar 29, 2024
11 of 22 checks passed
@pcuenca deleted the mistral-hf-conversion branch on March 29, 2024 at 09:22
@pcuenca changed the title from "Allow conversion of Mistral HF models" to "Allow conversion of Llama / Mistral HF models" on Mar 29, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Allow conversion of Mistral HF models

* Homogenize Llama, Mistral, Mixtral under the same entry.

* Fix tokenizer, permute tensors

* Use sentencepiece tokenizer, or fall back to hfft.

* convert-hf : small fix for mypy

* convert-hf : fix duplicated block_count

* convert-hf : add vocab size to metadata

---------

Co-authored-by: Jared Van Bortel <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024

tybalex pushed a commit to tybalex/function.cpp that referenced this pull request Apr 17, 2024