a GGUF parser that works on remotely hosted files (over HTTP range requests) #540
Conversation
/**
 * Internal stateful instance to fetch ranges of HTTP data when needed
 */
GGUF is more complex to parse remotely than safetensors because in GGUF, we don't know the header size before parsing the whole header.
Hence I'm doing requests for chunks of file ranges, on demand.
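For illustration only (this is not the PR's actual implementation), a minimal sketch of what such an on-demand range request can look like; the helper name and signature are assumptions:

```ts
// Hypothetical helper, for illustration only: fetch the byte range
// [start, start + length) of a remote file with an HTTP Range request.
async function fetchRange(url: string, start: number, length: number): Promise<Uint8Array> {
	const res = await fetch(url, {
		headers: { Range: `bytes=${start}-${start + length - 1}` },
	});
	// 206 Partial Content is the expected answer; a server that ignores
	// the Range header would reply 200 with the full body.
	if (res.status !== 206 && res.status !== 200) {
		throw new Error(`Range request failed with status ${res.status}`);
	}
	return new Uint8Array(await res.arrayBuffer());
}
```

The parser can keep calling a helper like this to pull in more of the header whenever the bytes it has buffered so far run out.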
Great work on the library and thanks for the acknowledgement @julien-c! I'm excited to see GGUF get some more support. One thing that we've seen a lot is a lack of optional general metadata (
Unless I'm mistaken, it only supports public non-gated repos
Also tagging @Narsil for info
Co-authored-by: Mishig <[email protected]>
@biw Yes, I liked the typing/validation in https://github.com/ahoylabs/gguf.js – neat idea (btw, opened this small PR on gguf.js). I'll keep this first PR minimal and we can revisit adding more types in the future.
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
new Uint8Array(this.buffer).set(buf, this.chunk * HTTP_CHUNK_SIZE);
this.chunk += 1;
BTW this explanation of binary arrays in JS is quite good: https://javascript.info/arraybuffer-binary-arrays
/// TODO(fix typing)
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
see typing discussion here: microsoft/TypeScript#54636
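For context, a minimal self-contained sketch of the buffering pattern under review, assuming a runtime with resizable `ArrayBuffer` support (ES2024); the class and constant names are made up, and the `@ts-ignore` (on the constructor options as well, on older TypeScript versions) is only needed while the lib typings lack `resize()`:

```ts
const HTTP_CHUNK_SIZE = 10 * 1024 ** 2; // assumed chunk size
const MAX_HEADER_SIZE = 50 * 1024 ** 2; // assumed upper bound on how much of the file we may need

class ChunkBuffer {
	// Resizable ArrayBuffer: starts empty and grows one chunk at a time.
	private buffer = new ArrayBuffer(0, { maxByteLength: MAX_HEADER_SIZE });
	private chunk = 0;

	append(buf: Uint8Array): void {
		// eslint-disable-next-line @typescript-eslint/ban-ts-comment
		// @ts-ignore -- resize() is absent from older lib typings (microsoft/TypeScript#54636)
		this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
		// Copy the newly fetched chunk into place at the end of the buffer.
		new Uint8Array(this.buffer).set(buf, this.chunk * HTTP_CHUNK_SIZE);
		this.chunk += 1;
	}

	get view(): DataView {
		// A DataView without an explicit length tracks the buffer as it grows.
		return new DataView(this.buffer);
	}
}
```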
ideally we would need a big-endian file to test too, to make sure our parsing is correct
from the spec:
Must be `3` for version described in this spec, which introduces big-endian support.
testing big-endian fails
const { metadata, tensorInfos } = await gguf("https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16-big-endian.gguf");
Error: not a valid gguf file: unsupported version "50331648"
However, in a Slack message here, @ggerganov told us regarding the file being tested:
But I’m not 100% sure it is generated correctly. I used a script that we developed sometime last year:
python3 gguf-py/scripts/gguf-convert-endian.py ./some-little-endian-model.gguf big
However, I haven’t run tests on actual big-endian systems, so there could be issues when reading those files.
The file seems to be converted correctly:
$ ▶ hexdump models/bert-bge-small/ggml-model-f16.gguf | head
0000000 4747 4655 0003 0000 00c5 0000 0000 0000 // this is the little-endian
0000010 0017 0000 0000 0000 0014 0000 0000 0000
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0008 0000 0004 0000 0000 0000
0000040 6562 7472 000c 0000 0000 0000 6567 656e
0000050 6172 2e6c 616e 656d 0008 0000 0011 0000
0000060 0000 0000 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 1035 0000 0000 0000 6200 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0474 0000
0000090 0c00 0000 1300 0000 0000 0000 6200 7265
$ ▶ hexdump models/bert-bge-small/ggml-model-f16-big-endian.gguf | head
0000000 4747 4655 0000 0300 0000 0000 0000 c500 // this is the converted big-endian
0000010 0000 0000 0000 1700 0000 0000 0000 1400
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0000 0800 0000 0000 0000 0400
0000040 6562 7472 0000 0000 0000 0c00 6567 656e
0000050 6172 2e6c 616e 656d 0000 0800 0000 0000
0000060 0000 1100 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 0035 0000 0000 0000 6210 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0074 0000
0000090 0004 0000 000c 0000 0000 0000 6213 7265
- The `"GGUF"` bytes are in the same order for both LE and BE
- The next 4 bytes are the version: `3`. This is what should be used to determine the endianness of the file. For more info, see this discussion: GGUF endianness cannot be determined from GGUF itself ggerganov/llama.cpp#3957
- The next 8 bytes are the number of tensors: `c5` (hex) = 197 (dec)
- etc.
Given the error `not a valid gguf file: unsupported version "50331648"`, it seems the reader you are using does not check the version bytes to correctly determine the endianness.
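To make the failure concrete (illustration only, not code from this PR): the converted file stores the version field big-endian as the bytes `00 00 00 03`, and a little-endian read of those bytes gives 0x03000000 = 50331648, which is exactly the "unsupported version" reported above:

```ts
// The big-endian file stores version 3 as the bytes 00 00 00 03.
const versionBytes = new Uint8Array([0x00, 0x00, 0x00, 0x03]);
const view = new DataView(versionBytes.buffer);

console.log(view.getUint32(0, false)); // 3        -> big-endian read, the actual version
console.log(view.getUint32(0, true));  // 50331648 -> little-endian read (0x03000000), the value in the error
```

Since valid version numbers are tiny, checking whether the little-endian read has any bits set in its low 16 bits is enough to pick the byte order, which is what the snippet quoted further down in this thread does.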
thanks, this was helpful: opened #545 to fix
also tagging @yagil 🔥
1. [Use length rather than newOffset](fcab2c9) (discussed [here](#540 (comment)))
2. [custom fetch fn](18f93f3) (discussed [here](#540 (comment)))
JFI ok on my side, I'll fix tooling issues if any pop up
The important snippet is:

```ts
const [littleEndian, version] = (() => {
	/// ggerganov/llama.cpp#3957
	/// Assume this code is always running on little-endian
	/// but wants to be able to parse both endianness
	const version = r.view.getUint32(4, true);
	if (version & 65535) {
		return [true, version];
	} else {
		return [false, r.view.getUint32(4, false)];
	}
})();
```

from ggerganov/llama.cpp#3957 and thanks to @ggerganov [comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)
packages/gguf/src/gguf.spec.ts
const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf";
const URL_MISTRAL_7B =
	"https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf";
const URL_GEMMA_2B = "https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf";
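For reference, a minimal sketch of how constants like these might be exercised in a test, using the `URL_LLAMA` constant above; it assumes vitest as the test runner and the `gguf()` return shape shown earlier in the thread, and the expected architecture value is only illustrative:

```ts
import { describe, expect, it } from "vitest"; // assumption: vitest is the test runner
import { gguf } from "./gguf";                 // assumption: the parser entry point sits next to this spec file

describe("gguf", () => {
	it("parses the header of a remotely hosted model", async () => {
		const { metadata, tensorInfos } = await gguf(URL_LLAMA);
		// Illustrative expectations; the real spec file asserts exact values.
		expect(metadata["general.architecture"]).toBe("llama");
		expect(tensorInfos.length).toBeGreaterThan(0);
	});
});
```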
nit: in a subsequent PR, we can probably move them under the https://huggingface.co/huggingfacejs org so that the tests still work/pass if owners of those repos make any change to those files or filenames
or just replace main with the current revision?
There's also https://huggingface.co/datasets/huggingface/moon-tests (which we could move to a resource group :) )
getting the current versions of the models: 007c451
lgtm ! 🚀
@coyotte508 please let me know if lgtm. I'm gonna merge soon
GGUF add types. Follow-up to #540 (comment). No validation of any kind, just types. cc: @biw also
GGUF is the new single-file weights format that's been taking the Hub by storm...
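A minimal usage sketch of what this PR enables, assuming the package is published as `@huggingface/gguf` (the model URL is one of the files used in the tests above; only the header bytes are fetched, via range requests):

```ts
import { gguf } from "@huggingface/gguf"; // assumption: published package name

// Only the header is downloaded via HTTP range requests, not the multi-GB weights.
const { metadata, tensorInfos } = await gguf(
	"https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf"
);

console.log(metadata);
console.log(tensorInfos);
```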
Spec: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
Reference implementation (Python): https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_reader.py
Acknowledgements & Inspirations