
a GGUF parser that works on remotely hosted files (over HTTP range requests) #540

Merged: 14 commits merged into main from gguf on Mar 13, 2024

Conversation

@julien-c (Member) commented Mar 8, 2024

GGUF is the new single-file weights format that's been taking the Hub by storm...

[image]

Spec

https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

Reference implementation (Python): https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_reader.py

Acknowledgements & Inspirations

@julien-c julien-c requested a review from mishig25 March 8, 2024 21:47
@julien-c julien-c requested a review from coyotte508 March 8, 2024 22:15
Comment on lines +56 to +58
```ts
/**
 * Internal stateful instance to fetch ranges of HTTP data when needed
 */
```
@julien-c (Member Author):

GGUF is more complex to parse remotely than safetensors because in GGUF, we don't know the header size before parsing the whole header.

Hence I'm fetching chunks of the file via range requests, on demand.
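To make the idea concrete, here is a minimal sketch of such an on-demand range fetcher. The class name, chunk size, and `FetchLike` type are assumptions for illustration, not the PR's actual API; the real implementation lives in `packages/gguf/src/gguf.ts`.

```typescript
// A stateful helper that downloads a remote file in fixed-size chunks via
// HTTP Range requests and caches the bytes fetched so far.
const HTTP_CHUNK_SIZE = 10 * 1024 * 1024; // 10 MiB per request (assumed value)

// Structural type so a custom fetch implementation can be injected
type FetchLike = (
	url: string,
	init?: { headers?: Record<string, string> }
) => Promise<{ arrayBuffer(): Promise<ArrayBufferLike> }>;

class RangeFetcher {
	private chunk = 0;
	buffer = new Uint8Array(0);

	constructor(private url: string, private fetchFn: FetchLike) {}

	/** Range header for the next chunk, e.g. "bytes=0-10485759" */
	nextRangeHeader(): string {
		const start = this.chunk * HTTP_CHUNK_SIZE;
		return `bytes=${start}-${start + HTTP_CHUNK_SIZE - 1}`;
	}

	/** Fetch the next chunk and append it to the locally cached buffer. */
	async fetchNextChunk(): Promise<void> {
		const res = await this.fetchFn(this.url, {
			headers: { Range: this.nextRangeHeader() },
		});
		const incoming = new Uint8Array(await res.arrayBuffer());
		// grow the cache and copy the new bytes in at the chunk's offset
		const grown = new Uint8Array((this.chunk + 1) * HTTP_CHUNK_SIZE);
		grown.set(this.buffer);
		grown.set(incoming, this.chunk * HTTP_CHUNK_SIZE);
		this.buffer = grown;
		this.chunk += 1;
	}
}
```

In practice you would pass the global `fetch` (or a custom one, which is also what the "custom fetch fn" commit in this PR enables).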

@biw commented Mar 8, 2024

Great work on the library and thanks for the acknowledgement @julien-c! I'm excited to see GGUF get some more support.

One thing we've seen a lot is a lack of the optional general metadata (general.author, general.url, general.description, general.license, general.source.url, & general.source.huggingface.repository) in most GGUF files. Do you have any thoughts on adding that as a first-class citizen (TypeScript types) to the PR? I'm sure having it in huggingface.js would help bring attention to model authors that the fields are there (it's very useful for model frontends wanting to support arbitrary models).
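For reference, a hedged sketch of what such first-class TypeScript types could look like. The key names follow the GGUF spec's general metadata section; the interface name itself is an assumption and not part of this PR.

```typescript
// Hypothetical sketch; key names come from the GGUF spec, but this
// interface is not (yet) part of the PR.
interface GGUFGeneralMetadata {
	"general.architecture": string; // required by the spec
	"general.name"?: string;
	"general.author"?: string;
	"general.url"?: string;
	"general.description"?: string;
	"general.license"?: string;
	"general.source.url"?: string;
	"general.source.huggingface.repository"?: string;
}

// Example: a frontend can surface attribution when the optional fields are set
const meta: GGUFGeneralMetadata = {
	"general.architecture": "llama",
	"general.author": "Example Author",
	"general.license": "apache-2.0",
};
```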

@coyotte508 (Member): Unless I'm mistaken, it only supports public, non-gated repos.

@julien-c julien-c requested a review from Narsil March 11, 2024 09:33
@julien-c (Member Author): Also tagging @Narsil for info.

@julien-c (Member Author): @biw Yes, I liked the typing/validation in https://github.com/ahoylabs/gguf.js – a neat idea (btw, I opened this small PR on gguf.js). I'll keep this first PR minimal and we can revisit adding more types in the future.

```ts
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
new Uint8Array(this.buffer).set(buf, this.chunk * HTTP_CHUNK_SIZE);
this.chunk += 1;
```
@julien-c (Member Author):

BTW this explanation of binary arrays in JS is quite good: https://javascript.info/arraybuffer-binary-arrays

```ts
/// TODO(fix typing)
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
```
@julien-c (Member Author):

See the typing discussion here: microsoft/TypeScript#54636

@julien-c (Member Author):

Ideally we'd also want a big-endian file to test with, to make sure our parsing is correct.

From the spec:

> Must be `3` for the version described in this spec, which introduces big-endian support.

@mishig25 (Collaborator) commented Mar 11, 2024:

Testing big-endian fails:

```ts
const { metadata, tensorInfos } = await gguf("https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16-big-endian.gguf");
```

Error: `not a valid gguf file: unsupported version "50331648"`

However, in a Slack message, @ggerganov told us regarding the file being tested:

> But I'm not 100% sure it is generated correctly. I used a script that we developed sometime last year:
> `python3 gguf-py/scripts/gguf-convert-endian.py ./some-little-endian-model.gguf big`
> However, I haven't run tests on actual big-endian systems, so there could be issues when reading those files.


The file seems to be converted correctly:

```
$ ▶ hexdump models/bert-bge-small/ggml-model-f16.gguf | head
0000000 4747 4655 0003 0000 00c5 0000 0000 0000     // this is the little-endian
0000010 0017 0000 0000 0000 0014 0000 0000 0000
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0008 0000 0004 0000 0000 0000
0000040 6562 7472 000c 0000 0000 0000 6567 656e
0000050 6172 2e6c 616e 656d 0008 0000 0011 0000
0000060 0000 0000 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 1035 0000 0000 0000 6200 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0474 0000
0000090 0c00 0000 1300 0000 0000 0000 6200 7265

$ ▶ hexdump models/bert-bge-small/ggml-model-f16-big-endian.gguf | head
0000000 4747 4655 0000 0300 0000 0000 0000 c500     // this is the converted big-endian
0000010 0000 0000 0000 1700 0000 0000 0000 1400
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0000 0800 0000 0000 0000 0400
0000040 6562 7472 0000 0000 0000 0c00 6567 656e
0000050 6172 2e6c 616e 656d 0000 0800 0000 0000
0000060 0000 1100 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 0035 0000 0000 0000 6210 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0074 0000
0000090 0004 0000 000c 0000 0000 0000 6213 7265
```

Given the error `not a valid gguf file: unsupported version "50331648"`, it seems the reader you are using does not check the version bytes to correctly determine the endianness.
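The arithmetic behind that error: GGUF version `3` stored little-endian is the byte sequence `03 00 00 00`; those same four bytes read as big-endian give `0x03000000` = 50331648. Since any real version number fits in 16 bits, checking the low half of a little-endian read is enough to detect the file's endianness (the trick from ggerganov/llama.cpp#3957). A small illustrative sketch, not the PR's exact code:

```typescript
// Version field bytes of a little-endian GGUF file (version 3):
const leBytes = new Uint8Array([0x03, 0x00, 0x00, 0x00]);
// The same field after conversion to big-endian:
const beBytes = new Uint8Array([0x00, 0x00, 0x00, 0x03]);

const readVersion = (bytes: Uint8Array): [littleEndian: boolean, version: number] => {
	const view = new DataView(bytes.buffer);
	const v = view.getUint32(0, true); // assume little-endian first
	// real versions fit in 16 bits; a zero low half means we guessed wrong
	return v & 0xffff ? [true, v] : [false, view.getUint32(0, false)];
};

const le = readVersion(leBytes); // little-endian file, version 3
const be = readVersion(beBytes); // big-endian file, version 3
// a reader that skips this check sees 0x03000000 === 50331648 instead
```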

@julien-c (Member Author):

Thanks, this was helpful: opened #545 to fix.

@julien-c (Member Author): Also tagging @yagil 🔥

@mishig25 mishig25 mentioned this pull request Mar 11, 2024
1. [Use length rather than newOffset](fcab2c9) (discussed [here](#540 (comment)))
2. [custom fetch fn](18f93f3) (discussed [here](#540 (comment)))
@coyotte508 (Member): JFI, OK on my side; I'll fix tooling issues if any pop up.

The important snippet is:

```ts
const [littleEndian, version] = (() => {
	/// ggerganov/llama.cpp#3957
	/// Assume this code is always running on little-endian
	/// but wants to be able to parse both endianness
	const version = r.view.getUint32(4, true);
	if (version & 65535) {
		return [true, version];
	} else {
		return [false, r.view.getUint32(4, false)];
	}
})();
```

from ggerganov/llama.cpp#3957, and thanks to @ggerganov's [comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)
Comment on lines 4 to 7
```ts
const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf";
const URL_MISTRAL_7B =
	"https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf";
const URL_GEMMA_2B = "https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf";
```
@mishig25 (Collaborator):

Nit: in a subsequent PR, we can probably move them under the https://huggingface.co/huggingfacejs org so that the tests still work/pass if owners of those repos make any change to those files or filenames.

@julien-c (Member Author):

Or just replace `main` with the current revision?

@coyotte508 (Member):

There's also https://huggingface.co/datasets/huggingface/moon-tests (which we could move to a resource group :) )

@mishig25 (Collaborator):

Getting the current versions of the models: 007c451

@mishig25 (Collaborator) left a review comment:

lgtm ! 🚀

@mishig25 (Collaborator): @coyotte508 please let me know if lgtm. I'm gonna merge soon.

@mishig25 mishig25 merged commit 7359058 into main Mar 13, 2024
2 checks passed
@mishig25 mishig25 deleted the gguf branch March 13, 2024 09:37
@julien-c (Member Author): @biw I might add a little bit of typing and/or validation in the coming week – will ping here with a new PR. cc @mishig25

@mishig25 mishig25 mentioned this pull request Mar 19, 2024
mishig25 added a commit that referenced this pull request Mar 20, 2024
GGUF add types. Follow up to
#540 (comment).

No validation of any kind, just types.

cc: @biw also