
a GGUF parser that works on remotely hosted files (over HTTP range requests) #540

Merged: 14 commits merged into main from gguf on Mar 13, 2024

Conversation

@julien-c (Member) commented Mar 8, 2024

GGUF is the new single-file weights format that's been taking the Hub by storm...

[image]

Spec

https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

Reference implementation (Python): https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_reader.py

Acknowledgements & Inspirations

@julien-c julien-c requested a review from mishig25 March 8, 2024 21:47
@julien-c julien-c requested a review from coyotte508 March 8, 2024 22:15
Comment on lines +56 to +58
```ts
/**
 * Internal stateful instance to fetch ranges of HTTP data when needed
 */
```
@julien-c (Member Author):

GGUF is more complex to parse remotely than safetensors because in GGUF, we don't know the header size before parsing the whole header.

Hence I'm fetching chunks of the file via range requests, on demand.
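To make the idea concrete, here is a minimal sketch of such an on-demand range fetcher. The class name, chunk size, and `FetchLike` type are assumptions for illustration, not the PR's actual API; the real implementation lives in `packages/gguf/src/gguf.ts`.

```typescript
// A stateful helper that downloads a remote file in fixed-size chunks via
// HTTP Range requests and caches the bytes fetched so far.
const HTTP_CHUNK_SIZE = 10 * 1024 * 1024; // 10 MiB per request (assumed value)

// Structural type so a custom fetch implementation can be injected
type FetchLike = (
	url: string,
	init?: { headers?: Record<string, string> }
) => Promise<{ arrayBuffer(): Promise<ArrayBufferLike> }>;

class RangeFetcher {
	private chunk = 0;
	buffer = new Uint8Array(0);

	constructor(private url: string, private fetchFn: FetchLike) {}

	/** Range header for the next chunk, e.g. "bytes=0-10485759" */
	nextRangeHeader(): string {
		const start = this.chunk * HTTP_CHUNK_SIZE;
		return `bytes=${start}-${start + HTTP_CHUNK_SIZE - 1}`;
	}

	/** Fetch the next chunk and append it to the locally cached buffer. */
	async fetchNextChunk(): Promise<void> {
		const res = await this.fetchFn(this.url, {
			headers: { Range: this.nextRangeHeader() },
		});
		const incoming = new Uint8Array(await res.arrayBuffer());
		// grow the cache and copy the new bytes in at the chunk's offset
		const grown = new Uint8Array((this.chunk + 1) * HTTP_CHUNK_SIZE);
		grown.set(this.buffer);
		grown.set(incoming, this.chunk * HTTP_CHUNK_SIZE);
		this.buffer = grown;
		this.chunk += 1;
	}
}
```

In practice you would pass the global `fetch` (or a custom one, which is also what the "custom fetch fn" commit in this PR enables).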

@biw commented Mar 8, 2024

Great work on the library and thanks for the acknowledgement @julien-c! I'm excited to see GGUF get some more support.

One thing we've seen a lot is a lack of the optional general metadata (general.author, general.url, general.description, general.license, general.source.url, & general.source.huggingface.repository) in most GGUF files. Do you have any thoughts on adding that as a first-class citizen (TypeScript types) to the PR? I'm sure having it in huggingface.js would help bring attention to model authors that the fields are there (it's very useful for model frontends wanting to support arbitrary models).
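For reference, a hedged sketch of what such first-class TypeScript types could look like. The key names follow the GGUF spec's general metadata section; the interface name itself is an assumption and not part of this PR.

```typescript
// Hypothetical sketch; key names come from the GGUF spec, but this
// interface is not (yet) part of the PR.
interface GGUFGeneralMetadata {
	"general.architecture": string; // required by the spec
	"general.name"?: string;
	"general.author"?: string;
	"general.url"?: string;
	"general.description"?: string;
	"general.license"?: string;
	"general.source.url"?: string;
	"general.source.huggingface.repository"?: string;
}

// Example: a frontend can surface attribution when the optional fields are set
const meta: GGUFGeneralMetadata = {
	"general.architecture": "llama",
	"general.author": "Example Author",
	"general.license": "apache-2.0",
};
```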

@coyotte508 (Member): Unless I'm mistaken, it only supports public, non-gated repos.

@julien-c julien-c requested a review from Narsil March 11, 2024 09:33
@julien-c (Member Author): Also tagging @Narsil for info.

@julien-c (Member Author): @biw Yes, I liked the typing/validation in https://github.com/ahoylabs/gguf.js – a neat idea (btw, I opened this small PR on gguf.js). I'll keep this first PR minimal and we can revisit adding more types in the future.

```ts
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
new Uint8Array(this.buffer).set(buf, this.chunk * HTTP_CHUNK_SIZE);
this.chunk += 1;
```
@julien-c (Member Author):

BTW this explanation of binary arrays in JS is quite good: https://javascript.info/arraybuffer-binary-arrays

```ts
/// TODO(fix typing)
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
this.buffer.resize((this.chunk + 1) * HTTP_CHUNK_SIZE);
```
@julien-c (Member Author):

See the typing discussion here: microsoft/TypeScript#54636

@julien-c (Member Author):

Ideally we'd also want a big-endian file to test with, to make sure our parsing is correct.

From the spec:

> Must be `3` for the version described in this spec, which introduces big-endian support.

@mishig25 (Collaborator) commented Mar 11, 2024:

Testing big-endian fails:

```ts
const { metadata, tensorInfos } = await gguf("https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16-big-endian.gguf");
```

Error: `not a valid gguf file: unsupported version "50331648"`

However, in a Slack message, @ggerganov told us regarding the file being tested:

> But I'm not 100% sure it is generated correctly. I used a script that we developed sometime last year:
> `python3 gguf-py/scripts/gguf-convert-endian.py ./some-little-endian-model.gguf big`
> However, I haven't run tests on actual big-endian systems, so there could be issues when reading those files.


The file seems to be converted correctly:

```
$ ▶ hexdump models/bert-bge-small/ggml-model-f16.gguf | head
0000000 4747 4655 0003 0000 00c5 0000 0000 0000     // this is the little-endian
0000010 0017 0000 0000 0000 0014 0000 0000 0000
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0008 0000 0004 0000 0000 0000
0000040 6562 7472 000c 0000 0000 0000 6567 656e
0000050 6172 2e6c 616e 656d 0008 0000 0011 0000
0000060 0000 0000 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 1035 0000 0000 0000 6200 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0474 0000
0000090 0c00 0000 1300 0000 0000 0000 6200 7265

$ ▶ hexdump models/bert-bge-small/ggml-model-f16-big-endian.gguf | head
0000000 4747 4655 0000 0300 0000 0000 0000 c500     // this is the converted big-endian
0000010 0000 0000 0000 1700 0000 0000 0000 1400
0000020 6567 656e 6172 2e6c 7261 6863 7469 6365
0000030 7574 6572 0000 0800 0000 0000 0000 0400
0000040 6562 7472 0000 0000 0000 0c00 6567 656e
0000050 6172 2e6c 616e 656d 0000 0800 0000 0000
0000060 0000 1100 6762 2d65 6d73 6c61 2d6c 6e65
0000070 762d 2e31 0035 0000 0000 0000 6210 7265
0000080 2e74 6c62 636f 5f6b 6f63 6e75 0074 0000
0000090 0004 0000 000c 0000 0000 0000 6213 7265
```

Given the error `not a valid gguf file: unsupported version "50331648"`, it seems the reader you are using does not check the version bytes to correctly determine the endianness.
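The arithmetic behind that error: GGUF version `3` stored little-endian is the byte sequence `03 00 00 00`; those same four bytes read as big-endian give `0x03000000` = 50331648. Since any real version number fits in 16 bits, checking the low half of a little-endian read is enough to detect the file's endianness (the trick from ggerganov/llama.cpp#3957). A small illustrative sketch, not the PR's exact code:

```typescript
// Version field bytes of a little-endian GGUF file (version 3):
const leBytes = new Uint8Array([0x03, 0x00, 0x00, 0x00]);
// The same field after conversion to big-endian:
const beBytes = new Uint8Array([0x00, 0x00, 0x00, 0x03]);

const readVersion = (bytes: Uint8Array): [littleEndian: boolean, version: number] => {
	const view = new DataView(bytes.buffer);
	const v = view.getUint32(0, true); // assume little-endian first
	// real versions fit in 16 bits; a zero low half means we guessed wrong
	return v & 0xffff ? [true, v] : [false, view.getUint32(0, false)];
};

const le = readVersion(leBytes); // little-endian file, version 3
const be = readVersion(beBytes); // big-endian file, version 3
// a reader that skips this check sees 0x03000000 === 50331648 instead
```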

@julien-c (Member Author):

Thanks, this was helpful: opened #545 to fix.

@julien-c (Member Author): Also tagging @yagil 🔥

@mishig25 mishig25 mentioned this pull request Mar 11, 2024
1. [Use length rather than newOffset](fcab2c9) (discussed [here](#540 (comment)))
2. [custom fetch fn](18f93f3) (discussed [here](#540 (comment)))
@coyotte508 (Member): JFI, OK on my side; I'll fix tooling issues if any pop up.

The important snippet is:

```ts
const [littleEndian, version] = (() => {
	/// ggerganov/llama.cpp#3957
	/// Assume this code is always running on little-endian
	/// but wants to be able to parse both endianness
	const version = r.view.getUint32(4, true);
	if (version & 65535) {
		return [true, version];
	} else {
		return [false, r.view.getUint32(4, false)];
	}
})();
```

from ggerganov/llama.cpp#3957, and thanks to @ggerganov's [comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)
Comment on lines 4 to 7
```ts
const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf";
const URL_MISTRAL_7B =
	"https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf";
const URL_GEMMA_2B = "https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf";
```
@mishig25 (Collaborator):

Nit: in a subsequent PR, we can probably move them under the https://huggingface.co/huggingfacejs org so that the tests still work/pass if owners of those repos make any change to those files or filenames.

@julien-c (Member Author):

Or just replace `main` with the current revision?

@coyotte508 (Member):

There's also https://huggingface.co/datasets/huggingface/moon-tests (which we could move to a resource group :) )

@mishig25 (Collaborator):

Getting the current versions of the models: 007c451

@mishig25 (Collaborator) left a review comment:

lgtm ! 🚀

@mishig25 (Collaborator): @coyotte508 please let me know if lgtm. I'm gonna merge soon.

@mishig25 mishig25 merged commit 7359058 into main Mar 13, 2024
2 checks passed
@mishig25 mishig25 deleted the gguf branch March 13, 2024 09:37
@julien-c (Member Author): @biw I might add a little bit of typing and/or validation in the coming week – will ping here with a new PR. cc @mishig25

@mishig25 mishig25 mentioned this pull request Mar 19, 2024
mishig25 added a commit that referenced this pull request Mar 20, 2024
GGUF add types. Follow up to
#540 (comment).

No validation of any kind, just types.

cc: @biw also