Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GGUF endianness cannot be determined from GGUF itself #3957

Open
philpax opened this issue Nov 5, 2023 · 17 comments
Open

GGUF endianness cannot be determined from GGUF itself #3957

philpax opened this issue Nov 5, 2023 · 17 comments
Labels
breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. enhancement New feature or request good first issue Good for newcomers

Comments

@philpax
Copy link

philpax commented Nov 5, 2023

As of the time of writing, the big-endian support that was added in #3552 doesn't encode the endianness within the file itself:

endianess_str = "Big Endian" if self.endianess == GGUFEndian.BIG else "Little Endian"
print(f"This gguf file is for {endianess_str} only")
def write_header_to_file(self):
self.fout.write(struct.pack("<I", GGUF_MAGIC))
self.fout.write(struct.pack(f"{self.pack_prefix}I", GGUF_VERSION))
self.fout.write(struct.pack(f"{self.pack_prefix}Q", self.ti_data_count))
self.fout.write(struct.pack(f"{self.pack_prefix}Q", self.kv_data_count))
self.flush()
# print("tensors " + str(self.ti_data_count) + " kv " + str(self.kv_data_count))

This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅

The cleanest solution would be to add the endianness to the header - ideally, it would be in the metadata, but the reading of the metadata is dependent on the endianness - but that would be a breaking change.

Given that, my suggestion would be to use FUGG as the header for big-endian files so that a little-endian executor won't attempt to read it at all unless it knows how to deal with it. The same can go the other way, as well (a big-endian executor won't attempt to read a little-endian executor).

@abrander
Copy link

abrander commented Nov 6, 2023

This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅

Well. The future is now 😄

The cleanest solution would be to add the endianness to the header - ideally, it would be in the metadata, but the reading of the metadata is dependent on the endianness - but that would be a breaking change.

I suggest storing the endianness as a single byte that can be read without ambiguity on both little- and big-endian hosts.

Reading a 16-, 32- or 64-bit number is often confusing when you don't know the endianness of the file itself. It quickly leads to very confusing code where you must first guess the host byte order, whether you're reading the file using native byte order, little-endian, or big-endian byte order, and somehow combine those using pure magic and #ifdef guards.

We could use a single int8 (The ASCII value L or B for easy detection by humans in a hex viewer) to distinguish the file byte order.

It will be a breaking change, but failure because of hidden endian issues is also a breaking change.

Given that, my suggestion would be to use FUGG as the header for big-endian files so that a little-endian executor won't attempt to read it at all unless it knows how to deal with it. The same can go the other way, as well (a big-endian executor won't attempt to read a little-endian executor).

The only argument I see for making the file magic endian dependable would be to make incompatible readers bail fast - but hey, if they ignore the version field, all bets are off anyway.

@KerfuffleV2
Copy link
Collaborator

KerfuffleV2 commented Nov 6, 2023

Hmm, does anything actually need to change other than just storing the version field in the respective endianness? Until there's GGUF version 20,000 or something, every big-endian version is going to be invalid for little endian and every little endian version is going to be invalid for big endian. (Except 0 which is the same for both, but we're already past that.)

edit: If you don't like the idea of just relying on the version being invalid if the file was created for the wrong endian, an alternative (also backward compatible except for existing BE files) is to say the version field is still 4 bytes but the value is a uint16 now. LE can use the first two bytes - uint32 value 4 is \x04\x00\x00\x00 and uint16 value 4 is \x04\x00 so no change is needed. BE can use the second two bytes. Assuming existing BE files wrote the version in BE format, then that also will already be correct since BE 4 is \x00\x00\x00\x04 and BE uint16 [0,4] would be the same.

@philpax
Copy link
Author

philpax commented Nov 8, 2023

Hm - yes, you're both right, I hadn't noticed the version was written with the correct endianness. This is then not as much of an issue as it seemed - big-endian files that end up out there are unlikely to be loadable by existing executors if they're correctly version-checking.

That being said, I'd still prefer to make it explicit instead of having it be an implicit property of the version number. I can't imagine it'll ever reach a version number in which there's a realistic possibility of confusion, but it's easier for readers if they can determine it directly without having to check if they support the endian-swapped version number.

edit: If you don't like the idea of just relying on the version being invalid if the file was created for the wrong endian, an alternative (also backward compatible except for existing BE files) is to say the version field is still 4 bytes but the value is a uint16 now. LE can use the first two bytes - uint32 value 4 is \x04\x00\x00\x00 and uint16 value 4 is \x04\x00 so no change is needed. BE can use the second two bytes. Assuming existing BE files wrote the version in BE format, then that also will already be correct since BE 4 is \x00\x00\x00\x04 and BE uint16 [0,4] would be the same.

Interesting and clever idea. This could work and wouldn't require a strictly-breaking change to the format.

The immediate concerns I have are that:

  • no matter what you do, you'll be wasting two bytes. This really doesn't matter given the models are gigabytes in size, but figured I'd mention it still.

  • the writers and readers need to be aware that they need to check the last two bytes instead of the first two if running on BE.

    This isn't a deal-breaker, but it seems pretty easy to forget to do this. To be fair, most people aren't going to be developing against big-endian anyway - perhaps it's fine if it just breaks there.

Ultimately I don't think either of these really matter, it's just not quite as elegant as a more dedicated detection mechanism. I'd be fine with this if we wanted to maintain strict backwards compatibility.


With that being said, a cursory search indicates that nobody's uploaded big-endian models to HF yet, so I'm pretty comfortable with breaking them specifically as long as LE's fine. I can think of two potential solutions here:

  • Change the magic number based on endianness (original proposal). This makes it very obvious, but may confuse people looking at the format in the wild if they've never seen the big-endian variant before.

  • Encode the endianness in the version number in a way that's freely compatible with LE and easy to implement for BE. The easy solution is to make it a right-aligned uint16 and use the first two bytes as a boolean (first byte is a big-endian-active bool, second byte is unused).

    If we want to be less wasteful and allow for going past 65,536, we can encode it in the MSB of the first byte, but this requires an annoying amount of masking for what seems like a very unlikely outcome.

I'm leaning towards the right-aligned uint16 solution because it maintains LE compatibility, is still obviously a GGUF file, and should work for us until sometime in the next century.

I could be convinced to use one of the three other solutions (magic number solution, using only one bit, or BE-backwards-compatible moving uint16), but this seems like the best set of tradeoffs.

@KerfuffleV2
Copy link
Collaborator

KerfuffleV2 commented Nov 9, 2023

If anyone wants to test with non-native endian files, I added a conversion script in #3981 - you can just do something like:

python gguf-py/scripts/gguf-convert-endian.py blah.gguf big

Warning: This does not produce a new file, it modifies the input.

It currently can only convert F32, F16 and Q8_0 tensors (actually supporting the rest of the quantizations wouldn't be very hard).

@KerfuffleV2
Copy link
Collaborator

@philpax

If we want to be less wasteful and allow for going past 65,536

I don't mind leaving the problem of dealing with GGUF version 65536 to the people in the year 4,000.

My solution was actually a little overcomplicated though. You can just read the field as normal then bitwise AND with 65535 and you'll get 0 if it's the wrong endian.

abrander added a commit to abrander/gguf that referenced this issue Nov 10, 2023
@abrander
Copy link

For anyone interested, I have implemented v3 byte order detection - without needing to know the host byte order - using the last byte of the version field as a big-endian marker.

It works as intended. It's a bit ugly, but it works.

@KerfuffleV2
Copy link
Collaborator

It works as intended. It's a bit ugly, but it works.

Until version 256. :) That still may never happen in our lifetimes, but still. I think & 65535 is a bit more reliable. Then if it's 0, you can byte swap and test if it's a valid version. (The approach I took.)

@philpax
Copy link
Author

philpax commented Nov 12, 2023

Yeah I think I'd be fine with & 65536. That seems pretty straightforward and I will be impressed if we manage to get anywhere near that in our lifetimes.

@KerfuffleV2
Copy link
Collaborator

Yeah I think I'd be fine with & 65536.

I don't think you will be. :) It's gotta be 65535 here unless you want 0 and 65536 as the only two possible values.

@philpax
Copy link
Author

philpax commented Nov 12, 2023

Yes, of course - my bad 😅

@julien-c
Copy link

Thanks @ggerganov for pointing me to this nice convo, it should enable us to support visualization of big-endian gguf files in HF:
see huggingface/huggingface.js#540 (and the small fix huggingface/huggingface.js#545) for context! 🔥

julien-c added a commit to huggingface/huggingface.js that referenced this issue Mar 13, 2024
The important snippet is:

```ts
const [littleEndian, version] = (() => {
	/// ggerganov/llama.cpp#3957
	/// Assume this code is always running on little-endian
	/// but wants to be able to parse both endianness
	const version = r.view.getUint32(4, true);
	if (version & 65535) {
		return [true, version];
	} else {
		return [false, r.view.getUint32(4, false)];
	}
})();
```

from ggerganov/llama.cpp#3957 and thanks to
@ggerganov
[comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)
@github-actions github-actions bot added the stale label Apr 12, 2024
Copy link
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@philpax
Copy link
Author

philpax commented May 8, 2024

Just noticed this was marked as stale - as far as I know, this is still an issue, the only indicator that a GGUF is big-endian is if its version is in big-endian, which is not a strong signal. Given that big-endian models are the minority by far, I'd continue to suggest reversing the magic number to make it explicit, and forcing a re-encode of the BE models.

@ggerganov
Copy link
Owner

and forcing a re-encode of the BE models.

Re-encode "GGUF" -> "FUGG", is this the proposal?

@philpax
Copy link
Author

philpax commented May 8, 2024

Yes, pretty much. I think that should be sufficient?

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. and removed bug-unconfirmed stale labels May 9, 2024
@ggerganov ggerganov reopened this May 9, 2024
@julien-c
Copy link

julien-c commented May 9, 2024

the only indicator that a GGUF is big-endian is if its version is in big-endian, which is not a strong signal

Seems to work well enough in existing implementations, though (just my 2 cents)

@philpax
Copy link
Author

philpax commented May 11, 2024

It works as a weak signal ("this file can't be loaded"), but you have to use a heuristic to determine if it's a big-endian file ("could I support this if the version was endian-swapped?").

A reader could determine if a file is big-endian and provide a more informative error in that case; another reader could also offer to swap the bytes around for the user. This could also be useful for online viewers, which could definitively tell you what type of model it is before you download it.

In general, I think it's a good idea to make things like this as upfront as possible in the reading process so that a reader knows what it's working with without having to intuit it with a heuristic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants