GGUF endianness cannot be determined from GGUF itself #3957

philpax · 2023-11-05T14:00:47Z

As of the time of writing, the big-endian support that was added in #3552 doesn't encode the endianness within the file itself:

llama.cpp/gguf-py/gguf/gguf.py

Lines 689 to 698 in 3d48f42

 endianess_str = "Big Endian" if self.endianess == GGUFEndian.BIG else "Little Endian" 

 print(f"This gguf file is for {endianess_str} only") 

 def write_header_to_file(self): 

 self.fout.write(struct.pack("<I", GGUF_MAGIC)) 

 self.fout.write(struct.pack(f"{self.pack_prefix}I", GGUF_VERSION)) 

 self.fout.write(struct.pack(f"{self.pack_prefix}Q", self.ti_data_count)) 

 self.fout.write(struct.pack(f"{self.pack_prefix}Q", self.kv_data_count)) 

 self.flush() 

 # print("tensors " + str(self.ti_data_count) + " kv " + str(self.kv_data_count))

This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅

The cleanest solution would be to add the endianness to the header - ideally, it would be in the metadata, but the reading of the metadata is dependent on the endianness - but that would be a breaking change.

Given that, my suggestion would be to use FUGG as the header for big-endian files so that a little-endian executor won't attempt to read it at all unless it knows how to deal with it. The same can go the other way, as well (a big-endian executor won't attempt to read a little-endian executor).

The text was updated successfully, but these errors were encountered:

abrander · 2023-11-06T18:20:38Z

This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅

Well. The future is now 😄

The cleanest solution would be to add the endianness to the header - ideally, it would be in the metadata, but the reading of the metadata is dependent on the endianness - but that would be a breaking change.

I suggest storing the endianness as a single byte that can be read without ambiguity on both little- and big-endian hosts.

Reading a 16-, 32- or 64-bit number is often confusing when you don't know the endianness of the file itself. It quickly leads to very confusing code where you must first guess the host byte order, whether you're reading the file using native byte order, little-endian, or big-endian byte order, and somehow combine those using pure magic and #ifdef guards.

We could use a single int8 (The ASCII value L or B for easy detection by humans in a hex viewer) to distinguish the file byte order.

It will be a breaking change, but failure because of hidden endian issues is also a breaking change.

Given that, my suggestion would be to use FUGG as the header for big-endian files so that a little-endian executor won't attempt to read it at all unless it knows how to deal with it. The same can go the other way, as well (a big-endian executor won't attempt to read a little-endian executor).

The only argument I see for making the file magic endian dependable would be to make incompatible readers bail fast - but hey, if they ignore the version field, all bets are off anyway.

KerfuffleV2 · 2023-11-06T20:39:29Z

Hmm, does anything actually need to change other than just storing the version field in the respective endianness? Until there's GGUF version 20,000 or something, every big-endian version is going to be invalid for little endian and every little endian version is going to be invalid for big endian. (Except 0 which is the same for both, but we're already past that.)

edit: If you don't like the idea of just relying on the version being invalid if the file was created for the wrong endian, an alternative (also backward compatible except for existing BE files) is to say the version field is still 4 bytes but the value is a uint16 now. LE can use the first two bytes - uint32 value 4 is \x04\x00\x00\x00 and uint16 value 4 is \x04\x00 so no change is needed. BE can use the second two bytes. Assuming existing BE files wrote the version in BE format, then that also will already be correct since BE 4 is \x00\x00\x00\x04 and BE uint16 [0,4] would be the same.

philpax · 2023-11-08T01:55:07Z

Hm - yes, you're both right, I hadn't noticed the version was written with the correct endianness. This is then not as much of an issue as it seemed - big-endian files that end up out there are unlikely to be loadable by existing executors if they're correctly version-checking.

That being said, I'd still prefer to make it explicit instead of having it be an implicit property of the version number. I can't imagine it'll ever reach a version number in which there's a realistic possibility of confusion, but it's easier for readers if they can determine it directly without having to check if they support the endian-swapped version number.

edit: If you don't like the idea of just relying on the version being invalid if the file was created for the wrong endian, an alternative (also backward compatible except for existing BE files) is to say the version field is still 4 bytes but the value is a uint16 now. LE can use the first two bytes - uint32 value 4 is \x04\x00\x00\x00 and uint16 value 4 is \x04\x00 so no change is needed. BE can use the second two bytes. Assuming existing BE files wrote the version in BE format, then that also will already be correct since BE 4 is \x00\x00\x00\x04 and BE uint16 [0,4] would be the same.

Interesting and clever idea. This could work and wouldn't require a strictly-breaking change to the format.

The immediate concerns I have are that:

no matter what you do, you'll be wasting two bytes. This really doesn't matter given the models are gigabytes in size, but figured I'd mention it still.
the writers and readers need to be aware that they need to check the last two bytes instead of the first two if running on BE.

This isn't a deal-breaker, but it seems pretty easy to forget to do this. To be fair, most people aren't going to be developing against big-endian anyway - perhaps it's fine if it just breaks there.

Ultimately I don't think either of these really matter, it's just not quite as elegant as a more dedicated detection mechanism. I'd be fine with this if we wanted to maintain strict backwards compatibility.

With that being said, a cursory search indicates that nobody's uploaded big-endian models to HF yet, so I'm pretty comfortable with breaking them specifically as long as LE's fine. I can think of two potential solutions here:

Change the magic number based on endianness (original proposal). This makes it very obvious, but may confuse people looking at the format in the wild if they've never seen the big-endian variant before.
Encode the endianness in the version number in a way that's freely compatible with LE and easy to implement for BE. The easy solution is to make it a right-aligned uint16 and use the first two bytes as a boolean (first byte is a big-endian-active bool, second byte is unused).

If we want to be less wasteful and allow for going past 65,536, we can encode it in the MSB of the first byte, but this requires an annoying amount of masking for what seems like a very unlikely outcome.

I'm leaning towards the right-aligned uint16 solution because it maintains LE compatibility, is still obviously a GGUF file, and should work for us until sometime in the next century.

I could be convinced to use one of the three other solutions (magic number solution, using only one bit, or BE-backwards-compatible moving uint16), but this seems like the best set of tradeoffs.

KerfuffleV2 · 2023-11-09T15:43:11Z

If anyone wants to test with non-native endian files, I added a conversion script in #3981 - you can just do something like:

python gguf-py/scripts/gguf-convert-endian.py blah.gguf big

Warning: This does not produce a new file, it modifies the input.

It currently can only convert F32, F16 and Q8_0 tensors (actually supporting the rest of the quantizations wouldn't be very hard).

KerfuffleV2 · 2023-11-09T22:22:39Z

@philpax

If we want to be less wasteful and allow for going past 65,536

I don't mind leaving the problem of dealing with GGUF version 65536 to the people in the year 4,000.

My solution was actually a little overcomplicated though. You can just read the field as normal then bitwise AND with 65535 and you'll get 0 if it's the wrong endian.

Please see ggerganov/llama.cpp#3957

abrander · 2023-11-10T00:49:06Z

For anyone interested, I have implemented v3 byte order detection - without needing to know the host byte order - using the last byte of the version field as a big-endian marker.

It works as intended. It's a bit ugly, but it works.

KerfuffleV2 · 2023-11-10T00:56:05Z

It works as intended. It's a bit ugly, but it works.

Until version 256. :) That still may never happen in our lifetimes, but still. I think & 65535 is a bit more reliable. Then if it's 0, you can byte swap and test if it's a valid version. (The approach I took.)

philpax · 2023-11-12T16:53:41Z

Yeah I think I'd be fine with & 65536. That seems pretty straightforward and I will be impressed if we manage to get anywhere near that in our lifetimes.

KerfuffleV2 · 2023-11-12T17:09:25Z

Yeah I think I'd be fine with & 65536.

I don't think you will be. :) It's gotta be 65535 here unless you want 0 and 65536 as the only two possible values.

philpax · 2023-11-12T17:11:01Z

Yes, of course - my bad 😅

julien-c · 2024-03-12T19:17:31Z

Thanks @ggerganov for pointing me to this nice convo, it should enable us to support visualization of big-endian gguf files in HF:
see huggingface/huggingface.js#540 (and the small fix huggingface/huggingface.js#545) for context! 🔥

@ggerganov

The important snippet is: ```ts const [littleEndian, version] = (() => { /// ggerganov/llama.cpp#3957 /// Assume this code is always running on little-endian /// but wants to be able to parse both endianness const version = r.view.getUint32(4, true); if (version & 65535) { return [true, version]; } else { return [false, r.view.getUint32(4, false)]; } })(); ``` from ggerganov/llama.cpp#3957 and thanks to @ggerganov [comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)

github-actions · 2024-04-27T01:07:01Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

philpax · 2024-05-08T14:34:56Z

Just noticed this was marked as stale - as far as I know, this is still an issue, the only indicator that a GGUF is big-endian is if its version is in big-endian, which is not a strong signal. Given that big-endian models are the minority by far, I'd continue to suggest reversing the magic number to make it explicit, and forcing a re-encode of the BE models.

ggerganov · 2024-05-08T14:38:46Z

and forcing a re-encode of the BE models.

Re-encode "GGUF" -> "FUGG", is this the proposal?

philpax · 2024-05-08T22:23:13Z

Yes, pretty much. I think that should be sufficient?

julien-c · 2024-05-09T08:44:41Z

the only indicator that a GGUF is big-endian is if its version is in big-endian, which is not a strong signal

Seems to work well enough in existing implementations, though (just my 2 cents)

philpax · 2024-05-11T13:25:37Z

It works as a weak signal ("this file can't be loaded"), but you have to use a heuristic to determine if it's a big-endian file ("could I support this if the version was endian-swapped?").

A reader could determine if a file is big-endian and provide a more informative error in that case; another reader could also offer to swap the bytes around for the user. This could also be useful for online viewers, which could definitively tell you what type of model it is before you download it.

In general, I think it's a good idea to make things like this as upfront as possible in the reading process so that a reader knows what it's working with without having to intuit it with a heuristic.

philpax added the bug-unconfirmed label Nov 5, 2023

abrander added a commit to abrander/gguf that referenced this issue Nov 10, 2023

First stab at v3 support.

3ced953

Please see ggerganov/llama.cpp#3957

ggerganov mentioned this issue Mar 12, 2024

a GGUF parser that works on remotely hosted files (over HTTP range requests) huggingface/huggingface.js#540

Merged

julien-c mentioned this issue Mar 12, 2024

GGUF parser: support big-endian files huggingface/huggingface.js#545

Merged

github-actions bot added the stale label Apr 12, 2024

github-actions bot closed this as completed Apr 27, 2024

ggerganov added enhancement New feature or request good first issue Good for newcomers breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. and removed bug-unconfirmed stale labels May 9, 2024

ggerganov reopened this May 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GGUF endianness cannot be determined from GGUF itself #3957

GGUF endianness cannot be determined from GGUF itself #3957

philpax commented Nov 5, 2023 •

edited

abrander commented Nov 6, 2023

KerfuffleV2 commented Nov 6, 2023 •

edited

philpax commented Nov 8, 2023

KerfuffleV2 commented Nov 9, 2023 •

edited

KerfuffleV2 commented Nov 9, 2023

abrander commented Nov 10, 2023

KerfuffleV2 commented Nov 10, 2023

philpax commented Nov 12, 2023

KerfuffleV2 commented Nov 12, 2023

philpax commented Nov 12, 2023

julien-c commented Mar 12, 2024

github-actions bot commented Apr 27, 2024

philpax commented May 8, 2024

ggerganov commented May 8, 2024

philpax commented May 8, 2024

julien-c commented May 9, 2024

philpax commented May 11, 2024

GGUF endianness cannot be determined from GGUF itself #3957

GGUF endianness cannot be determined from GGUF itself #3957

Comments

philpax commented Nov 5, 2023 • edited

abrander commented Nov 6, 2023

KerfuffleV2 commented Nov 6, 2023 • edited

philpax commented Nov 8, 2023

KerfuffleV2 commented Nov 9, 2023 • edited

KerfuffleV2 commented Nov 9, 2023

abrander commented Nov 10, 2023

KerfuffleV2 commented Nov 10, 2023

philpax commented Nov 12, 2023

KerfuffleV2 commented Nov 12, 2023

philpax commented Nov 12, 2023

julien-c commented Mar 12, 2024

github-actions bot commented Apr 27, 2024

philpax commented May 8, 2024

ggerganov commented May 8, 2024

philpax commented May 8, 2024

julien-c commented May 9, 2024

philpax commented May 11, 2024

philpax commented Nov 5, 2023 •

edited

KerfuffleV2 commented Nov 6, 2023 •

edited

KerfuffleV2 commented Nov 9, 2023 •

edited