Closed
Description
I've noticed that the `kerchunk.grib2._split_file()` function yields no messages when the file being scanned is smaller than 1024 bytes. The `if len(head) < 1024: break  # EOF` check is the culprit:
Lines 49 to 64 in a0c4f3b
    while f.tell() < size:
        logger.debug(f"extract part {part + 1}")
        head = f.read(1024)
        if len(head) < 1024:
            break  # EOF
        if b"GRIB" not in head:
            f.seek(-4, 1)
            continue
        ind = head.index(b"GRIB")
        start = f.tell() - 1024 + ind
        part_size = int.from_bytes(head[ind + 12 : ind + 16], "big")
        f.seek(start)
        yield start, part_size, f.read(part_size)
        part += 1
        if skip and part >= skip:
            break
Is it safe to remove this early break so that files smaller than 1024 bytes can be dealt with? Essentially, the final while block would look like:
    while f.tell() < size:
        logger.debug(f"extract part {part + 1}")
        head = f.read(1024)
        # if len(head) < 1024:
        #     break  # EOF
        if b"GRIB" not in head:
            f.seek(-4, 1)
            continue
        ind = head.index(b"GRIB")
        # start = f.tell() - 1024 + ind
        start = f.tell() - len(head) + ind
        part_size = int.from_bytes(head[ind + 12 : ind + 16], "big")
        f.seek(start)
        yield start, part_size, f.read(part_size)
        part += 1
        if skip and part >= skip:
            break

It is unclear to me whether this would cause problems with GRIB files other than the datasets I've handled myself.
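One case that may be worth checking with the early break removed: if a file has trailing bytes after its last message, the final short read contains no `GRIB` marker, the loop seeks back 4 bytes and continues, and `f.tell() < size` stays true indefinitely. Below is a minimal standalone sketch of one way to handle short reads without that risk. It is not the kerchunk implementation: `split_grib_messages`, its `window` parameter, and the `fake_message` helper are hypothetical names used only for illustration, and the synthetic messages carry just a GRIB2-style 16-byte indicator section plus padding.

```python
import io


def split_grib_messages(f, skip=0, window=1024):
    """Yield (start, size, data) per GRIB message in a seekable binary file.

    Hypothetical sketch, not kerchunk's _split_file.
    """
    size = f.seek(0, 2)  # seek to the end to learn the file size
    f.seek(0)
    part = 0
    while f.tell() < size:
        head = f.read(window)
        if b"GRIB" not in head:
            if len(head) < window:
                break  # short read without a marker: nothing left to find
            f.seek(-4, 1)  # overlap windows so a straddling marker is found
            continue
        ind = head.index(b"GRIB")
        start = f.tell() - len(head) + ind  # valid for short reads too
        f.seek(start)
        header = f.read(16)
        if len(header) < 16:
            break  # truncated indicator section at end of file
        # Same bytes as the original code: low 4 bytes of the 8-byte length.
        part_size = int.from_bytes(header[12:16], "big")
        if part_size < 16:
            f.seek(start + 4)  # implausible length: skip this marker
            continue
        f.seek(start)
        yield start, part_size, f.read(part_size)
        part += 1
        if skip and part >= skip:
            break


def fake_message(total_len):
    """Build a synthetic GRIB2-like message (assumption: test data only)."""
    indicator = b"GRIB" + b"\x00\x00\x00\x02" + total_len.to_bytes(8, "big")
    return indicator + b"\x00" * (total_len - 20) + b"7777"


# A 69-byte "file": two messages followed by 5 bytes of trailing padding.
data = fake_message(40) + fake_message(24) + b"\x00" * 5
offsets = [(s, n) for s, n, _ in split_grib_messages(io.BytesIO(data))]
```

The short-read check only ends the loop when no marker was found, so small files are still scanned, while trailing padding no longer causes repeated re-reads of the same bytes.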