Support for small files in _split_file #448

@kmpaul

Description

I've noticed that kerchunk.grib2._split_file() does not work if the file being scanned is smaller than 1024 bytes. The early-exit check (if len(head) < 1024: break) is the culprit, because the first read of a small file already returns fewer than 1024 bytes:

 while f.tell() < size:
     logger.debug(f"extract part {part + 1}")
     head = f.read(1024)
     if len(head) < 1024:
         break  # EOF
     if b"GRIB" not in head:
         f.seek(-4, 1)
         continue
     ind = head.index(b"GRIB")
     start = f.tell() - 1024 + ind
     part_size = int.from_bytes(head[ind + 12 : ind + 16], "big")
     f.seek(start)
     yield start, part_size, f.read(part_size)
     part += 1
     if skip and part >= skip:
         break

Is it safe to remove this early break so that files smaller than 1024 bytes can be handled? The while loop would then look like this, with start computed from len(head) instead of a hard-coded 1024:

 while f.tell() < size: 
     logger.debug(f"extract part {part + 1}") 
     head = f.read(1024) 
     # if len(head) < 1024: 
     #     break  # EOF 
     if b"GRIB" not in head: 
         f.seek(-4, 1) 
         continue 
     ind = head.index(b"GRIB") 
     # start = f.tell() - 1024 + ind 
     start = f.tell() - len(head) + ind 
     part_size = int.from_bytes(head[ind + 12 : ind + 16], "big") 
     f.seek(start) 
     yield start, part_size, f.read(part_size) 
     part += 1 
     if skip and part >= skip: 
         break 

I'm not sure whether this change would cause problems for GRIB files other than the ones I've handled myself.
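
One possible concern, as far as I can tell: with the break removed entirely, a final read that contains no "GRIB" marker seeks back 4 bytes from the end of the file, so f.tell() < size stays true and the loop may never terminate. Below is a minimal sketch of a variant that keeps an EOF exit only for short reads without a marker. The names split_file_sketch, fake_message and CHUNK are hypothetical and are not part of kerchunk. The sketch also assumes the GRIB2 layout in which section 0 is 16 bytes and the total message length sits in its last 8 bytes.

```python
import io
import logging

logger = logging.getLogger(__name__)

CHUNK = 1024  # same read window as the loop quoted above


def split_file_sketch(f, skip=0):
    """Hypothetical variant of the loop above; not kerchunk's actual code."""
    size = f.seek(0, 2)  # seeking to the end returns the file size
    f.seek(0)
    part = 0
    while f.tell() < size:
        logger.debug(f"extract part {part + 1}")
        pos = f.tell()
        head = f.read(CHUNK)
        if b"GRIB" not in head:
            if len(head) < CHUNK:
                break  # short read without a marker: reached EOF
            f.seek(-4, 1)  # step back so a marker split across reads is found
            continue
        start = pos + head.index(b"GRIB")
        f.seek(start)
        header = f.read(16)  # assumed: GRIB2 section 0 is 16 bytes long
        if len(header) < 16:
            break  # truncated message at the end of the file
        part_size = int.from_bytes(header[12:16], "big")
        if part_size < 16:
            break  # implausible length; stop rather than loop in place
        f.seek(start)
        yield start, part_size, f.read(part_size)
        part += 1
        if skip and part >= skip:
            break


def fake_message(payload=b"\x00" * 20):
    """Build a minimal GRIB2-like message for testing (assumed layout)."""
    total = 16 + len(payload) + 4  # section 0 + payload + "7777" trailer
    return (
        b"GRIB" + b"\x00\x00\x00\x02" + total.to_bytes(8, "big") + payload + b"7777"
    )
```

For a quick check, list(split_file_sketch(io.BytesIO(b"junk!" + fake_message()))) exercises a file well under 1024 bytes with a few leading bytes before the marker. Reading the 16-byte header again from start, rather than slicing head, also avoids a truncated length field when the marker falls near the end of the read window.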
