Gzip corruption during download/generation #10

@matrix

Hi,

During the generation phase, the gzip archives fetched in the download phase often turn out to be corrupted, which makes it impossible to correctly generate the text file with the hashes.

/usr/lib/python3.12/gzip.py:456 in _read_gzip_header

  453 │   │   return None
  454 │
  455 │   if magic != b'\037\213':
❱ 456 │   │   raise BadGzipFile('Not a gzipped file (%r)' % magic)
  457 │
  458 │   (method, flag, last_mtime) = struct.unpack("<BBIxx", _read_exact(fp, 8))
  459 │   if method != 8:

locals:
     fp = <_io.BytesIO object at 0x772c372a7d80>
  magic = b'<h'

BadGzipFile: Not a gzipped file (b'<h')

As a workaround, I have written a bash script that checks every downloaded gzip file and, if one is broken, deletes the corresponding directory and restarts the download phase, but this process is very slow.
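For reference, the kind of check described above can also be sketched in Python instead of bash. This is only an illustration, not the script I actually use: the names `is_valid_gzip` and `find_bad_gzip_files` are made up, and it assumes the archives end in `.gz` somewhere under a single root directory. Reading each file to the end forces the gzip trailer (CRC32 and length) to be verified, which is also why it is slow on large downloads.

```python
import gzip
import zlib
from pathlib import Path


def is_valid_gzip(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file is non-empty and decompresses to the end."""
    # An empty file is read back as empty data without raising, so reject it here.
    if path.stat().st_size == 0:
        return False
    try:
        with gzip.open(path, "rb") as fh:
            # Reading until EOF makes the gzip module check the CRC32/length trailer.
            while fh.read(chunk_size):
                pass
    except (gzip.BadGzipFile, EOFError, zlib.error):
        # BadGzipFile: wrong magic bytes or CRC mismatch
        # EOFError:    truncated archive
        # zlib.error:  corrupted deflate stream
        return False
    return True


def find_bad_gzip_files(root: Path) -> list[Path]:
    """List every *.gz file under root that fails the integrity check."""
    return sorted(p for p in root.rglob("*.gz") if not is_valid_gzip(p))
```

Calling `find_bad_gzip_files(Path("downloads"))` (the directory name is just an example) returns the broken archives, so only those need to be fetched again rather than a whole directory.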

A possible solution would be to validate each gzip file during the download phase itself, so that the generation phase can then run without errors. The check could be optional, enabled via a new parameter.
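A rough sketch of what such a download-time check could look like is below. It is not based on the project's actual download code, which I have not looked at in detail: `fetch`, `download_verified`, `retries` and `verify` are hypothetical names, with `verify` standing in for the proposed optional parameter. The magic-byte test mirrors the comparison in `gzip.py` from the traceback. The `b'<h'` bytes there suggest the server sometimes returns something text-like (possibly an HTML error page) instead of the archive, but that is only a guess.

```python
import gzip
import zlib
from typing import Callable

GZIP_MAGIC = b"\x1f\x8b"  # the same two bytes as b'\037\213' in gzip.py


def looks_like_gzip(data: bytes) -> bool:
    """Cheap check: does the payload start with the gzip magic bytes?"""
    return data[:2] == GZIP_MAGIC


def is_complete_gzip(data: bytes) -> bool:
    """Full check: decompress in memory so truncation and CRC errors surface."""
    if not looks_like_gzip(data):
        return False
    try:
        gzip.decompress(data)
    except (gzip.BadGzipFile, EOFError, zlib.error):
        return False
    return True


def download_verified(
    fetch: Callable[[], bytes],  # hypothetical: performs one download attempt
    retries: int = 3,
    verify: bool = True,         # hypothetical opt-in/opt-out switch
) -> bytes:
    """Call fetch() up to `retries` times until the payload passes the check."""
    for _ in range(retries):
        data = fetch()
        if not verify or is_complete_gzip(data):
            return data
    raise ValueError(f"no valid gzip payload after {retries} attempts")
```

With something like this, a bad response is retried immediately for that one file, instead of being discovered later during generation. If decompressing every archive is too expensive, `looks_like_gzip` alone would already catch the `b'<h'` case shown above, though it would miss truncated files.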

Thanks

Labels: enhancement, help wanted
