potential for further performance improvements #9

Open
@ltalirz

While pycodcif is the fastest of the tools tested in https://github.com/ltalirz/cif-parsing-benchmark,
there might still be low-hanging fruit for further optimization:

[Image: Screenshot 2019-06-10 at 12 16 20]

Only about a third of the time is spent in parse_cif. In addition:

  1. More time is spent in decode_utf8_frame
  2. Significant time is spent in extract_precision
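
For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch using only the standard library profiler. The helper name `profile_call` is hypothetical and not part of pycodcif; it simply wraps any callable, so the parsing call to be measured would be passed in as `func`.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return (result, report text).

    The report lists the `top` entries sorted by cumulative time, which is
    the view that shows how much time each callee contributes overall.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Sorting by cumulative time is a choice made here because it makes it easy to compare a top-level function against the helpers it calls.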

My questions would be:

Re 1.: Without knowing the details of what this function does: if it is really about decoding UTF-8, could this perhaps be done once per file rather than once per element (e.g. decode_utf8_typed_values is called 1.7M times on the test set)?
Even if not, this function could probably be sped up significantly by moving it to C.
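
To make the idea concrete, here is a minimal sketch in plain Python. It does not use pycodcif and says nothing about how decode_utf8_frame works internally; `decode_per_line`, `decode_once` and `SAMPLE` are hypothetical names, and splitting on newlines is only a stand-in for real CIF tokenization.

```python
# Hypothetical sample data: UTF-8 bytes with one entry per line.
SAMPLE = "_cell_length_a 5.43(1)\n_note Ångström\n".encode("utf-8")


def decode_per_line(raw):
    """Split the bytes first, then decode every piece separately.

    This makes one decode call per element.
    """
    return [piece.decode("utf-8") for piece in raw.split(b"\n")]


def decode_once(raw):
    """Decode the whole buffer in a single call, then split the string."""
    return raw.decode("utf-8").split("\n")
```

Both functions return the same list here, because the newline byte never occurs inside a multi-byte UTF-8 sequence. Whether the single-decode variant is applicable to pycodcif depends on what the existing function actually does.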

Re 2.: How about making this optional, i.e. adding a flag that allows disabling precision extraction?
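
A rough sketch of what such a flag could look like, written as a standalone Python function. This is not the pycodcif API: `parse_numeric` and `with_precision` are hypothetical names, and the sketch assumes values of the form `1.234(5)`, where the digits in parentheses are the uncertainty in units of the last decimal place. Exponents are not handled.

```python
import re

# A plain decimal number with an optional uncertainty in parentheses,
# e.g. "1.234(5)". Group 2 holds the digits after the decimal point.
_NUMBER = re.compile(r"([+-]?\d+(?:\.(\d*))?)(?:\((\d+)\))?")


def parse_numeric(text, with_precision=True):
    """Return (value, precision).

    precision is None when the text carries no uncertainty or when
    with_precision is False, in which case the extra work is skipped.
    """
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a plain numeric value: {text!r}")

    value = float(match.group(1))
    if not with_precision or match.group(3) is None:
        return value, None

    # Scale the uncertainty digits by the number of decimal places.
    decimals = len(match.group(2) or "")
    return value, int(match.group(3)) / 10 ** decimals
```

Defaulting the flag to `True` would keep the current behaviour for existing callers, while callers that only need the values could opt out.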
