potential for further performance improvements #9

Open
ltalirz opened this issue Jun 10, 2019 · 1 comment

Comments


ltalirz commented Jun 10, 2019

While pycodcif is the fastest tool tested in https://github.com/ltalirz/cif-parsing-benchmark, there may still be low-hanging fruit for further optimization:

[Image: Screenshot 2019-06-10 at 12 16 20]

Only about a third of the time is spent in parse_cif. Beyond that:

  1. More time is spent in decode_utf8_frame
  2. Significant time is spent in extract_precision

My questions would be:

Re 1.: Without knowing the details of what this function does: if it is really about decoding UTF-8, could this perhaps be done once per file rather than once per element (e.g. decode_utf8_typed_values is called 1.7M times on the test set)?
Even if not, this function could probably be sped up significantly by moving it to C.
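
To make the "once per file" idea concrete, here is a minimal sketch in plain Python. It does not use pycodcif and says nothing about how decode_utf8_frame is actually implemented; the names raw_values, decode_per_value and decode_once are hypothetical, and the sketch assumes the values are available as bytes and never contain the separator byte.

```python
import timeit

# Hypothetical stand-in for the values of a parsed CIF file (not pycodcif's data model).
raw_values = [f"value_{i}_\u00e5".encode("utf-8") for i in range(100_000)]


def decode_per_value(values):
    """Decode each value with its own call, i.e. one decode per element."""
    return [v.decode("utf-8") for v in values]


def decode_once(values, sep=b"\x00"):
    """Join all values, decode the buffer in one call, then split it again.

    Assumes the separator byte never occurs inside a value.
    """
    if not values:
        return []
    return sep.join(values).decode("utf-8").split(sep.decode("ascii"))


if __name__ == "__main__":
    # Both strategies must produce identical strings.
    assert decode_per_value(raw_values) == decode_once(raw_values)
    for fn in (decode_per_value, decode_once):
        seconds = timeit.timeit(lambda: fn(raw_values), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

Whether the single-pass variant is actually faster depends on the data and on where the per-call overhead comes from, so it would need to be measured on the benchmark set.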

Re 2.: How about making this optional, i.e. adding a flag that allows disabling precision extraction?
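
As an illustration of what such a flag could look like, here is a small self-contained sketch. The function parse_number and its with_precision parameter are hypothetical and are not part of pycodcif's API; the number format is simplified (no exponents) and only covers values such as 1.234(5), where the digits in parentheses are the standard uncertainty in the last decimal places.

```python
import re

# Simplified CIF-style number with an optional standard uncertainty, e.g. "1.234(5)".
_NUMBER = re.compile(r"^([+-]?\d*\.?\d+)(?:\((\d+)\))?$")


def parse_number(text, with_precision=True):
    """Return (value, precision); precision is None when skipped or absent."""
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a supported number: {text!r}")
    mantissa, su_digits = match.groups()
    value = float(mantissa)
    if not with_precision or su_digits is None:
        # Skip the precision computation entirely when the caller opts out.
        return value, None
    decimals = len(mantissa.partition(".")[2])
    return value, int(su_digits) / 10 ** decimals


if __name__ == "__main__":
    print(parse_number("1.234(5)"))
    print(parse_number("1.234(5)", with_precision=False))
```

How much time such a flag would save depends on how expensive the precision step is compared with the rest of the conversion.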

Member

merkys commented Nov 19, 2019

Thanks for the interesting observation!

> Re 1.: Without knowing the details of what this function does: if it is really about decoding UTF-8, could this perhaps be done once per file rather than once per element (e.g. decode_utf8_typed_values is called 1.7M times on the test set)?
> Even if not, this function could probably be sped up significantly by moving it to C.

I am looking for a way to do this in C. I have successfully done so for Perl, and bindings for Perl and Python are very similar.

> Re 2.: How about making this optional, i.e. adding a flag that allows disabling precision extraction?

This is also planned to be done in C.
