Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parsing big xml files #5924

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

asolntsev
Copy link
Contributor

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

When running a big integration test suite that runs LiquiBase multiple times, I realized that LB spends a lot of time on generating the same method names again and again (e.g. given property name name, LB generates Strings isName, getName and setName many-many times).

I suggest an optimization that computes them only once and holds in a Map.
Should make parsing big XML files faster and consume less memory.

I believe this change doesn't affect any functionality (besides the above-mentioned performance improvement).

memoize generated is/set/get method names. These methods are called many-many times, but there is very few unique property names among them.
instead of ineffectively iterating existing methods one-by-one and generating is/get/set method names in the loop multiple times,
hold methods in a Map which allow fast lookup by property name.
@MalloD12
Copy link
Contributor

Hi @asolntsev,

Thank you for creating this improvement PR, it's very appreciated it by the team. I left you a small question/comment I'll be also checking with the team and I'll let you know.

In general, this PR looks good to me. I think I would only ask you if you can add some unit tests to ObjectUtilTest class for this.

Last thing, do you have a comparison from your use case that shows us how this improvement is behaving in numbers (how it's now vs how it was previously)?

Thanks,
Daniel.

@asolntsev
Copy link
Contributor Author

asolntsev commented May 31, 2024

Thank you Daniel!

add some unit tests to ObjectUtilTest class

  1. Initially, I didn't add any tests because I didn't add or change any functionality. The existing tests already cover the functionality of ObjectUtil class.
  2. Now I aded dedicated unit-tests for the new class ObjectMethods.

do you have a comparison from your use case that shows us how this improvement is behaving in numbers

On my project and my laptop (MacBook pro M1), I run in parallel threads a suite of integration tests that run LiquiBase multiple times to initialize DB (DB is local in docker container).

Execution time of integration test decreased by ~1 minute:

  • Before PR 5924: 8m 38s
  • After PR 5924: 7m 48s

@MalloD12 MalloD12 self-assigned this May 31, 2024
Copy link
Contributor

@MalloD12 MalloD12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

Looks good to me. Most of the build and test checks are looking good (except for functional tests that are failing because of a non-related reason, same for test-harness but they should be ok in the next run).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Code Review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants