Resolve memory leaks caused by adding and committing to postgres (related to #494) #541
Description of the problems or issues
Is your pull request related to a problem? Please describe.
Memory leaks happen when Fonduer adds and commits data to postgres after parsing many documents. This triggers the Ubuntu OOM killer and hangs the Fonduer process. Please refer to #494.
Does your pull request fix any issue?
One of the memory leaks occurs because SQLAlchemy keeps all data parsed from documents in memory from `add` until `commit`.
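To illustrate the leak described above, here is a minimal, hypothetical sketch (the model and SQLite setup are illustrative only, not Fonduer's actual schema): a SQLAlchemy `Session` holds strong references to every pending object from `add` until `commit`, so adding many parsed documents without committing keeps them all in memory.

```python
# Hypothetical sketch, NOT Fonduer code: shows that a Session keeps
# strong references to all pending objects until commit.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    text = Column(String)

engine = create_engine("sqlite://")  # in-memory DB for illustration
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

for i in range(3):
    session.add(Document(text=f"doc {i}"))

# Every added-but-uncommitted object is strongly referenced in session.new,
# so none of them can be garbage-collected yet.
print(len(session.new))  # 3

session.commit()
# After the commit, nothing is pending anymore.
print(len(session.new))  # 0
```

With many large parsed documents, the pending set above is where memory accumulates.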
Description of the proposed changes
Add a `commit` immediately after `add` to free memory (L136 of `src/fonduer/parser/parser.py`). I am confident this has no side effects, because no process uses the in-memory data after `add` and `commit`.
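The effect of committing right after each `add` can be sketched as follows (again a hypothetical, self-contained example, not the actual parser code): after `commit`, the session references the now-persistent object only weakly, so a large document can be garbage-collected as soon as nothing else refers to it.

```python
# Hypothetical sketch, NOT Fonduer code: after commit, the Session holds
# the object only via a weak reference, so it can be garbage-collected.
import gc
import weakref

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    text = Column(String)

engine = create_engine("sqlite://")  # in-memory DB for illustration
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

doc = Document(text="one large parsed document")
ref = weakref.ref(doc)

session.add(doc)      # strongly referenced while pending
session.commit()      # now persistent; session holds it only weakly

del doc               # drop our own strong reference
gc.collect()
print(ref() is None)  # True: the document object was reclaimed
```

This is why per-document `add` + `commit` keeps memory bounded, at the cost of one transaction per document.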
By the way, there are two common reasons to `commit` only after multiple `add`s: one is to allow rolling back a batch of inserts to postgres, and the other is to speed up inserting many small records. In contrast, the document data in Fonduer consist of a few large records, so neither case applies here.
Test plan
Run the existing tests (no additional tests).
Checklist