
Fix memory leak bulk indexer #701

Open
Wants to merge 2 commits into base: main

mhmtszr commented Jul 19, 2023

The bulk indexer makes a lot of heap allocations, which affects our applications' performance. I tried to reduce these allocations by using "sync.Pool".

The allocations come from the bulk indexers that we regularly open and close.
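
The PR diff itself is not shown on this page. As a generic illustration of the technique named above, and not the code from this PR or from go-elasticsearch, the sketch below reuses *bytes.Buffer values through sync.Pool so that repeatedly encoding a batch does not allocate a fresh buffer each time. encodeBatch is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values so repeated batch
// encoding does not allocate a new buffer every time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeBatch (hypothetical) writes newline-delimited lines, as in a bulk
// request body, into a pooled buffer and returns a copy as a string.
func encodeBatch(lines []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear contents before handing the buffer back
		bufPool.Put(buf)
	}()
	for _, line := range lines {
		buf.WriteString(line)
		buf.WriteByte('\n')
	}
	return buf.String() // String copies, so resetting the buffer afterwards is safe
}

func main() {
	body := encodeBatch([]string{
		`{"index":{"_index":"my-index"}}`,
		`{"title":"hello"}`,
	})
	fmt.Print(body)
}
```

One caveat of this design: sync.Pool may drop pooled objects at any time (for example during garbage collection), so it reduces allocations rather than eliminating them.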

elasticmachine (Collaborator) commented

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

cla-checker-service bot commented Jul 19, 2023

💚 CLA has been signed

T-J-L commented Oct 12, 2023

What's the reason for opening and closing indexers? An indexer can last for the lifetime of the application; when used that way with a large buffer, there are zero allocations here.

mhmtszr commented Dec 12, 2023

@T-J-L Hello, is there any way to use the same indexer for different operations? We needed to close it after adding each operation's items to the batch.

T-J-L commented Dec 14, 2023

I create a single indexer at startup, with a low flush interval (100ms). Then, for each application request, I create a couple of channels for success/errors, call BulkIndexer.Add, and write back to those channels in the BulkIndexerItem.OnSuccess and BulkIndexerItem.OnFailure callbacks. So each request is effectively synchronous, while all requests to ES are batched.

You can set the Index and Action per item, so this works for all types of operations.

mhmtszr commented Dec 15, 2023

Great solution, but how can you be sure your documents will be written to Elasticsearch? We need that guarantee, which is why we close the bulk indexer.

3 participants