Fix memory leak bulk indexer #701
base: main
Conversation
Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
💚 CLA has been signed
What's the reason for opening/closing indexers? An indexer can last the lifetime of the application, and when used that way with a large buffer there are zero allocations here.
@T-J-L hello, is there any way to use the same indexer for different operations? We needed to close it after adding each operation's batch.
I create a single indexer at start-up with a low flush interval (100ms). Then, for each application request, I create a couple of channels for successes/errors and perform the adds through that shared indexer. You can set the flush interval low enough that items are written promptly.
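For reference, a minimal sketch of that pattern, assuming the esutil API from go-elasticsearch (the index name, document body, and channel wiring are illustrative, not taken from this PR):

```go
package main

import (
	"context"
	"log"
	"strings"
	"time"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	// One long-lived indexer for the whole application, flushing often.
	bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:        es,
		FlushInterval: 100 * time.Millisecond,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer bi.Close(context.Background())

	// Per-request channel: the caller can block until its document
	// has actually been flushed (or rejected).
	done := make(chan error, 1)

	err = bi.Add(context.Background(), esutil.BulkIndexerItem{
		Action: "index",
		Index:  "my-index", // illustrative
		Body:   strings.NewReader(`{"title":"hello"}`),
		OnSuccess: func(ctx context.Context, item esutil.BulkIndexerItem, res esutil.BulkIndexerResponseItem) {
			done <- nil
		},
		OnFailure: func(ctx context.Context, item esutil.BulkIndexerItem, res esutil.BulkIndexerResponseItem, err error) {
			done <- err
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Waiting here gives the same "document is written" guarantee that
	// closing the indexer would, without tearing the indexer down.
	if err := <-done; err != nil {
		log.Fatal(err)
	}
}
```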
Great solution, but how can you be sure your documents will be written to Elasticsearch? We need that guarantee, which is why we close the bulk indexer.
It would seem odd to me that the intended use of a BulkIndexer is to never close it. As @mhmtszr suggested, I will try to test these changes and see if they resolve my issue. I'm not a maintainer of the repo, but hopefully that will help with the acceptance of this PR.
I can confirm that this change resolves the memory issues I've been seeing: #956.
I tested these changes in my own custom fork of the repo, since the original clone here is a bit old.
Prior to these changes, memory usage would spike when calling Close() on multiple instances of BulkIndexer. My application would jump from ~8 MB to ~300 MB after just a few invocations.
After applying these changes to my fork, memory usage stayed much closer to what I'd expect, remaining within ~1-2 MB of the original boot-up usage.
Resolves #956
Edit: Although I'm seeing greatly improved memory usage after this change, I'm starting to wonder if this fix masks an underlying problem with how resource cleanup is handled in BulkIndexer. The flushBuffer method also allocates new buffers in much the same way NewBulkIndexer did prior to this change.
I think there's probably more to investigate here.
IMO a long-running process is the only reason for using the bulk indexer. If you are closing the indexer immediately to force a flush, it seems like a normal bulk request would be a better choice.
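For comparison, a one-shot write with the plain Bulk API might look like this (a sketch assuming the go-elasticsearch client; the index name and payload are illustrative):

```go
package main

import (
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	// NDJSON payload: an action line per document line, newline-terminated.
	payload := "{\"index\":{\"_id\":\"1\"}}\n{\"title\":\"hello\"}\n"

	// The request returns once Elasticsearch has accepted the batch,
	// so there is no indexer to flush or close afterwards.
	res, err := es.Bulk(strings.NewReader(payload), es.Bulk.WithIndex("my-index"))
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	log.Println(res.Status())
}
```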
The bulk indexer makes a lot of heap allocations, which affects our applications' performance. I tried to reduce these allocations by using sync.Pool. Bulk indexers that we regularly open and close cause allocations.
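A minimal sketch of the pooling idea (the helper name and buffer size are illustrative, not the PR's actual code):

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable buffers so that repeatedly opening/closing
// indexers (or flushing) doesn't allocate a fresh buffer every time.
var bufPool = sync.Pool{
	New: func() interface{} {
		// Pre-size to a typical flush threshold (illustrative: 5 MB) so the
		// buffer never needs to grow during normal operation.
		return bytes.NewBuffer(make([]byte, 0, 5*1024*1024))
	},
}

// withFlushBuffer runs fn with a pooled buffer, then returns the buffer
// to the pool, keeping its backing array for the next caller.
func withFlushBuffer(fn func(*bytes.Buffer) error) error {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // drop contents, keep capacity
		bufPool.Put(buf)
	}()
	return fn(buf)
}

func main() {
	_ = withFlushBuffer(func(b *bytes.Buffer) error {
		b.WriteString("{\"index\":{}}\n")
		return nil
	})
}
```

Because Reset keeps the allocated capacity, steady-state flushes reuse the same memory instead of producing new heap allocations and GC pressure on every open/close cycle.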