Skip to content

Simplify DerivedSourceReaders lifecycle by removing manual ref-counting#3138

Open
shatejas wants to merge 9 commits intoopensearch-project:mainfrom
shatejas:derived-src-readers-fix
Open

Simplify DerivedSourceReaders lifecycle by removing manual ref-counting#3138
shatejas wants to merge 9 commits intoopensearch-project:mainfrom
shatejas:derived-src-readers-fix

Conversation

@shatejas
Copy link
Collaborator

@shatejas shatejas commented Mar 1, 2026

Description

Reference counting for the underlying readers (StoredFieldsReader in this case) is already handled by Lucene's SegmentReader, so DerivedSourceReaders does not need its own ref-count.

StoredFieldsReader.clone() is called by Lucene on every document read (e.g. get/_source fetch), so the cloned instance must not close the underlying readers — only the owning instance should. This is achieved by delegating close behavior to a Closeable onClose: the public constructor registers a real close action, while clone() and getMergeInstance() produce non-owning instances with a no-op close.

Related Issues

Resolves #OpenSearch#20622

Repro attempt and logs

I modified the code and added some logs to understand when the readers where getting closed. For certain conditions freeReaderContext tries to close the OpensearchReaderManager on a cloned instance of storedfieldsReader. Ideally this should not happen but there seems to a race condition hard to pinpoint on where exactly the ref turns 0 and it tries to close the underlying readers.

The clone implementation is now consistent with Lucene90CompressingStoredFieldsReader.java

[2026-03-01T18:31:55,056][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 3
[2026-03-01T18:31:55,157][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][tryIncRef] current refCount 3
[2026-03-01T18:31:55,163][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 4
[2026-03-01T18:31:55,163][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 3
[2026-03-01T18:31:55,163][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][tryIncRef] current refCount 1
[2026-03-01T18:31:55,163][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 2
[2026-03-01T18:31:55,163][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][tryIncRef] current refCount 1
[2026-03-01T18:31:55,164][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 2
[2026-03-01T18:31:55,164][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][tryIncRef] current refCount 1
[2026-03-01T18:31:55,238][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][decRef] refCount 2
[2026-03-01T18:31:55,238][INFO ][o.o.i.e.OpenSearchReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [OpenSearchReaderManager][tryIncRef] current refCount 1
[2026-03-01T18:31:55,238][INFO ][o.o.i.e.I.ExternalReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [ExternalReaderManger][decRef] refCount 2
[2026-03-01T18:31:55,238][INFO ][o.o.i.e.I.ExternalReaderManager][758a3538f5f3cafd60a93ee69f8ad6f5] [ExternalReaderManger][decRef] refCount 1
[2026-03-01T18:31:55,239][INFO ][o.o.s.SearchService      ][758a3538f5f3cafd60a93ee69f8ad6f5] Attempting to free reader context
[2026-03-01T18:31:55,239][INFO ][o.o.s.i.ReaderContext    ][758a3538f5f3cafd60a93ee69f8ad6f5] [ReaderContext] closing reader context, refCount 1
[2026-03-01T18:31:55,239][WARN ][o.o.k.i.c.d.DerivedSourceReaders][758a3538f5f3cafd60a93ee69f8ad6f5] Attempted to close a cloned DerivedSourceReaders. This is a no-op to avoid closing the original readers

Testing

The change was tested using reindex(calls freeReaderContext underneath) and force merge operations to make sure it works

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

shatejas added 3 commits March 1, 2026 11:14
Reference counting for the underlying readers (StoredFieldsReader in this case)
is already handled by Lucene's SegmentReader, so DerivedSourceReaders does not
need its own AtomicInteger ref-count.

StoredFieldsReader.clone() is called by Lucene on every document read
(e.g. get/_source fetch), so the cloned instance must not close the
underlying readers — only the owning instance should. This is achieved
by delegating close behavior to a Closeable onClose: the public
constructor registers a real close action, while clone() and
getMergeInstance() produce non-owning instances with a no-op close.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
shatejas and others added 3 commits March 1, 2026 15:31
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
@codecov
Copy link

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.51%. Comparing base (b5b5c60) to head (020ab40).

Files with missing lines Patch % Lines
...ndex/codec/derivedsource/DerivedSourceReaders.java 88.23% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3138      +/-   ##
============================================
+ Coverage     82.42%   82.51%   +0.08%     
- Complexity     3867     3880      +13     
============================================
  Files           418      418              
  Lines         14462    14472      +10     
  Branches       1850     1850              
============================================
+ Hits          11921    11941      +20     
+ Misses         1780     1771       -9     
+ Partials        761      760       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Navneet Verma <navneev@amazon.com>
navneet1v
navneet1v previously approved these changes Mar 12, 2026
Signed-off-by: Tejas Shah <shatejas@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants