[DocDB] Fix heap-use-after-free in yb::YBThreadPool #28299

Adityakk9031 · 2025-08-17T13:31:03Z

📌 Summary

This PR fixes a heap-use-after-free detected by ASAN in yb::YBThreadPool::Impl::NotifyWorker, where a Worker could be freed while another thread was concurrently accessing it through waiting_workers.Pop().

🐞 Root Cause

Multiple threads could concurrently pop from waiting_workers.

A Worker in IdleStop state could be erased and deleted while another thread was still reading it.

This resulted in nondeterministic crashes under ASAN and caused the following test to fail:

CDCSDKConsumptionConsistentChangesTest.TestLSNDeterminismWithSpecialRecordOnRestartWithPartialAck

🔧 Fix

Added waiting_workers_active_pops counter in ThreadPoolShare to track active Pop() operations.

Deferred deletion of Worker objects if pops are active, using a new deferred_deletes_ list.

Ensured Shutdown() waits for all pops to finish before freeing deferred workers.
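
The three points above can be illustrated with a minimal, standalone sketch. This is not the code from the patch: DeferredReclaimer, PopGuard, Retire, and the simplified Worker are hypothetical names made up for illustration, and only waiting_workers_active_pops and deferred_deletes_ come from this description. The sketch also assumes that a worker has already been unlinked from waiting_workers before it is retired, so only pops that were already in flight can still hold a pointer to it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pool's Worker; not the real yb class.
struct Worker {
  explicit Worker(int id_in) : id(id_in) {}
  int id;
};

// Illustrative holder for the counter and the deferred list (assumed design).
class DeferredReclaimer {
 public:
  // RAII marker for one in-flight Pop(): one atomic increment on entry and
  // one atomic decrement on exit.
  class PopGuard {
   public:
    explicit PopGuard(DeferredReclaimer& owner) : owner_(owner) {
      owner_.waiting_workers_active_pops_.fetch_add(1);
    }
    ~PopGuard() { owner_.waiting_workers_active_pops_.fetch_sub(1); }
    PopGuard(const PopGuard&) = delete;
    PopGuard& operator=(const PopGuard&) = delete;

   private:
    DeferredReclaimer& owner_;
  };

  // Precondition (assumption): `worker` is already unlinked from the waiting
  // list, so pops that start later cannot reach it. The default seq_cst
  // ordering makes "unlink, then read counter" and "increment counter, then
  // read list" observe each other consistently.
  void Retire(std::unique_ptr<Worker> worker) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (waiting_workers_active_pops_.load() == 0) {
      worker.reset();  // No pop in flight: safe to free immediately.
      return;
    }
    // A pop may still be reading the worker: keep it alive for now.
    deferred_deletes_.push_back(std::move(worker));
  }

  // Waits for in-flight pops to finish, then frees every deferred worker.
  void Shutdown() {
    while (waiting_workers_active_pops_.load() != 0) {
      std::this_thread::yield();
    }
    std::lock_guard<std::mutex> lock(mutex_);
    deferred_deletes_.clear();
  }

  std::size_t deferred_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return deferred_deletes_.size();
  }

 private:
  std::atomic<std::size_t> waiting_workers_active_pops_{0};
  std::mutex mutex_;
  std::vector<std::unique_ptr<Worker>> deferred_deletes_;
};
```

In this sketch, a thread would hold a PopGuard for the duration of its pop, and the code path that stops an idle worker would call Retire instead of deleting the object directly. A busy-wait in Shutdown keeps the example short; a condition variable would be the more usual choice if pops can run for a long time.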

✅ Validation

Ran ybd asan --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test: the heap-use-after-free crash no longer occurs.

Verified that normal enqueue/dequeue behavior is unaffected.

Checked that no leaks remain (all deferred deletes are flushed in shutdown).

📊 Impact

Fixes flaky ASAN test failures in DocDB thread pool.

Minimal overhead: adds two atomic ops per worker notification.

No API changes.

🔗 References

Fixes: #28297

Jira: DB-17979


CLAassistant · 2025-08-17T13:31:09Z

All committers have signed the CLA.

rthallamko3 requested a review from spolitov October 9, 2025 15:32
