Skip to content

Handle MT issues in setVertexArrivals, fix non-determ crash#396

Closed
dsengupta0628 wants to merge 250 commits into
parallaxsw:masterfrom
dsengupta0628:fix_nd_threading_issue
Closed

Handle MT issues in setVertexArrivals, fix non-determ crash#396
dsengupta0628 wants to merge 250 commits into
parallaxsw:masterfrom
dsengupta0628:fix_nd_threading_issue

Conversation

@dsengupta0628
Copy link
Copy Markdown
Contributor

Issue 1
Rapidus hercules_idecode hits non-deterministic STA assertion · Issue #1537 · The-OpenROAD-Project-private/OpenROAD-flow-scripts
Non-deterministic segmentation fault in GRT

sta::Path::vertexPath
sta::ArrivalVisitor::pruneCrprArrivals
sta::ArrivalVisitor::visit
sta::DispatchQueue::dispatch_thread_handler

Issue 2
Rapidus hercules_is_int with M4D1 parasitics hits STA assertion · Issue #1526 · The-OpenROAD-Project-private/OpenROAD-flow-scripts
Non-deterministic STA assert in CTS (most-shallow stack same as above)

sta::CheckCrpr::findCrpr
sta::Latches::latchBorrowInfo
sta::Latches::latchRequired
sta::Latches::latchOutArrival
sta::PathVisitor::visitFromPath
sta::PathVisitor::visitEdge
sta::PathVisitor::visitFaninPaths
sta::ArrivalVisitor::visit

Both happen because setVertexArrivals() unconditionally mutates vertex->paths_ with no protection, while the multi-threaded BFS allows other threads to simultaneously read another vertex's path array through the CRPR clock path lookup (crprClkPath() → Path::vertexPath()). The two crashes represent the same data race manifesting at different points in the call:

Crash in Issue 1 fails immediately in vertexPath,
Crash in Issue 2 uses a transiently valid pointer that becomes dangling before findCrpr dereferences it, producing garbage that overflows the genclk_src_paths_ vector index.

To fix this, we defer deletion of old vertex path arrays until after visitParallel completes.
So in setVertexArrivals, instead of immediately deleting the old path array (which frees memory that concurrent CRPR/latch readers may still hold pointers to), save it to a per-search list and free it in deleteTagsPrev() — which is called only after all threads for a level have finished.

Why this fixes both crashes ?

Issue 1 (Path::vertexPath ← pruneCrprArrivals):
Thread B was freeing Vclk->paths_ via deletePaths()/makePaths() while Thread A read Vclk->paths_[path_index]. The old array is now kept alive until deleteTagsPrev(), so Thread A's pointer is always
valid within the parallel level.

Issue 2 ('_n < size()' ← findCrpr ← latchBorrowInfo):
Thread A obtained a const Path *src_clk_path via crprClkPath() pointing into Vclk->paths_. Thread B freed that array. Dereferencing the dangling pointer produced garbage rf/min_max indices, which overflowed the genclk_src_paths_ vector. With deferred deletion the returned pointer stays valid, so findCrpr reads correct data.

jjcherry56 and others added 30 commits February 10, 2023 11:52
Signed-off-by: James Cherry <cherry@parallaxsw.com>
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
fix: Rename variable to avoid collision with C++ 20 keyword (requires)
setTopInstance/deleteTopInstance back to being public
…eyword

Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Latest OpenSTA code (fixes hang in flow tests)
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Small update to power numbers in one test
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Signed-off-by: Harsh Vardhan <openroad@chez-vardhan.com>
Latest OpenSTA to fixes test issues
Latest STA code (performance fix)
Latest OpenSTA (fixed crash with thru exceptions).
maliberty and others added 27 commits February 18, 2026 16:49
Signed-off-by: Martin Povišer <povik@cutebit.org>
Signed-off-by: Martin Povišer <povik@cutebit.org>
Signed-off-by: Martin Povišer <povik@cutebit.org>
This was observed to cause crashes in write_timing_model.

Signed-off-by: Martin Povišer <povik@cutebit.org>
…ull-rel_3.0

Pull 3.0 with multimode from parallaxsw
The tcl version in the bazel central registry supports MacOS and
it can be used via MODULES.bazel instead of WORKSPACE.

Signed-off-by: Friedrich Beckmann <friedrich.beckmann@tha.de>
On MacOS the MachineApple.cc must be used instead of
MachineLinux.cc. I added a select statement.

Signed-off-by: Friedrich Beckmann <friedrich.beckmann@tha.de>
Context PR:
The-OpenROAD-Project/OpenROAD#9536

Signed-off-by: Henner Zeller <h.zeller@acm.org>
Before the multi-mode refactor the sort helpers wrapped stable_sort,
switch to stable_sort again as the sorting influences which vertex is
returned from Sta::worstSlack().

Signed-off-by: Martin Povišer <povik@cutebit.org>
Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
…bison

Use bison/flex starlark from //bazel package.
bazel/tcl: migrate from rules_hdl tcl to bazel BCR tcl version
bazel/macos: Changed bazel BUILD file to have MachineApple.cc for Macos build
Use same location for StaConfig.hh output as in cmake.
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
… my fix

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
…ta_write_verilog_fix

Fix write_verilog escape seq Issue 3826
Removed introductory section saying this is a fork and to file upstream.

Signed-off-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ maliberty
✅ dsengupta0628
✅ povik
✅ vvbandeira
✅ hzeller
❌ fredowski
You have signed the CLA already but the status is still pending? Let us recheck it.

@dsengupta0628 dsengupta0628 deleted the fix_nd_threading_issue branch March 3, 2026 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.