Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in GC when attached profiler calls ICorProfilerInfo4::EnumThreads using .NET9 runtime #110062

Open
obeton opened this issue Nov 21, 2024 · 1 comment
Labels
area-Diagnostics-coreclr untriaged New issue has not been triaged by the area owner

Comments

@obeton
Copy link

obeton commented Nov 21, 2024

Description

Background: I am member of team responsible for maintaining a closed source .net profiler

Issue: I am currently working on validating .NET9 with profiler and while running regression tests, I've observed process stall when using .NET9 runtime with the profiler attached.

Root Cause: I've managed to narrow down the cause of the issue to an invocation of ICorProfilerInfo4::EnumThreads within our profiler's overridden ICorProfilerCallback::RuntimeSuspendFinished method, which I'll share in the 'actual behavior' form

Reproduction Steps

In profiler dll project have class:
`class ProfilerCallback : public ICorProfilerCallback5
{
public:
virtual COM_METHOD(HRESULT) Initialize(IUnknown* pICorProfilerInfoUnk)
{
return pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo8, (void**)&m_pProfilerInfo);
}

COM_METHOD(HRESULT) RuntimeSuspendFinished()
{
    ICorProfilerThreadEnum* threadEnum = nullptr;
    HRESULT enumThreadsHR = m_pProfilerInfo->EnumThreads(&threadEnum);
    threadEnum->Release();
    return enumThreadsHR;
}
ICorProfilerInfo8* m_pProfilerInfo = nullptr;

};
`

Attach as profiler this (using guide
Run and observe process stall/deadlock after RuntimeSuspendFinished is called

Expected behavior

Process should not deadlock

Actual behavior

Process deadlocks

This is the chain of calls that leads to a deadlock:

  1. Inside RuntimeSuspendFinished, EnumThreads is called
  2. Via StateHolder destructor in ProfilerThreadEnum::Init, ThreadStore::s_pThreadStore->m_HoldingThread is set to NULL
  3. RuntimeSuspendFinished returns
  4. WKS::GCHeap::GarbageCollectGeneration is called which then calls Thread::RareDisablePreemptiveGC
  5. Where there is a call made to ThreadStore::HoldingThreadStore will return true as m_HoldingThread == NULL, which avoids early exit from Thread::RareDisablePreemptiveGC
  6. Later on a call to GCHeapUtilities::GetGCHeap()->WaitUntilGCComplete() is made which will deadlock as GCHeap::SetWaitForGCEvent() hasn't been called yet by ThreadSuspend::RestartEE

Below are the relevant stack traces:

Resetting holding thread:
coreclr.dll!ThreadSuspend::UnlockThreadStore(int bThreadDestroyed, ThreadSuspend::SUSPEND_REASON) Line 1934 C++ [Inline Frame] coreclr.dll!ThreadStore::UnlockThreadStore() Line 5110 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::Release() Line 359 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::{dtor}() Line 340 C++ coreclr.dll!ProfilerThreadEnum::Init() Line 585 C++ coreclr.dll!ProfToEEInterfaceImpl::EnumThreads(ICorProfilerThreadEnum * * ppEnum) Line 10366 C++ CoreRewriter_x64.dll!ProfilerCallback::RuntimeSuspendFinished() Line 2417 C++ coreclr.dll!EEToProfInterfaceImpl::RuntimeSuspendFinished() Line 5056 C++ coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *)>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *) callback, HRESULT * pHR) Line 284 C++ coreclr.dll!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason) Line 5647 C++ coreclr.dll!GCToEEInterface::SuspendEE(SUSPEND_REASON reason) Line 51 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51030 C++

Checking null holding thread:
coreclr.dll!ThreadStore::HoldingThreadStore(Thread * pThread) Line 7624 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2123 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++

Deadlock:
coreclr.dll!WKS::GCHeap::WaitUntilGCComplete(bool bConsiderGCStart) Line 238 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2212 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++

Regression?

We have not encountered this GC deadlock in .NET8 and presumably earlier

Known Workarounds

I haven't tried it yet but there's probably some way to force ThreadStore::s_pThreadStore->m_HoldingThread into it's previous correct state after calling EnumThreads

Configuration

.NET: 9.0.100

Processor 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz Installed RAM 32.0 GB (31.7 GB usable) System type 64-bit operating system, x64-based processor
This has also been replicated on build server. I am not aware of the specs for it.

Other information

this line causes ThreadStore::s_pThreadStore->m_HoldingThread to be set to NULL which has knock-on effects later on which lead to a deadlock.

I have minidumps, but they're pretty large to upload, so let me know if there is any way I can send them

Copy link
Contributor

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Diagnostics-coreclr untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

1 participant