Infinite loop in Frame::UpdateFloatingPointRegisters on ARM64 #101921

Closed
rzikm opened this issue May 6, 2024 · 6 comments · Fixed by #102258

Comments

@rzikm
Member

rzikm commented May 6, 2024

We have observed that some tests started to time out on Windows ARM64 outerloop runs, first seen in the run on commit d92ac1f. I assume something committed in the preceding 24 hours is the cause.

Example failure: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-b0ba224c9aa14ac7aa/System.Net.WebHeaderCollection.Tests/1/console.ebb7fdfd.log?helixlogtype=result

========================== End custom configuration settings ===============================
----- start Mon 05/06/2024 10:10:02.06 ===============  To repro directly: =====================================================
pushd C:\h\w\BB260A1D\w\AB250977\e\
"C:\h\w\BB260A1D\p\dotnet.exe" exec --runtimeconfig System.Net.WebHeaderCollection.Tests.runtimeconfig.json --depsfile System.Net.WebHeaderCollection.Tests.deps.json xunit.console.dll System.Net.WebHeaderCollection.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================

C:\h\w\BB260A1D\w\AB250977\e>"C:\h\w\BB260A1D\p\dotnet.exe" exec --runtimeconfig System.Net.WebHeaderCollection.Tests.runtimeconfig.json --depsfile System.Net.WebHeaderCollection.Tests.deps.json xunit.console.dll System.Net.WebHeaderCollection.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Net.WebHeaderCollection.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.WebHeaderCollection.Tests (found 61 test cases)
  Starting:    System.Net.WebHeaderCollection.Tests (parallel test collections = on [2 threads], stop on fail = off)
   System.Net.WebHeaderCollection.Tests: [Long Running Test] 'System.Net.Tests.WebHeaderCollectionTest.HttpResponseHeader_AddInvalid_Throws', Elapsed: 00:02:11
   System.Net.WebHeaderCollection.Tests: [Long Running Test] 'System.Net.Tests.WebHeaderCollectionTest.HttpResponseHeader_AddInvalid_Throws', Elapsed: 00:04:12
   System.Net.WebHeaderCollection.Tests: [Long Running Test] 'System.Net.Tests.WebHeaderCollectionTest.HttpResponseHeader_AddInvalid_Throws', Elapsed: 00:06:12

The test itself just calls a ctor and checks the exception type; there is nothing in it that can be expected to hang.

[Theory]
[InlineData((HttpResponseHeader)int.MinValue)]
[InlineData((HttpResponseHeader)(-1))]
[InlineData((HttpResponseHeader)int.MaxValue)]
public void HttpResponseHeader_AddInvalid_Throws(HttpResponseHeader header)
{
    WebHeaderCollection w = new WebHeaderCollection();
    Assert.Throws<IndexOutOfRangeException>(() => w[header] = "foo");
}

There are more examples like this; the common thread is that the test involves throwing an exception. Building locally on an ARM64 machine and attaching a debugger shows an infinite loop in

void Frame::UpdateFloatingPointRegisters(const PREGDISPLAY pRD)
{
    _ASSERTE(!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)));
    while (!ExecutionManager::IsManagedCode(::GetIP(pRD->pCurrentContext)))
    {
#ifdef TARGET_UNIX
        PAL_VirtualUnwind(pRD->pCurrentContext, NULL);
#else
        Thread::VirtualUnwindCallFrame(pRD);
#endif
    }
}

with this stack trace:

0:011> !dumpstack
OS Thread Id: 0x6544 (11)
Current frame: coreclr!RangeSectionMap::LookupRangeSection + 0x94 [C:\source\runtime\src\coreclr\vm\codeman.h:1251]
ChildFP          RetAddr          Caller, Callee
0000008EF25F98D0 00007ff9af0b7e48 coreclr!ExecutionManager::IsManagedCodeWorker + 0x28 [C:\source\runtime\src\coreclr\vm\codeman.cpp:4561], calling coreclr!RangeSectionMap::LookupRangeSection [C:\source\runtime\src\coreclr\vm\codeman.h:1239]
0000008EF25F98E0 00007ff9af0b7dfc coreclr!ExecutionManager::IsManagedCodeWithLock + 0x24 [C:\source\runtime\src\coreclr\vm\codeman.cpp:4537], calling coreclr!ExecutionManager::IsManagedCodeWorker [C:\source\runtime\src\coreclr\vm\codeman.cpp:4551]
0000008EF25F98F0 00007ff9af0fba2c coreclr!UnwindAndContinueRethrowHelperAfterCatch + 0x7c, calling coreclr!DispatchManagedException [C:\source\runtime\src\coreclr\vm\exceptionhandling.cpp:5704]
0000008EF25F9910 00007ff9af0b7db8 coreclr!ExecutionManager::IsManagedCode + 0x20 [C:\source\runtime\src\coreclr\vm\codeman.cpp:4518], calling coreclr!ExecutionManager::IsManagedCodeWithLock [C:\source\runtime\src\coreclr\vm\codeman.cpp:4528]
0000008EF25F9940 00007ff9af0b7d74 coreclr!Frame::UpdateFloatingPointRegisters + 0x1c [C:\source\runtime\src\coreclr\vm\frames.cpp:472], calling coreclr!ExecutionManager::IsManagedCode [C:\source\runtime\src\coreclr\vm\codeman.cpp:4508]
0000008EF25F9960 00007ff9af057a68 coreclr!HelperMethodFrame::UpdateRegDisplay + 0x1d8 [C:\source\runtime\src\coreclr\vm\arm64\stubs.cpp:429], calling coreclr!Frame::UpdateFloatingPointRegisters [C:\source\runtime\src\coreclr\vm\frames.cpp:470]
0000008EF25F9970 00007ff9af0598bc coreclr!StackFrameIterator::ProcessIp + 0x1c [C:\source\runtime\src\coreclr\vm\stackwalk.cpp:2872], calling coreclr!EECodeInfo::Init [C:\source\runtime\src\coreclr\vm\jitinterface.cpp:14541]
0000008EF25F9980 00007ff9af05c350 coreclr!StackFrameIterator::NextRaw + 0x4d8 [C:\source\runtime\src\coreclr\vm\stackwalk.cpp:2731], calling coreclr!HelperMethodFrame::UpdateRegDisplay [C:\source\runtime\src\coreclr\vm\arm64\stubs.cpp:416]
0000008EF25F99A0 00007ff9af0b7d20 coreclr!StackFrameIterator::Next + 0x20 [C:\source\runtime\src\coreclr\vm\stackwalk.cpp:1636], calling coreclr!StackFrameIterator::NextRaw [C:\source\runtime\src\coreclr\vm\stackwalk.cpp:2462]
0000008EF25F99F0 00007ff9af0b7280 coreclr!SfiInit + 0x1c0 [C:\source\runtime\src\coreclr\vm\exceptionhandling.cpp:8327], calling coreclr!StackFrameIterator::Next [C:\source\runtime\src\coreclr\vm\stackwalk.cpp:1624]
0000008EF25F9A10 00007ff950830184 (MethodDesc 00007ff94f683ce8 + 0xcc System.Runtime.ExceptionServices.InternalCalls.RhpSfiInit(System.Runtime.StackFrameIterator ByRef, Void*, Boolean, Boolean*))
0000008EF25F9A80 00007ffa95f6a804 ntdll!RtlpFreeHeapInternal + 0x36c, calling ntdll!RtlpHpLfhSubsegmentFreeBlock
0000008EF25F9AF0 00007ffa95f6a45c ntdll!RtlpHpFreeWithExceptionProtection + 0x6c, calling ntdll!RtlpFreeHeapInternal
0000008EF25F9B50 00007ff950830098 (MethodDesc 00007ff94f684080 + 0x40 System.Runtime.StackFrameIterator.Init(PAL_LIMITED_CONTEXT*, Boolean, Boolean*))
0000008EF25F9B90 00007ff950830184 (MethodDesc 00007ff94f683ce8 + 0xcc System.Runtime.ExceptionServices.InternalCalls.RhpSfiInit(System.Runtime.StackFrameIterator ByRef, Void*, Boolean, Boolean*))
0000008EF25F9BA0 00007ffa95f6a34c ntdll!RtlFreeHeap + 0x4c, calling ntdll!RtlpHpFreeWithExceptionProtection
0000008EF25F9BC0 00007ff9af03a81c coreclr!CrstBase::Leave + 0x1c [C:\source\runtime\src\coreclr\vm\crst.cpp:356]
0000008EF25F9C60 00007ff95082f9c4 (MethodDesc 00007ff94f683bc0 + 0x104 System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef))
0000008EF25F9C90 00007ff95082f810 (MethodDesc 00007ff94f683b78 + 0x90 System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef))
0000008EF25F9E20 00007ff9aefe1e04 coreclr!CallDescrWorkerInternal + 0x84 [C:\source\runtime\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\CallDescrWorkerARM64.asm:4989]
0000008EF25F9E50 00007ff9af091e8c coreclr!DispatchCallSimple + 0x8c [C:\source\runtime\src\coreclr\vm\callhelpers.cpp:221], calling coreclr!CallDescrWorkerInternal [C:\source\runtime\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\CallDescrWorkerARM64.asm:4929]
0000008EF25F9E70 00007ff9af151430 coreclr!operator new[] + 0x60 [C:\source\runtime\src\coreclr\utilcode\clrhost_nodependencies.cpp:348]
0000008EF25F9EB0 00007ff9af04cbb8 coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary + 0x238 [C:\source\runtime\src\coreclr\vm\codeversion.cpp:1734], calling coreclr!MethodDesc::JitCompileCode [C:\source\runtime\src\coreclr\vm\prestub.cpp:593]
0000008EF25F9ED0 00007ff9af0fd58c coreclr!ExInfo::ExInfo + 0xbc [C:\source\runtime\src\coreclr\vm\exinfo.cpp:334], calling coreclr!StackTraceInfo::AllocateStackTrace [C:\source\runtime\src\coreclr\vm\excep.cpp:3289]
0000008EF25F9EF0 00007ff9af0fd40c coreclr!DispatchManagedException + 0xcc [C:\source\runtime\src\coreclr\vm\exceptionhandling.cpp:5673], calling coreclr!ExInfo::ExInfo [C:\source\runtime\src\coreclr\vm\exinfo.cpp:331]
0000008EF25F9F10 00007ff9af0fd480 coreclr!DispatchManagedException + 0x140 [C:\source\runtime\src\coreclr\vm\exceptionhandling.cpp:5695], calling coreclr!DispatchCallSimple [C:\source\runtime\src\coreclr\vm\callhelpers.cpp:171]
0000008EF25F9F40 00007ff9af0fd3bc coreclr!DispatchManagedException + 0x7c [C:\source\runtime\src\coreclr\vm\exceptionhandling.cpp:5665], calling coreclr!GetHRFromThrowable [C:\source\runtime\src\coreclr\vm\excep.cpp:2633]
0000008EF25FA180 00007ff9af04e644 coreclr!MethodDesc::JitCompileCodeLocked + 0xac [C:\source\runtime\src\coreclr\vm\prestub.cpp:939], calling coreclr!UnsafeJitFunction [C:\source\runtime\src\coreclr\vm\jitinterface.cpp:12877]
0000008EF25FA1D0 00007ff9af04e6d8 coreclr!MethodDesc::JitCompileCodeLocked + 0x140 [C:\source\runtime\src\coreclr\vm\prestub.cpp:1006], calling coreclr!PrepareCodeConfig::SetNativeCode [C:\source\runtime\src\coreclr\vm\prestub.cpp:1862]
0000008EF25FA1E0 00007ff9af04e504 coreclr!MethodDesc::JitCompileCodeLockedEventWrapper + 0x174 [C:\source\runtime\src\coreclr\vm\prestub.cpp:820], calling coreclr!MethodDesc::JitCompileCodeLocked [C:\source\runtime\src\coreclr\vm\prestub.cpp:928]
0000008EF25FA200 00007ff94feac0fc (MethodDesc 00007ff94f758bb0 + 0x3c System.Runtime.Intrinsics.Vector128.Widen(System.Runtime.Intrinsics.Vector128`1<Byte>))
0000008EF25FA240 00007ff94feac088 (MethodDesc 00007ff94f90ea80 + 0x478 System.Text.Ascii.Widen[[System.Runtime.Intrinsics.Vector128`1[[System.Byte, System.Private.CoreLib]], System.Private.CoreLib],[System.Runtime.Intrinsics.Vector128`1[[System.UInt16, System.Private.CoreLib]], System.Private.CoreLib]](System.Runtime.Intrinsics.Vector128`1<Byte>))
0000008EF25FA280 00007ff94f6df1e8 (MethodDesc 00007ff94f903918 + 0x270 System.Text.Ascii.WidenAsciiToUtf1_Vector[[System.Runtime.Intrinsics.Vector128`1[[System.Byte, System.Private.CoreLib]], System.Private.CoreLib],[System.Runtime.Intrinsics.Vector128`1[[System.UInt16, System.Private.CoreLib]], System.Private.CoreLib]](Byte*, Char*, UIntPtr ByRef, UIntPtr))
0000008EF25FA290 00007ffa95f6a804 ntdll!RtlpFreeHeapInternal + 0x36c, calling ntdll!RtlpHpLfhSubsegmentFreeBlock
0000008EF25FA340 00007ff95085b904 (MethodDesc 00007ff95091de10 + 0x64 System.Net.HttpResponseHeaderExtensions.GetName(System.Net.HttpResponseHeader)), calling 00007ff94f6a0270
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 6, 2024
@rzikm rzikm added arch-arm64 os-windows and removed untriaged New issue has not been triaged by the area owner labels May 6, 2024
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@mangod9
Member

mangod9 commented May 6, 2024

Hello @rzikm, is there a consistent repro for this? It might be worth trying with DOTNET_LegacyExceptionHandling=1. FYI @janvorli

@rzikm
Member Author

rzikm commented May 6, 2024

Seems consistent enough to me: I see a failure in the same tests every 12 hours in the Kusto database (which I assume is the frequency of the outerloop runtime-extra-platforms pipeline on main).

DOTNET_LegacyExceptionHandling=1 gets rid of the issue.

@mangod9
Member

mangod9 commented May 6, 2024

OK, that's good to know, so it looks related to the new exception handling. Assigning to Jan. Should this be marked as affecting outerloop?

@janvorli
Member

janvorli commented May 6, 2024

This sounds like an unwind problem. This method is only called with the new EH, so the fact that it doesn't repro with DOTNET_LegacyExceptionHandling=1 is not surprising.
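
To illustrate why an unwind problem turns the loop in Frame::UpdateFloatingPointRegisters into a hang, here is a minimal sketch. It is not runtime code: SimContext, UnwindToManagedCode, the callbacks, and the iteration cap are all hypothetical and exist only so the demonstration terminates. The loop has the same shape as the one quoted in the issue, so if a single unwind step leaves the instruction pointer unchanged, the exit condition can never become true.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, heavily simplified register state: only the instruction pointer.
struct SimContext {
    uint64_t ip;
};

// Same shape as the loop in Frame::UpdateFloatingPointRegisters, plus an
// iteration cap (an assumption of this sketch, not present in the runtime).
// Returns true if managed code was reached, false if the cap was hit first.
bool UnwindToManagedCode(SimContext& ctx,
                         const std::function<bool(uint64_t)>& isManagedCode,
                         const std::function<void(SimContext&)>& unwindOneFrame,
                         int maxSteps)
{
    for (int step = 0; step < maxSteps; ++step) {
        if (isManagedCode(ctx.ip)) {
            return true;
        }
        // If this call does not change ctx.ip, every later iteration sees
        // exactly the same state and the real loop would spin forever.
        unwindOneFrame(ctx);
    }
    return isManagedCode(ctx.ip);
}
```

With an unwind callback that advances ctx.ip, the function returns true after a few steps. With a callback that leaves ctx.ip untouched, it exhausts maxSteps and returns false, which corresponds to the hang seen in the debugger.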

@rzikm
Member Author

rzikm commented May 7, 2024

It's actually affecting the runtime-extra-platforms pipeline. I amended the previous comment.

@rzikm rzikm added this to the 9.0.0 milestone May 9, 2024
janvorli added a commit to janvorli/runtime that referenced this issue May 15, 2024
There was an issue with unwinding native code functions in case of calls
to no-return function placed at an end of a function code block. The
return address was not in range of the function code, so
RtlLookupFunctionEntry was not finding anything, we were thinking that
it was a leaf function due to that and tried to unwind using LR only,
which was wrong and resulted in staying on the same instruction. Thus
the unwinding ended up in an infinite loop for those cases.
The fix, that matches what RtlUnwind does, is to adjust the instruction
pointer at call sites back. This is arm64 specific.

Close dotnet#101921
janvorli added a commit that referenced this issue May 16, 2024
* Fix Windows Arm64 unwinding

There was an issue with unwinding native code functions in case of calls
to no-return function placed at an end of a function code block. The
return address was not in range of the function code, so
RtlLookupFunctionEntry was not finding anything, we were thinking that
it was a leaf function due to that and tried to unwind using LR only,
which was wrong and resulted in staying on the same instruction. Thus
the unwinding ended up in an infinite loop for those cases.
The fix, that matches what RtlUnwind does, is to adjust the instruction
pointer at call sites back. This is arm64 specific.

Close #101921

* Modify the ifdef from CONTEXT_UNWOUND_TO_CALL to TARGET_ARM64
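
The lookup failure described in the commit message can be sketched as follows. This is a simplified model, not the actual change: FunctionEntry, LookupFunctionEntry (a stand-in for RtlLookupFunctionEntry), LookupForReturnAddress, and the addresses are hypothetical, and the sketch assumes the adjustment is one 4-byte ARM64 instruction. When a call to a no-return function is the last instruction of a function, the return address is the first address past the function's range, so a lookup by return address finds nothing, while a lookup by the address of the call itself succeeds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one function table entry: code range [begin, end).
struct FunctionEntry {
    uint64_t begin;  // address of the first instruction
    uint64_t end;    // address one past the last instruction
};

// Hypothetical stand-in for RtlLookupFunctionEntry: returns the entry whose
// range contains pc, or nullptr if no function covers that address.
const FunctionEntry* LookupFunctionEntry(const std::vector<FunctionEntry>& table,
                                         uint64_t pc)
{
    for (const FunctionEntry& entry : table) {
        if (pc >= entry.begin && pc < entry.end) {
            return &entry;
        }
    }
    return nullptr;
}

// ARM64 instructions have a fixed size of 4 bytes.
constexpr uint64_t kArm64InstructionSize = 4;

// Looks up the function that owns a return address. With adjustForCallSite
// set, the lookup uses the address of the call instruction (return address
// minus one instruction) instead of the return address itself.
const FunctionEntry* LookupForReturnAddress(const std::vector<FunctionEntry>& table,
                                            uint64_t returnAddress,
                                            bool adjustForCallSite)
{
    uint64_t controlPc = adjustForCallSite
                             ? returnAddress - kArm64InstructionSize
                             : returnAddress;
    return LookupFunctionEntry(table, controlPc);
}
```

For a function occupying 0x1000 to 0x1040 whose last instruction at 0x103C calls a no-return function, the return address is 0x1040. The unadjusted lookup returns nullptr, which is the case the commit message says was mistaken for a leaf function, while the adjusted lookup returns the entry starting at 0x1000.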