
[Vulkan] Crash when connected to Vulkan 1.3 program #68

Open
JuanDiegoMontoya opened this issue Oct 17, 2024 · 10 comments
Labels
bug Something isn't working


@JuanDiegoMontoya

Issue

I built commit 849c8c5 (I appreciate how easy it is to build) and ran it with the Visual Studio debugger attached. Inside GPU Reshape, I started process discovery so that it would automatically connect to my app.

When I launch my Vulkan 1.3 app, I get the following message from the debug message callback:

Layer VK_LAYER_GPUOPEN_GRS uses API version 1.2 which is older than the application specified API version of 1.3. May cause issues.
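For context, this warning compares two packed version numbers: the `api_version` declared in the layer's JSON manifest and the `apiVersion` the application passes in `VkApplicationInfo`. The sketch below re-implements the documented `VK_MAKE_API_VERSION` bit layout locally (so it builds without the Vulkan headers) to show why a layer declaring 1.2 loaded into a 1.3 application triggers the message. The helper names are hypothetical, and comparing only major and minor is an assumption about how the loader decides to warn.

```cpp
#include <cstdint>

// Local re-implementation of the VK_MAKE_API_VERSION packing:
// variant in bits 29-31, major in 22-28, minor in 12-21, patch in 0-11.
constexpr std::uint32_t MakeApiVersion(std::uint32_t variant, std::uint32_t major,
                                       std::uint32_t minor, std::uint32_t patch) {
    return (variant << 29U) | (major << 22U) | (minor << 12U) | patch;
}

constexpr std::uint32_t ApiVersionMajor(std::uint32_t v) { return (v >> 22U) & 0x7FU; }
constexpr std::uint32_t ApiVersionMinor(std::uint32_t v) { return (v >> 12U) & 0x3FFU; }

// Hypothetical check (assumption): warn when the layer targets an older
// major.minor than the application requested.
constexpr bool LayerOlderThanApp(std::uint32_t layerVersion, std::uint32_t appVersion) {
    if (ApiVersionMajor(layerVersion) != ApiVersionMajor(appVersion)) {
        return ApiVersionMajor(layerVersion) < ApiVersionMajor(appVersion);
    }
    return ApiVersionMinor(layerVersion) < ApiVersionMinor(appVersion);
}

// The layer manifest declares 1.2; the application requests 1.3.
constexpr std::uint32_t kLayerApi = MakeApiVersion(0, 1, 2, 0);
constexpr std::uint32_t kAppApi   = MakeApiVersion(0, 1, 3, 0);

static_assert(LayerOlderThanApp(kLayerApi, kAppApi), "a 1.2 layer is older than a 1.3 app");
```

The message is a warning rather than an error: the layer is still loaded, and whether it actually mishandles 1.3 functionality depends on the layer itself.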

When the program continues, it crashes in vkCreateSwapchainKHR:

>	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!std::_Construct_in_place<ResourceState *,ResourceState * const &>(ResourceState * & _Obj, ResourceState * const & <_Args_0>) Line 388	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!std::vector<ResourceState *,ContainerAllocator<ResourceState *>>::_Emplace_back_with_unused_capacity<ResourceState * const &>(ResourceState * const & <_Val_0>) Line 791	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!std::vector<ResourceState *,ContainerAllocator<ResourceState *>>::_Emplace_one_at_back<ResourceState * const &>(ResourceState * const & <_Val_0>) Line 776	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!std::vector<ResourceState *,ContainerAllocator<ResourceState *>>::push_back(ResourceState * const & _Val) Line 867	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!TrackedObject<ResourceState>::AddNoLock(ResourceState * object) Line 100	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!TrackedObject<ResourceState>::Add(ResourceState * object) Line 86	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!CreateSwapchainBufferWrappers(SwapChainState * state, unsigned int count) Line 75	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!CreateSwapChainState<IDXGISwapChain1,DXGI_SWAP_CHAIN_DESC1 const>(const DXGIFactoryTable & table, IDXGIFactory * factory, ID3D12Device * device, IDXGISwapChain1 * swapChain, const DXGI_SWAP_CHAIN_DESC1 * desc) Line 96	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!HookIDXGIFactoryCreateSwapChainForHwnd(IDXGIFactory * factory, IUnknown * pDevice, HWND__ * hWnd, const DXGI_SWAP_CHAIN_DESC1 * pDesc, const DXGI_SWAP_CHAIN_FULLSCREEN_DESC * pFullscreenDesc, IDXGIOutput * pRestrictToOutput, IDXGISwapChain1 * * ppSwapChain) Line 190	C++
 	GRS.Backends.DX12.Layer {E19ED291-B87A-4C8F-9153-D32494C78D3D}.dll!IDXGIFactoryWrapper::CreateSwapChainForHwnd(IUnknown * pDevice, HWND__ * hWnd, const DXGI_SWAP_CHAIN_DESC1 * pDesc, const DXGI_SWAP_CHAIN_FULLSCREEN_DESC * pFullscreenDesc, IDXGIOutput * pRestrictToOutput, IDXGISwapChain1 * * ppSwapChain) Line 2953	C++
 	[External Code]	
 	GRS.Backends.Vulkan.Layer.dll!Hook_vkCreateSwapchainKHR(VkDevice_T * device, const VkSwapchainCreateInfoKHR * pCreateInfo, const VkAllocationCallbacks * pAllocator, VkSwapchainKHR_T * * pSwapchain) Line 50	C++

If needed, I can provide more information: I can reliably reproduce the crash and get a full stack trace with debug symbols for both applications, or I can email a link to a .dmp file I generated.

@miguel-petersen miguel-petersen self-assigned this Oct 17, 2024
@miguel-petersen miguel-petersen added the bug Something isn't working label Oct 17, 2024
@miguel-petersen
Collaborator

Hey Juan,

Happy to hear that you had an easy time building it. I'll try to reproduce your crash!

@miguel-petersen
Collaborator

Hmm, I am unable to reproduce this on an AMD GPU. What GPU are you using?

It seems to fail on the swap chain that the driver creates internally. I have some checks for that based on the return address on the stack; it might just need another exclusion.
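The trace in the report fits this: `Hook_vkCreateSwapchainKHR` in the Vulkan layer calls into `[External Code]`, which then reaches `IDXGIFactoryWrapper::CreateSwapChainForHwnd` in the DX12 layer, i.e. the driver creating a DXGI swap chain on its own. A return-address exclusion of this kind can be sketched as below. On Windows, a hook would typically obtain its caller with the MSVC intrinsic `_ReturnAddress()` and map it to a module with `GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, ...)`. To keep the sketch portable, the module list is a hypothetical table of address ranges and the driver module names are placeholders; it illustrates the idea, not GPU Reshape's actual exclusion logic.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a loaded module's address range.
struct ModuleRange {
    std::string name;
    std::uintptr_t base;
    std::uintptr_t size;
};

// Returns the module whose range contains `address`, or nullptr if none does.
const ModuleRange* FindModule(const std::vector<ModuleRange>& modules, std::uintptr_t address) {
    for (const ModuleRange& module : modules) {
        if (address >= module.base && address - module.base < module.size) {
            return &module;
        }
    }
    return nullptr;
}

// True if the call originated inside one of the listed driver modules,
// in which case a hook could pass the call through without wrapping it.
bool IsDriverInternalCall(const std::vector<ModuleRange>& modules,
                          const std::vector<std::string>& driverModuleNames,
                          std::uintptr_t returnAddress) {
    const ModuleRange* caller = FindModule(modules, returnAddress);
    if (caller == nullptr) {
        return false;
    }
    for (const std::string& driverName : driverModuleNames) {
        if (caller->name == driverName) {
            return true;
        }
    }
    return false;
}
```

The weak point of this approach is the name list: a driver update that moves swap chain creation into a different module would slip past it, which is consistent with the check "just needing another exclusion".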

I am curious, though: how is it crashing? A null access?

@JuanDiegoMontoya
Author

JuanDiegoMontoya commented Oct 18, 2024

Apologies for forgetting to include basic system info. I'm using a 7900 XTX on Windows 10 with the latest drivers (24.9.1).

The exception that's being generated is

0xC0000005: Access violation writing location 0x00000000FFFFFFFF.

At the time of the crash, TrackedObject::linear holds 536,870,911 objects, which seems implausibly large.
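The element count is itself a clue. 536,870,911 is 0x1FFFFFFF, which is exactly what a size computation yields for a vector of 8-byte pointers whose begin pointer is null and whose end pointer is 0xFFFFFFFF; `push_back` would then construct the new element at that end pointer, matching the faulting write address and the `_Emplace_back_with_unused_capacity` → `_Construct_in_place` frames in the trace. The sketch below does that arithmetic, assuming (as in MSVC's STL) that the vector stores first/last pointers and reports its size as their difference divided by the element size. It suggests, without proving, that the `TrackedObject` was read from memory that never held a valid vector.

```cpp
#include <cstdint>

// Size of a pointer-based vector as size() or a debugger would compute it:
// (last - first) / sizeof(element). Done on integers, because subtracting
// garbage pointers is undefined behaviour in C++.
constexpr std::uint64_t ApparentElementCount(std::uint64_t first, std::uint64_t last,
                                             std::uint64_t elementSize) {
    return (last - first) / elementSize;
}

// Assumed vector state at the crash: null "first" pointer and a "last"
// pointer equal to the faulting write address from the report.
constexpr std::uint64_t kFirst = 0x0;
constexpr std::uint64_t kLast = 0xFFFFFFFF;
constexpr std::uint64_t kElementSize = 8;  // sizeof(ResourceState*) on x64

constexpr std::uint64_t kApparentCount = ApparentElementCount(kFirst, kLast, kElementSize);

// 0xFFFFFFFF / 8 truncates to 0x1FFFFFFF, the count seen in the debugger.
static_assert(kApparentCount == 0x1FFFFFFF, "matches the observed element count");
static_assert(kApparentCount == 536870911, "536,870,911 in decimal");
```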

I'm not sure how to share it privately on GitHub, but I could send you the 1 GB dump file if you want to inspect it yourself.

Edit: I forgot to mention that I have similar problems with the latest release of GPU Reshape (v0.9.0): a debug message about the layer using API version 1.2, followed by a crash when I call vkQueueSubmit2 (access violation reading location 0xFFFFFFFFFFFFFFFF).

@miguel-petersen
Copy link
Collaborator

No worries! I tried reproducing the setup, but for some reason it's not triggering.

If you could send me a dump at [email protected] (I use it for some accounts), that'd be perfect!

@JuanDiegoMontoya
Author

I sent an email invite to view the dump on Google Drive. Let me know if you have trouble accessing it.

@miguel-petersen
Collaborator

Hey Juan, I got it! 🙂

To resolve the symbols in your dump, could you also send me your GRS.Backends.DX12.Layer.pdb (specifically, the one from the build you were crashing with)?

Alternatively, you can use the latest release (Beta2), which ships with its symbols. Happy with either!

@JuanDiegoMontoya
Author

JuanDiegoMontoya commented Oct 22, 2024

Sorry about that, it worked on my machine! 😅 I sent a new link to a folder that contains the PDB you mentioned; I can also add further attachments there in case you need anything else.

Thanks again for looking into this.

@miguel-petersen
Collaborator

miguel-petersen commented Oct 22, 2024

Hi Juan,

Everything resolves nicely now! I have a general idea of what is going wrong: it seems that I'm not wrapping the device that's passed into CreateSwapChainForHwnd. Could you describe what you're passing into pDevice, and how that particular object is created?

Edit: Apologies, I forgot this is a driver-created object, not something you create!
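The failure mode described here, a hook receiving a native device where it expects one of the layer's own wrappers, can be illustrated with a defensive unwrap. The types below are hypothetical stand-ins for the COM interfaces. A real D3D12 layer would more likely identify its wrappers through `QueryInterface` with a private IID than through `dynamic_cast`, so treat this only as an illustration of why blindly casting an unwrapped device yields garbage state.

```cpp
#include <vector>

struct ResourceState {};

// Hypothetical stand-in for ID3D12Device.
struct DeviceInterface {
    virtual ~DeviceInterface() = default;
};

// A device created by the driver itself, never seen by the layer.
struct NativeDevice : DeviceInterface {};

// The layer's wrapper, which carries the tracking state the hooks rely on.
struct DeviceWrapper : DeviceInterface {
    DeviceInterface* next = nullptr;               // the wrapped native device
    std::vector<ResourceState*> trackedResources;  // e.g. swap chain buffers
};

// Returns the wrapper if `device` is one, or nullptr for an unwrapped device.
// A blind static_cast here would reinterpret NativeDevice memory as a wrapper
// and read a "vector" out of unrelated bytes.
DeviceWrapper* TryGetWrapper(DeviceInterface* device) {
    return dynamic_cast<DeviceWrapper*>(device);
}

// Hook body: only track buffers for devices the layer actually owns.
bool TrackSwapChainBuffer(DeviceInterface* device, ResourceState* buffer) {
    DeviceWrapper* wrapper = TryGetWrapper(device);
    if (wrapper == nullptr) {
        return false;  // driver-internal device: pass through without tracking
    }
    wrapper->trackedResources.push_back(buffer);
    return true;
}
```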

@miguel-petersen
Collaborator

I added some extra driver return-address checks in this branch:
https://github.com/GPUOpen-Tools/GPU-Reshape/tree/issue/68-driver-swapchain

Would it be possible to sync and try it out?

(Small note: if you used discovery, it's a good idea to log out and back in, or restart, since the bootstrapper DLL has changed.)

@JuanDiegoMontoya
Author

JuanDiegoMontoya commented Oct 23, 2024

Hi, I have a couple of problems when I try that branch:

  1. I am unable to begin discovery.
  2. Clicking "Launch application" on the Welcome screen immediately triggers an exception in Formatting.cs.

Enabling the global hook with global discovery didn't seem to work either.

I also tried the v0.9.0-beta2 prebuilt binary, and it has none of these problems. I'm also able to connect to my app with it!

However, I have a few remaining issues:

  • When launched with discovery, none of the shaders or pipelines can be instrumented. The diagnostic given for every one of them is Shader ## - Parsing failed. Could this be because I compile my shaders without debug info? I currently cannot enable it due to a bug in glslang.
  • With "Launch application", GPU Reshape attempts to instrument the shaders automatically; this fails for all of them, and after a few moments my application closes. I ran GPU Reshape under a debugger that also attaches to child processes, but it did not break when my app closed, which suggests the issue is not caused by an exception in my app. I also tried catching exceptions in my app and writing them to a file, but that yielded nothing either.

P.S. I noticed in this talk you gave that a particular Windows 11 SDK version is required to build GPU Reshape, but I'm on Windows 10 and don't have that SDK. Could this be causing some of the issues with the version I built from source?

2 participants