All WGPU demos randomly crash on Apple Silicon #411

Open
Umenokin opened this issue Sep 24, 2023 · 11 comments
Labels: bug (Something isn't working)

Comments

@Umenokin

Hi

Incredible repository which I found just yesterday.

Issue: The app freezes the Mac for a while, completely blocking all I/O, until the system forces itself to reboot.

zig build bullet_physics_test_wgpu-run causes this behavior 100% of the time.
zig build triangle_wgpu-run causes it less predictably, but still 70-80% of the time.

I haven't worked with low-level computer graphics for years, and this reminds me of how accessing the GPU with a wrong address could crash the graphics driver.

@kamidev
Contributor

kamidev commented Sep 25, 2023

I am unable to reproduce this on my machine. What are you running?

➜  zig-gamedev git:(main) sw_vers; clang --version; zig version
ProductName:		macOS
ProductVersion:		13.6
BuildVersion:		22G120
Homebrew clang version 17.0.1
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin
0.12.0-dev.594+8fab4f98c

@hazeycode
Member

Sounds like it could be out-of-memory. Might be worth monitoring memory usage to see. Can you post your system specs also?

@Umenokin
Author

clang --version; zig version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.5.0, M2 MacBook Pro 64gigs
Thread model: posix
zig version: 0.12.0-dev.467+0345d7866

Not sure how to provide more details since it completely freezes my system until it decides to reboot itself.

@michal-z
Collaborator

@Umenokin Try release config: zig build -Doptimize=ReleaseFast

@foxnne
Contributor

foxnne commented Sep 30, 2023

I'm not fully convinced that this isn't related to that old Dawn bug on macOS. I thought it was fixed, but I've since had very strange crashes similar to this, only on macOS. The general purpose allocator reports no memory leaks. It's really hard for me to reproduce predictably.

Also, after several crashes, I checked storage and had no free space left, which is interesting. Does macOS use the SSD when the system runs out of RAM? It took a while for "System Data" to drop back down. I'm not sure if this is what causes the system to freeze/reboot.

[Image: Screenshot 2023-09-29 at 10 32 28 PM]

@hazeycode
Member

IIRC macOS does use a swap file on disk when memory is exhausted.

The GPA will only pick up memory leaks from allocations that went through it. Dawn may well be allocating memory for CPU or GPU that wouldn't be tracked by the GPA.

@kamidev
Contributor

kamidev commented Sep 30, 2023

> IIRC macOS does use a swap file on disk when memory is exhausted.
>
> The GPA will only pick up memory leaks from allocations that went through it. Dawn may well be allocating memory for CPU or GPU that wouldn't be tracked by the GPA.

Yes, macOS has a swap file. And it may be used even when memory is NOT exhausted. Here is an article that talks a bit about that: https://www.digitaltrends.com/computing/what-is-swap-used-in-mac-activity-monitor/. You can see the current size of the swap file with: "sysctl vm.swapusage". Even on a Mac with ridiculous amounts of RAM, it will probably be there.
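
Since the machine freezes before anything can be inspected, one option is to log swap usage to disk while a demo runs. Below is a minimal Python sketch, not something from this repository: it assumes the `sysctl vm.swapusage` output has the form `total = 3072.00M  used = 1801.25M  free = 1270.75M`, and the log file name `swap.log` and the one-second interval are arbitrary choices.

```python
import os
import re
import subprocess
import time

# Assumed field format in `sysctl vm.swapusage` output, e.g. "used = 1801.25M".
_FIELD = re.compile(r"(total|used|free)\s*=\s*([0-9.]+)([KMG])")
_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_swapusage(text):
    """Return {'total': MB, 'used': MB, 'free': MB} parsed from sysctl output."""
    return {name: float(value) * _TO_MB[unit]
            for name, value, unit in _FIELD.findall(text)}


def read_swapusage():
    """Run `sysctl vm.swapusage` (macOS only) and parse its output."""
    out = subprocess.run(["sysctl", "vm.swapusage"],
                         capture_output=True, text=True, check=True).stdout
    return parse_swapusage(out)


if __name__ == "__main__":
    # Append one sample per second and force it to disk, so the last
    # readings survive a hard freeze. "swap.log" is a hypothetical name.
    with open("swap.log", "a") as log:
        while True:
            log.write(f"{time.time():.0f} {read_swapusage()}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(1.0)
```

Starting this in one terminal before launching a demo in another should leave a record of how quickly swap grows in the seconds before the freeze.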

@kamidev
Contributor

kamidev commented Oct 4, 2023

bullet_physics_test_wgpu-run now seems to freeze completely for me, too. I just started the app and checked memory with Activity Monitor. I noticed a small but noticeable memory leak, but it also reported 361 GB of private memory in use. I tried to take a screenshot, but my Mac froze and had to be rebooted before I could.

@foxnne
Contributor

foxnne commented Oct 4, 2023

I'm fairly certain this has to do with Dawn, as it affects both my mach-core and zig-gamedev projects. Similar to what you said, I can sometimes observe a small memory leak, but the swap file seems to eat all available storage.

@hazeycode
Member

hazeycode commented Jan 28, 2024

The newer version of Dawn (#463) gives more validation error reporting. We see important validation errors in our samples. I'm optimistic that resolving these validation errors will fix the ill effects we see on Apple Silicon.

@hazeycode hazeycode self-assigned this Jan 28, 2024
@hazeycode hazeycode added this to the 0.6.0 milestone Jan 29, 2024
@hazeycode hazeycode changed the title from "All WGPU demos randomly crash mac systems" to "All WGPU demos randomly crash on Apple Silicon" Feb 10, 2024
@hazeycode
Member

Crash report for reference

{"bug_type":"284","timestamp":"2024-03-01 23:41:49.00 +0000","os_version":"macOS 14.3.1 (23D60)","roots_installed":0,"incident_id":"BAB940BE-9852-438D-9DE4-F13EA2B4164C"}
{
  "roots_installed" : 0,
  "bug_type" : "284",
  "process_name" : "physically_based",
  "registers" : {},
  "timestamp" : 1709336509,
  "analysis" : {"iofence_list":{"iofence_num_iosurfaces":1,"iofence_iosurfaces":[{"iofence_current_queue":[{"iofence_acceleratorid":0,"iofence_backtrace":[-2198506784372,-2198506782880,-2198509187096,-2198509177616,-2198509183812,-2198526384164,-2198508960252,-2198508948292],"iofence_direction":1}],"iosurface_id":663,"iofence_waiting_queue":[{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":2},{"iofence_acceleratorid":2,"iofence_backtrace":[-2198841624180,-2198841622688,-2198874093560,-2198874542164,-2198845326180,-2198874398656,-2198874421544,-2198874423052],"iofence_direction":1}]}]},"fw_ta_substate":{"slot0":0,"slot1":0},"fw_power_state":0,"fw_power_boost_controller":9,"guilty_dm":1,"fw_power_controller_in_charge":9,"fw_cl_state":{"slot0":0,"slot1":0,"slot2":0},"fw_perf_state_lo":8,"fw_ta_state":{"slot0":0,"slot1":0},"signature":625,"fw_power_substate":4,"command_buffer_trace_id":470954171,"fw_perf_state_select":0,"restart_reason":7,"fw_3d_state":{"slot0":0,"slot1":0,"slot2":0},"fw_gpc_perf_state":0,"fw_perf_state_hi":8,"fw_power_limit_controller":12,"restart_reason_desc":"blocked by IOFence"}
}
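
For comparing several reports like the one above, the key fields can be pulled out with a short script. This is a rough Python sketch under stated assumptions: that the report is one JSON header line followed by a JSON body, as it appears here, and that only the field names visible in this report are used. The function name `summarize_gpu_report` is hypothetical, and the script only counts queue entries without interpreting what they mean.

```python
import json


def summarize_gpu_report(text):
    """Summarize a report made of one JSON header line plus a JSON body."""
    header_line, _, body = text.partition("\n")
    header = json.loads(header_line)
    report = json.loads(body)
    analysis = report.get("analysis", {})
    surfaces = analysis.get("iofence_list", {}).get("iofence_iosurfaces", [])
    return {
        "os_version": header.get("os_version"),
        "process_name": report.get("process_name"),
        "restart_reason_desc": analysis.get("restart_reason_desc"),
        # Number of waiting-queue entries per IOSurface id.
        "waiting_per_surface": {
            s.get("iosurface_id"): len(s.get("iofence_waiting_queue", []))
            for s in surfaces
        },
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <path-to-saved-report>
    with open(sys.argv[1]) as f:
        print(json.dumps(summarize_gpu_report(f.read()), indent=2))
```

Run against a saved copy of a report, this prints the OS version, process name, restart reason, and the waiting-queue length for each surface, which makes it easier to see whether different demos fail in the same way.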

5 participants