macmon causes MLX Metal GPU crash during inference — recommend mactop instead #2088

@PixelMyoos

Bug Report: macmon GPU monitoring triggers SIGABRT in libmlx.dylib during active inference

Environment

  • Hardware: Mac17,6 (MacBook Pro M5, 128GB)
  • OS: macOS 26.4.1 (25E253)
  • Python: 3.12.13 (Homebrew)
  • Stack: Exo (ai.hermes.gateway-ss coalition), oMLX paged inference, libmlx.dylib
  • Triggered by: Running macmon (the system monitor recommended in Exo's README) while MLX inference is active

What Happened

Launching macmon while Exo/oMLX is serving inference causes a hard crash in Python (PID 39036) with SIGABRT / Abort trap: 6.

Crash thread (Thread 52 — com.Metal.CompletionQueueDispatch):

mlx::core::gpu::check_error(MTLCommandBuffer)
  → __cxa_throw  (C++ exception thrown inside Metal async dispatch boundary)
  → std::terminate()  (exception cannot propagate across dispatch queue)
  → abort()
  → SIGABRT

Full stack:

0  libsystem_kernel.dylib   __pthread_kill
1  libsystem_pthread.dylib  pthread_kill
2  libsystem_c.dylib        abort
3  libc++abi.dylib          abort_message
4  libc++abi.dylib          demangling_terminate_handler
5  libobjc.A.dylib          objc_terminate
6  libc++abi.dylib          std::terminate()
7  libc++abi.dylib          __cxa_throw
8  libmlx.dylib             mlx::core::gpu::check_error(MTLCommandBuffer*)
9  libmlx.dylib             (Metal CompletionHandler block)
10 Metal                    -[MTLCommandBuffer didCompleteWith...]
11 IOGPU                    IOGPUNotificationQueueDispatchAvailableCompletionNotifications

Root Cause

macmon reads Apple Silicon performance counters and GPU utilization metrics via IOKit/IOGPUFamily, the same interfaces Metal uses internally. When macmon samples the GPU concurrently with an in-flight Metal command buffer (during MLX inference), the command buffer completes with an error status, which mlx::core::gpu::check_error detects and attempts to throw as a C++ exception. Because the throw happens inside the completion handler running on a GCD dispatch queue (com.Metal.CompletionQueueDispatch), no caller can catch the exception, so the runtime calls std::terminate() → abort().

This is not a user error — Exo's own documentation and README point users toward macmon as the recommended Apple Silicon monitoring tool. Users following those instructions will hit this crash reliably on M3/M4/M5 systems running active inference.
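
The failure mode in the root-cause analysis above (an error raised inside a completion callback that no caller can catch) can be illustrated outside of Metal. The following is a minimal Python sketch, not MLX's code: `fake_gpu_submit` and `CommandBufferError` are hypothetical stand-ins, and a plain thread plays the role of the completion queue. Instead of raising inside the callback, the handler records the error on a `Future`, and the thread waiting on the result re-raises it where it can be handled.

```python
import threading
from concurrent.futures import Future


class CommandBufferError(RuntimeError):
    """Hypothetical stand-in for a failed GPU command buffer."""


def fake_gpu_submit(fail: bool) -> Future:
    """Simulate an async submission whose completion runs on another thread."""
    result: Future = Future()

    def completion_handler() -> None:
        # Raising here would escape on the callback thread, where no caller
        # can catch it. Record the outcome on the future instead.
        if fail:
            result.set_exception(CommandBufferError("command buffer failed"))
        else:
            result.set_result("ok")

    threading.Thread(target=completion_handler, daemon=True).start()
    return result


def run_step(fail: bool) -> str:
    """Wait on the calling thread and handle any recorded error there."""
    try:
        return fake_gpu_submit(fail).result(timeout=5)
    except CommandBufferError as exc:
        return f"recovered: {exc}"
```

A C++ analogue would capture the error in the handler (for example as a `std::exception_ptr`) and rethrow it on the thread that synchronizes with the GPU. Whether that fits MLX's design is a question for its maintainers.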


Reproduction Steps

  1. Start Exo with MLX inference engine on Apple Silicon (M3/M4/M5)
  2. Load a model and begin inference (active Metal command buffers in flight)
  3. Launch macmon in a separate terminal
  4. Observe Python crash with SIGABRT / Abort trap: 6
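
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: `run_with_monitor` and `died_with_sigabrt` are hypothetical helper names, and the two commands are supplied by the caller (whatever starts the inference workload and whatever starts the monitor). It relies only on the POSIX convention that `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import time


def run_with_monitor(workload_cmd, monitor_cmd, warmup_s=1.0, timeout_s=60.0):
    """Start the workload, then the monitor, and report how the workload exited.

    Returns the workload's return code (negative -N means killed by signal N
    on POSIX), or None if it was still running when the window elapsed.
    """
    workload = subprocess.Popen(workload_cmd)
    monitor = None
    try:
        time.sleep(warmup_s)  # give the workload time to get busy
        # The monitor inherits the terminal so interactive tools can draw.
        monitor = subprocess.Popen(monitor_cmd)
        return workload.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    finally:
        # Always clean up both child processes.
        if monitor is not None:
            monitor.terminate()
            monitor.wait()
        if workload.poll() is None:
            workload.kill()
            workload.wait()


def died_with_sigabrt(returncode):
    """True if the return code means the process was killed by SIGABRT."""
    return returncode == -signal.SIGABRT
```

Running the same workload once with macmon and once with mactop as `monitor_cmd`, and comparing `died_with_sigabrt` on the results, would give a side-by-side check of the claim in this report.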

Recommendation: Replace macmon with mactop

mactop is a drop-in alternative that:

  • Uses the same sysinfo / powermetrics data sources as Activity Monitor
  • Does not directly query IOGPUFamily in a way that interferes with active Metal sessions
  • Has been used extensively on the same M5 hardware alongside active MLX inference with zero crashes

Suggested README change:

macmon → mactop for real-time Apple Silicon GPU/ANE/CPU monitoring during Exo inference

Install:

brew install mactop
sudo mactop   # requires sudo for power metrics

Impact

  • Any user following Exo's documented toolchain on Apple Silicon who uses macmon for monitoring while running inference is at risk of hard Python crashes mid-session
  • Crash is non-recoverable (process terminates, active inference context lost)
  • No warning or graceful error — just Abort trap: 6

Crash Report Excerpt (Thread 52 — crashing thread)

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6
Triggered by Thread: 52, Dispatch Queue: com.Metal.CompletionQueueDispatch

Thread 52 Crashed:
0   libsystem_kernel.dylib   __pthread_kill + 8
1   libsystem_pthread.dylib  pthread_kill + 296
2   libsystem_c.dylib        abort + 148
3   libc++abi.dylib          abort_message + 132
4   libc++abi.dylib          demangling_terminate_handler + 272
5   libobjc.A.dylib          objc_terminate + 172
6   libc++abi.dylib          std::terminate() + 16
7   libc++abi.dylib          __cxa_throw + 92
8   libmlx.dylib             mlx::core::gpu::check_error(MTLCommandBuffer*) + 244
9   libmlx.dylib             (Metal completion handler block)
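
For anyone checking whether their own crash matches this one, the signature in the excerpt above can be tested mechanically. The sketch below assumes the plain-text report layout shown here (a `Triggered by Thread:` line followed by a `Thread N Crashed` block); `crash_signature` and `matches_this_issue` are hypothetical helper names, and other report formats would need a different parser.

```python
import re

TRIGGER_RE = re.compile(
    r"Triggered by Thread:\s*(\d+),\s*Dispatch Queue:\s*(\S+)"
)


def crash_signature(report: str):
    """Return (thread, queue, frames) for the crashing thread, or None."""
    m = TRIGGER_RE.search(report)
    if m is None:
        return None
    thread, queue = int(m.group(1)), m.group(2)
    frames = []
    in_thread = False
    for line in report.splitlines():
        if line.startswith(f"Thread {thread} Crashed"):
            in_thread = True
            continue
        if in_thread:
            # Frame lines look like: "<index> <image> <symbol and offset>"
            parts = line.split(None, 2)
            if len(parts) < 3 or not parts[0].isdigit():
                break  # end of the crashing thread's frame list
            frames.append((parts[1], parts[2]))
    return thread, queue, frames


def matches_this_issue(report: str) -> bool:
    """True if the crash is on Metal's completion queue via MLX check_error."""
    sig = crash_signature(report)
    if sig is None:
        return False
    _, queue, frames = sig
    return queue == "com.Metal.CompletionQueueDispatch" and any(
        image == "libmlx.dylib" and "check_error" in symbol
        for image, symbol in frames
    )
```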

Coalition: ai.hermes.gateway-ss | Hardware: Mac17,6 | macOS 26.4.1
