GEMM Example Fails with Specific Shape Parameters #140

@wzzll123

Description

The GEMM example program runs successfully with default parameters but fails with the shape configuration --m 256 --n 256 --k 131072. The error reports an abnormal AICore execution caused by a CCU instruction address check error (runtime error code 507015).
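For a sense of scale, the sketch below computes the operand sizes for the failing shape. It assumes FP16 operands (2 bytes per element), which the fault kernel name MatMulV3_ND_ND_ND_ND_FP16_... in the log below suggests, and the usual layout of A as m x k, B as k x n and C as m x n. `gemm_footprint` is a hypothetical helper and is not part of the example script.

```python
def gemm_footprint(m: int, n: int, k: int, elem_bytes: int = 2) -> dict:
    """Return the size in bytes of A (m x k), B (k x n) and C (m x n).

    Assumption: elem_bytes=2 corresponds to FP16, as the fault kernel name suggests.
    """
    return {
        "A": m * k * elem_bytes,
        "B": k * n * elem_bytes,
        "C": m * n * elem_bytes,
    }


if __name__ == "__main__":
    # Shape from the failing command line: --m 256 --n 256 --k 131072
    for name, nbytes in gemm_footprint(256, 256, 131072).items():
        print(f"{name}: {nbytes} bytes ({nbytes / 2**20:.3f} MiB)")
```

For this shape, A and B are 64 MiB each and C is 128 KiB, with k = 131072 = 2^17. This report does not establish whether operand size alone is what triggers the CCU address check.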

Steps to Reproduce

  1. Run the default configuration (works fine):
    python examples/gemm/example_gemm.py
  2. Run with problematic shape parameters (fails):
    python examples/gemm/example_gemm.py --m 256 --n 256 --k 131072
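One way to narrow down where the failure starts is to bisect over k while keeping m and n at 256, reusing the command line from step 2. This is a hypothetical sketch: `run_example` and `first_failing_k` are illustrative names. It assumes a failing run either exits with a non-zero status or prints the 507015 error code, and the binary search assumes failures are monotonic in k (every k above the first failing one also fails). If that assumption does not hold, a plain linear sweep over the same list is safer.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


def run_example(m: int, n: int, k: int) -> bool:
    """Run the GEMM example once and report whether it passed.

    Assumption: a failure shows up as a non-zero exit status or as the
    507015 runtime error code somewhere in stdout/stderr.
    """
    cmd = [sys.executable, "examples/gemm/example_gemm.py",
           "--m", str(m), "--n", str(n), "--k", str(k)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and "507015" not in output


def first_failing_k(ks: Sequence[int], passes: Callable[[int], bool]) -> Optional[int]:
    """Binary search for the smallest k in sorted `ks` for which `passes` is False.

    Assumes failures are monotonic in k. Returns None if every k passes.
    """
    lo, hi = 0, len(ks)  # invariant: ks[:lo] pass, ks[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(ks[mid]):
            lo = mid + 1
        else:
            hi = mid
    return ks[lo] if lo < len(ks) else None


if __name__ == "__main__":
    candidate_ks = [2**p for p in range(10, 18)]  # 1024 ... 131072
    k = first_failing_k(candidate_ks, lambda k: run_example(256, 256, k))
    print("first failing k:", k)
```

Passing the runner in as a callable keeps the search logic independent of the NPU, so it can be checked with a stand-in predicate before spending device time on real runs.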

Behavior

[W1216 15:09:53.857638449 compiler_depend.ts:526] Warning: NPU warning, error code is 507015[Error]: 
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999[PID: 21603] 2025-12-16-15:09:53.480.002 (EZ9999):  The error from device(chipId:7, dieId:0), serial number is 12, there is an exception of fftsplus aivector error, core id is 0, error code = 0, dump info: pc start: 0x12c0501af6d0, current: 0x12c0501b0988, vec error info: 0x681ef37be8, mte error info: 0xfdff3b6860, ifu error info: 0x69ffe6c0c3200, ccu error info: 0x80307b5a61000066, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100040080.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:333]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf3b6860, fixp_error1 info: 0xfd, fsmId:0, tslot:7, thread:0, ctxid:0, blk:10, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:353]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1555]
       AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1191]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1191]
       [AIC_INFO] after execute:mixCtx print end[FUNC:GetError][FILE:stream.cc][LINE:1191]
       Aicore kernel execute failed, device_id=0, stream_id=2, report_stream_id=2, task_id=1, flip_num=0, fault kernel_name=MatMulV3_ND_ND_ND_ND_FP16_FP16_FP16_FP16_all_10000000000000000030, fault kernel info ext=none, program id=1, hash=15924468081494971399.[FUNC:GetError][FILE:stream.cc][LINE:1191]
       rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
       wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
 (function npuSynchronizeUsedDevices)
[W1216 15:09:53.860529030 compiler_depend.ts:508] Warning: NPU warning, error code is 507015[Error]: 
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999[PID: 21603] 2025-12-16-15:09:53.525.754 (EH9999):  wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W1216 15:09:53.863051112 compiler_depend.ts:227] Warning: NPU warning, error code is 507015[Error]: 
[Error]: The aicore execution is abnormal. 
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999[PID: 21603] 2025-12-16-15:09:53.528.408 (EH9999):  wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
        TraceBack (most recent call last):
 (function empty_cache)
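When comparing several runs, it can help to extract the key fields from a saved log like the one above. This sketch uses regular expressions written against the exact lines shown here (runtime result, retCode, fault kernel_name, errorStr). The patterns and the `summarize` helper are assumptions based on this single log, not a documented CANN log format.

```python
import re

# Patterns derived only from the log lines in this report.
PATTERNS = {
    "runtime_result": re.compile(r"runtime result = (\d+)"),
    "ret_code": re.compile(r"retCode=(0x[0-9a-fA-F]+)"),
    "kernel_name": re.compile(r"fault kernel_name=([^,\s]+)"),
    "error_str": re.compile(r"errorStr: ([^.]+)\."),
}


def summarize(log_text: str) -> dict:
    """Return the first match of each pattern, or None where a field is absent."""
    summary = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        summary[field] = match.group(1) if match else None
    return summary
```

Applied to the log above, this should yield runtime result 507015, retCode 0x26, the MatMulV3_ND_ND_ND_ND_FP16_FP16_FP16_FP16_all_10000000000000000030 kernel, and the "CCU instruction address check error" string.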

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests