Open
Description
The GEMM example program runs successfully with its default parameters but fails with the shape --m 256 --n 256 --k 131072. The runtime reports error code 507015 (abnormal AICore execution), and the device log attributes the fault to a "CCU instruction address check error" in kernel MatMulV3_ND_ND_ND_ND_FP16_FP16_FP16_FP16_all_10000000000000000030.
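For context on how lopsided the failing shape is, the operand sizes can be worked out directly. This is a rough sketch, not taken from the example script: it assumes the usual layout C[m, n] = A[m, k] @ B[k, n] and 2 bytes per element (FP16, inferred from the FP16 tags in the kernel name in the log). The helper name gemm_footprint is made up for illustration.

```python
# Operand sizes for C[m, n] = A[m, k] @ B[k, n].
# Assumption: 2 bytes per element (FP16), inferred from the kernel name in the log.
BYTES_PER_ELEMENT = 2


def gemm_footprint(m, n, k, elem_bytes=BYTES_PER_ELEMENT):
    """Return the byte size of A, B, and C for an (m, n, k) GEMM."""
    return {
        "A": m * k * elem_bytes,
        "B": k * n * elem_bytes,
        "C": m * n * elem_bytes,
    }


if __name__ == "__main__":
    # Shape from the failing command line in this report.
    for name, size in gemm_footprint(256, 256, 131072).items():
        print(f"{name}: {size} bytes ({size / 2**20:.3f} MiB)")
```

Under these assumptions A and B are 64 MiB each while C is only 128 KiB, so the failing case is a very long reduction over k with a small output tile.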
Steps to Reproduce
- Run the default configuration (works fine):
python examples/gemm/example_gemm.py
- Run with problematic shape parameters (fails):
python examples/gemm/example_gemm.py --m 256 --n 256 --k 131072
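The two runs above can be scripted so that both configurations are exercised in one go. The following is a minimal sketch: the script path and the --m/--n/--k flags come from this report, while the helper names (build_command, run_case) and the failure check are assumptions. Because the log reports the fault as warnings, the sketch also looks for the error code 507015 in the captured output instead of relying on the exit code alone.

```python
import subprocess
import sys

SCRIPT = "examples/gemm/example_gemm.py"  # path taken from the report
ERROR_CODE = "507015"  # runtime error code seen in the log


def build_command(m=None, n=None, k=None, python=sys.executable):
    """Build the argv list; omitted dimensions fall back to the script defaults."""
    cmd = [python, SCRIPT]
    for flag, value in (("--m", m), ("--n", n), ("--k", k)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_case(label, **shape):
    """Run one configuration and report whether it looks like a failure."""
    result = subprocess.run(build_command(**shape), capture_output=True, text=True)
    # Assumption: a nonzero exit code or the error code in the output means failure.
    failed = result.returncode != 0 or ERROR_CODE in (result.stdout + result.stderr)
    print(f"{label}: exit code {result.returncode}, failed={failed}")
    return failed


if __name__ == "__main__":
    run_case("default")
    run_case("long-k", m=256, n=256, k=131072)
```

Run it from the repository root so that the relative script path resolves.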
Behavior
[W1216 15:09:53.857638449 compiler_depend.ts:526] Warning: NPU warning, error code is 507015[Error]:
[Error]: The aicore execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999[PID: 21603] 2025-12-16-15:09:53.480.002 (EZ9999): The error from device(chipId:7, dieId:0), serial number is 12, there is an exception of fftsplus aivector error, core id is 0, error code = 0, dump info: pc start: 0x12c0501af6d0, current: 0x12c0501b0988, vec error info: 0x681ef37be8, mte error info: 0xfdff3b6860, ifu error info: 0x69ffe6c0c3200, ccu error info: 0x80307b5a61000066, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100040080.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:333]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf3b6860, fixp_error1 info: 0xfd, fsmId:0, tslot:7, thread:0, ctxid:0, blk:10, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:353]
Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1555]
AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1191]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1191]
[AIC_INFO] after execute:mixCtx print end[FUNC:GetError][FILE:stream.cc][LINE:1191]
Aicore kernel execute failed, device_id=0, stream_id=2, report_stream_id=2, task_id=1, flip_num=0, fault kernel_name=MatMulV3_ND_ND_ND_ND_FP16_FP16_FP16_FP16_all_10000000000000000030, fault kernel info ext=none, program id=1, hash=15924468081494971399.[FUNC:GetError][FILE:stream.cc][LINE:1191]
rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
(function npuSynchronizeUsedDevices)
[W1216 15:09:53.860529030 compiler_depend.ts:508] Warning: NPU warning, error code is 507015[Error]:
[Error]: The aicore execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999[PID: 21603] 2025-12-16-15:09:53.525.754 (EH9999): wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W1216 15:09:53.863051112 compiler_depend.ts:227] Warning: NPU warning, error code is 507015[Error]:
[Error]: The aicore execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronizeWithTimeout execute failed, reason=[aicore exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999[PID: 21603] 2025-12-16-15:09:53.528.408 (EH9999): wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:162]
TraceBack (most recent call last):
(function empty_cache)
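Most of the log above is repeated synchronization failures; the fields that identify the fault are the kernel name, the error string, the runtime result, and the per-unit error registers. The sketch below pulls those out with regular expressions. The patterns are written against the log text in this report only and are not an official log format; the function name summarize is hypothetical, and the register values are returned as raw integers without any attempt to decode their bit fields.

```python
import re

# Excerpt copied from the log in this report (shortened to the relevant fields).
SAMPLE = (
    "vec error info: 0x681ef37be8, mte error info: 0xfdff3b6860, "
    "ifu error info: 0x69ffe6c0c3200, ccu error info: 0x80307b5a61000066, "
    "cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c\n"
    "The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address "
    "check error. fixp_error0 info: 0xf3b6860\n"
    "fault kernel_name=MatMulV3_ND_ND_ND_ND_FP16_FP16_FP16_FP16_all_"
    "10000000000000000030, fault kernel info ext=none\n"
    "wait for compute device to finish failed, runtime result = 507015.\n"
)

UNIT_RE = re.compile(r"(\w+) error info: (0x[0-9a-fA-F]+|\d+)")
KERNEL_RE = re.compile(r"fault kernel_name=([^,]+)")
ERROR_RE = re.compile(r"errorStr: ([^.]+)\.")
RESULT_RE = re.compile(r"runtime result = (\d+)")


def _first(pattern, text):
    """Return the first captured group of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def summarize(log_text):
    """Extract the fault-identifying fields from an error log like the one above."""
    return {
        "kernel": _first(KERNEL_RE, log_text),
        "error": _first(ERROR_RE, log_text),
        "runtime_result": _first(RESULT_RE, log_text),
        # Raw register values per unit; base 0 accepts both "0x..." and "0".
        "units": {m.group(1): int(m.group(2), 0) for m in UNIT_RE.finditer(log_text)},
    }


if __name__ == "__main__":
    info = summarize(SAMPLE)
    print("kernel:", info["kernel"])
    print("error:", info["error"])
    print("runtime result:", info["runtime_result"])
    for unit, value in info["units"].items():
        if value:
            print(f"nonzero {unit} error info: {value:#x}")
```

On this log the vec, mte, ifu, and ccu registers are nonzero while cube and biu are zero.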