
Speed regression with -fa and -ctk #14881

@easyfab

Description


Name and Version

llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
version: 1941 (ce111d3)
built with Ubuntu clang version 18.1.3 (1ubuntu1) for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 5070 Ti

Models

No response

Problem description & steps to reproduce

With some models, I see a large speed regression when flash attention (-fa 1) is combined with a q8_0 K cache (-ctk q8_0).

Examples:

llama-bench -m ../models/LGAI-EXAONE_EXAONE-4.0-1.2B-Q8_0.gguf -fa 0,1 -ctk q8_0

| model | size | params | backend | ngl | type_k | fa | test | t/s |
| --- | ---: | ---: | --- | --: | --- | -: | --: | ---: |
| exaone4 1.2B Q8_0 | 1.27 GiB | 1.28 B | CUDA | 99 | q8_0 | 0 | pp512 | 18650.87 ± 98.91 |
| exaone4 1.2B Q8_0 | 1.27 GiB | 1.28 B | CUDA | 99 | q8_0 | 0 | tg128 | 302.96 ± 0.43 |
| exaone4 1.2B Q8_0 | 1.27 GiB | 1.28 B | CUDA | 99 | q8_0 | 1 | pp512 | 1039.32 ± 78.99 |
| exaone4 1.2B Q8_0 | 1.27 GiB | 1.28 B | CUDA | 99 | q8_0 | 1 | tg128 | 111.49 ± 11.46 |

llama-bench -m ../models/gemma-3n-E4B-it-UD-Q4_K_XL.gguf -fa 0,1 -ctk q8_0

| model | size | params | backend | ngl | type_k | fa | test | t/s |
| --- | ---: | ---: | --- | --: | --- | -: | --: | ---: |
| gemma3n E4B Q4_K - Medium | 5.01 GiB | 6.87 B | CUDA | 99 | q8_0 | 0 | pp512 | 4810.79 ± 23.46 |
| gemma3n E4B Q4_K - Medium | 5.01 GiB | 6.87 B | CUDA | 99 | q8_0 | 0 | tg128 | 99.83 ± 0.77 |
| gemma3n E4B Q4_K - Medium | 5.01 GiB | 6.87 B | CUDA | 99 | q8_0 | 1 | pp512 | 1235.63 ± 7.37 |
| gemma3n E4B Q4_K - Medium | 5.01 GiB | 6.87 B | CUDA | 99 | q8_0 | 1 | tg128 | 56.76 ± 0.24 |

Other models are not affected:

llama-bench -m ../models/SmolLM3-Q4_K_M.gguf -fa 0,1 -ctk q8_0

| model | size | params | backend | ngl | type_k | fa | test | t/s |
| --- | ---: | ---: | --- | --: | --- | -: | --: | ---: |
| smollm3 3B Q4_K - Medium | 1.78 GiB | 3.08 B | CUDA | 99 | q8_0 | 0 | pp512 | 11547.97 ± 35.34 |
| smollm3 3B Q4_K - Medium | 1.78 GiB | 3.08 B | CUDA | 99 | q8_0 | 0 | tg128 | 234.68 ± 0.22 |
| smollm3 3B Q4_K - Medium | 1.78 GiB | 3.08 B | CUDA | 99 | q8_0 | 1 | pp512 | 12815.32 ± 8.10 |
| smollm3 3B Q4_K - Medium | 1.78 GiB | 3.08 B | CUDA | 99 | q8_0 | 1 | tg128 | 243.48 ± 0.27 |
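To put the regression in relative terms, the small sketch below divides each fa=1 throughput by the matching fa=0 throughput. The helper name fa_ratio is hypothetical, and the input rows are the mean t/s values copied by hand from the tables above (the ± deviations are dropped).

```shell
#!/bin/sh
# Hypothetical helper: reads "model test tps_fa0 tps_fa1" rows on stdin
# and prints the fa=1 / fa=0 throughput ratio for each row.
fa_ratio() {
  awk '{ printf "%s %s %.2f\n", $1, $2, $4 / $3 }'
}

# Mean t/s values taken from the llama-bench tables above.
fa_ratio <<'EOF'
exaone4 pp512 18650.87 1039.32
exaone4 tg128 302.96 111.49
gemma3n pp512 4810.79 1235.63
gemma3n tg128 99.83 56.76
smollm3 pp512 11547.97 12815.32
smollm3 tg128 234.68 243.48
EOF
```

With these numbers, EXAONE 4.0 pp512 runs at about 0.06x of its fa=0 speed and Gemma 3n pp512 at about 0.26x, while SmolLM3 gets slightly faster with flash attention (about 1.11x for pp512 and 1.04x for tg128).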

First Bad Commit

No response

Relevant log output

n/a
