
metal : pad n_ctx by 32 #6177

Merged
merged 2 commits into from Mar 22, 2024

Conversation

@ggerganov (Owner) commented Mar 20, 2024

Fixes #6173

We were padding kv_self.n but not n_ctx, leading to unaligned memory access with Metal.

@ggerganov ggerganov changed the title metal : require ne00 >= 128 for mat-mat kernels metal : pad n_ctx by 32 Mar 21, 2024
@ggerganov ggerganov merged commit 95d576b into master Mar 22, 2024
62 of 64 checks passed
@ggerganov ggerganov deleted the gg/metal-fix-mm branch March 22, 2024 07:36
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* metal : require ne00 >= 128 for mat-mat kernels

ggml-ci

* llama : pad n_ctx by 32

ggml-ci
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to tybalex/function.cpp that referenced this pull request Apr 17, 2024
Successfully merging this pull request may close these issues.

Regression: llama.cpp produces nonsensical outputs when using batched decoding on Metal