### Name and Version

version: 5713 (4c9fdfbe)
built with clang version 18.1.8 for x86_64-pc-windows-msvc

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
GPU: NVIDIA Quadro RTX 5000 with Max-Q Design

### Models

bge-m3

### Problem description & steps to reproduce

The embedding results are very different between builds b4712 and b4713.

Server command used:

```powershell
.\llama-server.exe --hf-repo gpustack/bge-m3-GGUF --hf-file bge-m3-Q4_K_M.gguf --embedding -ngl 99
```

POST request:

```powershell
curl.exe -d "{\"input\": \"Hello\"}" http://127.0.0.1:8080/v1/embeddings
```

Please let me know if this behavior is expected or if there was a change in the embedding logic between these versions.

### First Bad Commit

https://github.com/ggml-org/llama.cpp/pull/14217

### Relevant log output

```shell
no relevant log
```
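For anyone wanting to quantify the difference rather than eyeball the raw vectors, here is a minimal sketch that fetches an embedding from the server started as above and compares two result vectors by cosine similarity. The endpoint URL and JSON response shape (`data[0].embedding`, per the OpenAI-compatible `/v1/embeddings` API) match the repro commands; the helper names are my own.

```python
import json
import math
import urllib.request


def get_embedding(text, url="http://127.0.0.1:8080/v1/embeddings"):
    """POST `text` to a running llama-server and return the embedding vector."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Usage: run the server built at b4712, save the vector, rebuild at b4713,
# fetch again, then compare (values near 1.0 mean the embeddings agree):
# v_old = get_embedding("Hello")  # server built at b4712
# v_new = get_embedding("Hello")  # server built at b4713
# print(cosine_similarity(v_old, v_new))
```

A similarity close to 1.0 between the two builds would suggest only minor numerical drift; a much lower value would confirm a real change in the embedding output.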