Releases · ggerganov/llama.cpp
b1521
llama : fix data units (#4101)
* llama : fix data units ggml-ci
* Revert "llama : fix data units" (this reverts commit f5feac831fe225ed7f3db938d115732a49dccfc4)
* llama : disambiguate data units ggml-ci
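The ambiguity addressed here is decimal versus binary size units when reporting memory and model sizes: GB (10^9 bytes) versus GiB (2^30 bytes). A minimal illustration of the arithmetic being disambiguated (the byte count below is a made-up example, not a value from the patch):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t bytes = 7'365'835'776; // hypothetical model file size in bytes

    const double gb  = bytes / 1e9;                        // decimal gigabytes
    const double gib = bytes / (1024.0 * 1024.0 * 1024.0); // binary gibibytes

    printf("%.2f GB vs %.2f GiB\n", gb, gib); // 7.37 GB vs 6.86 GiB
    return 0;
}
```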
b1520
Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)
* gguf-py: gguf-dump: respect the --no-tensor flag in JSON mode
* Respect the add_bos_token GGUF metadata value
* gguf-py: try to fix SpecialVocab giving up too easily for the Nth time
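A rough sketch of the behaviour this entry describes: if the GGUF metadata carries a `tokenizer.ggml.add_bos_token` key, that value decides whether a BOS token is prepended, and the old model-family default applies only when the key is absent. The metadata map, token id, and helper below are hypothetical, not the llama.cpp or gguf-py API:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical key/value map read from a GGUF file's metadata section.
using gguf_kv = std::map<std::string, std::string>;

// Decide whether to prepend BOS, honoring the GGUF override when present.
// `default_add_bos` stands in for the model-family default (true for LLaMA-style vocabularies).
bool should_add_bos(const gguf_kv & kv, bool default_add_bos) {
    auto it = kv.find("tokenizer.ggml.add_bos_token");
    if (it == kv.end()) {
        return default_add_bos;  // key absent: keep the previous behaviour
    }
    return it->second == "true"; // key present: it wins over the default
}

int main() {
    gguf_kv kv = { {"tokenizer.ggml.add_bos_token", "false"} };

    std::vector<int> tokens;
    if (should_add_bos(kv, /*default_add_bos=*/true)) {
        tokens.push_back(1);     // placeholder BOS id
    }
    // ... tokenize the prompt and append the resulting ids ...
    return 0;
}
```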
b1519
gguf : fix potential infinite loops while parsing (#4100)
Co-authored-by: Bernhard Gstrein <[email protected]>
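The class of bug being closed here is a malformed or truncated GGUF file whose declared counts keep the parser looping past the end of the data. A minimal sketch of that kind of guard (illustrative only, not the actual patch): validate declared counts against the file size and treat short reads as fatal rather than retrying.

```cpp
#include <cstdint>
#include <cstdio>

// Validate a declared element count against the file size before looping, so a
// corrupted header cannot drive an unbounded or wildly over-long read loop.
bool read_u32_array(FILE * f, uint64_t n_declared, long file_size) {
    // each element occupies at least sizeof(uint32_t) bytes on disk, so a
    // well-formed file can never declare more elements than this
    if (n_declared > (uint64_t) file_size / sizeof(uint32_t)) {
        fprintf(stderr, "invalid element count: %llu\n", (unsigned long long) n_declared);
        return false;
    }
    for (uint64_t i = 0; i < n_declared; ++i) {
        uint32_t v;
        if (fread(&v, sizeof(v), 1, f) != 1) {
            return false; // short read: bail out instead of spinning on EOF
        }
        // ... use v ...
    }
    return true;
}
```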
b1518
llama : restore prefix space in llama tokenizer (#4081)
b1517
ggml-cuda : increase max graph size (#4084)
b1516
Fix macOS Sonoma model quantization (#4052)
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
b1515
stablelm : StableLM support (#3586)
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
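As a usage sketch for the offloading noted above: with the llama.h C API of this period, the number of layers placed on the GPU is set through `llama_model_params.n_gpu_layers` (the same knob the CLI exposes as `--n-gpu-layers`/`-ngl`). The model filename and the 32-layer assumption below are illustrative:

```cpp
// Minimal sketch, assuming the llama.h C API around this release.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init(false); // no NUMA

    llama_model_params mparams = llama_model_default_params();
    // offload all but one layer, as described in the release notes;
    // assuming a 32-layer model here, n-1 = 31
    mparams.n_gpu_layers = 31;

    llama_model * model = llama_load_model_from_file("stablelm-3b-4e1t.Q8_0.gguf", mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```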
b1513
llava : fix regression for square images in #3613 (#4056)
b1512
ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) ggml-ci
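im2col is the standard lowering behind the synced GPU convolution path: every receptive-field patch of the input is copied into a column of a matrix, so the convolution becomes one matrix multiplication against the flattened kernel. A self-contained sketch of the idea for a single-channel image with stride 1 and no padding (illustrative, not the ggml kernel):

```cpp
#include <cstdio>
#include <vector>

// Unroll a single-channel H x W image into a (kh*kw) x (oh*ow) matrix, where
// each column holds one kh x kw patch (stride 1, no padding). Convolution then
// reduces to: output = flattened_kernel_row * im2col_matrix.
std::vector<float> im2col(const std::vector<float> & img, int H, int W, int kh, int kw) {
    const int oh = H - kh + 1;
    const int ow = W - kw + 1;
    std::vector<float> cols((size_t)(kh * kw) * oh * ow);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            const int col = y * ow + x;               // which output position
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    const int row = ky * kw + kx;     // which kernel element
                    cols[(size_t)row * oh * ow + col] = img[(size_t)(y + ky) * W + (x + kx)];
                }
            }
        }
    }
    return cols;
}

int main() {
    // 3x3 image, 2x2 kernel -> 2x2 output, so im2col yields a 4 x 4 matrix
    std::vector<float> img = {1, 2, 3,
                              4, 5, 6,
                              7, 8, 9};
    auto cols = im2col(img, 3, 3, 2, 2);
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            printf("%4.0f", cols[(size_t)r * 4 + c]);
        }
        printf("\n");
    }
    return 0;
}
```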
b1510
sync : ggml (backend v2) (#3912)
* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64 ggml-ci
* sync : ggml-cuda ggml-ci
* llama : fix save/load state context size ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
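The practical effect of the dynamic-graph change is that a ggml compute graph's node capacity is chosen when the graph is created rather than being fixed at compile time, which is how llama.cpp can raise its inference graph to 4096 nodes. A minimal sketch, assuming the ggml API after this sync; the context size, tensor shapes, and node budget are illustrative:

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // with dynamic graphs, the node capacity is a runtime argument instead of
    // a fixed-size array baked into ggml_cgraph
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, 4096, /*grads=*/false);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * c = ggml_mul(ctx, a, b);

    ggml_build_forward_expand(gf, c);
    // ... fill a and b, then run ggml_graph_compute_with_ctx(ctx, gf, n_threads) ...

    ggml_free(ctx);
    return 0;
}
```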