Releases: ggerganov/llama.cpp

b1521

17 Nov 08:34
4f447a4
llama : fix data units (#4101)

* llama : fix data units

ggml-ci

* Revert "llama : fix data units"

This reverts commit f5feac831fe225ed7f3db938d115732a49dccfc4.

* llama : disambiguate data units

ggml-ci
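
The ambiguity being fixed here is SI versus binary prefixes: a GB is 10^9 bytes while a GiB is 2^30 bytes, so mixing the two misreports memory sizes by roughly 7%. A minimal C++ sketch of the distinction (the helper names are made up, not llama.cpp's):

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helpers illustrating the unit distinction this change
// disambiguates in llama.cpp's log output.
static double to_gb (uint64_t bytes) { return bytes / 1e9; }                    // SI gigabytes
static double to_gib(uint64_t bytes) { return bytes / (1024.0*1024.0*1024.0); } // binary gibibytes

int main() {
    const uint64_t model_size = 7365960704ULL; // example byte count, not a real model
    printf("%.2f GB (SI) vs %.2f GiB (binary)\n", to_gb(model_size), to_gib(model_size));
    return 0;
}
```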

b1520

17 Nov 02:39
91f6499
Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)

* gguf-py: gguf-dump: Respect --no-tensor flag in JSON mode.

* Respect add_bos_token GGUF metadata value

* gguf-py: Try to fix SpecialVocab giving up too easily for the Nth time
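
Before this change the tokenizer prepended BOS unconditionally; now the tokenizer.ggml.add_bos_token metadata value read from the GGUF file decides. A minimal sketch of the resulting logic (the types and function here are assumptions for illustration, not llama.cpp's API):

```cpp
#include <cstdint>
#include <string>
#include <vector>

using llama_token = int32_t;

// Only prepend BOS when the model's tokenizer.ggml.add_bos_token metadata
// says to; previously it was added unconditionally.
std::vector<llama_token> tokenize_with_metadata(
        const std::string & text,
        bool add_bos_token,    // value read from tokenizer.ggml.add_bos_token
        llama_token bos_id) {
    std::vector<llama_token> out;
    if (add_bos_token) {
        out.push_back(bos_id);
    }
    // ... append the actual tokenizer output for `text` here ...
    (void) text;
    return out;
}
```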

b1519

16 Nov 16:04
8da4627
gguf : fix potential infinite loops while parsing (#4100)

Co-authored-by: Bernhard Gstrein <[email protected]>
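
The usual root cause of such loops is trusting a length field from the file without checking it against the bytes that remain, so a malformed value can keep the read position from ever advancing. An illustrative guard, not the actual gguf parser:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Each iteration validates the declared length against the bytes remaining
// and always advances `pos`, so a malformed file can fail the parse but
// never hang it.
bool parse_blobs(const uint8_t * data, size_t size) {
    size_t pos = 0;
    while (pos < size) {
        uint64_t len;
        if (size - pos < sizeof(len)) return false; // truncated length field
        memcpy(&len, data + pos, sizeof(len));
        pos += sizeof(len);
        if (len > size - pos) return false;         // declared size exceeds remaining bytes
        pos += len;                                 // guaranteed forward progress
    }
    return true;
}
```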

b1518

15 Nov 16:56
a6fc554
llama : restore prefix space in llama tokenizer (#4081)
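
Context for the fix: the SentencePiece-style llama tokenizer marks word boundaries with a leading "▁", so raw input is expected to get a space prepended before encoding, and a regression had dropped it. A one-line sketch of the restored behavior (the wrapper name is made up):

```cpp
#include <string>

// "Hello" -> " Hello", which the tokenizer then encodes as "▁Hello".
std::string with_prefix_space(const std::string & raw) {
    return " " + raw;
}
```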

b1517

15 Nov 13:22
1cf2850
ggml-cuda : increase max graph size (#4084)

b1516

14 Nov 18:16
6bb4908
Fix macOS Sonoma model quantization (#4052)

Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b1515

14 Nov 10:43
36eed0c
stablelm : StableLM support (#3586)

* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
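
A hedged sketch of loading a StableLM GGUF with partial GPU offload through the llama.h API of this era; the file name is a placeholder, and 31 stands in for the "(n-1)" layers of the 32-layer stablelm-3b-4e1t:

```cpp
#include "llama.h"

int main() {
    llama_backend_init(false); // false: no NUMA

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 31; // offload n-1 of the model's 32 layers

    llama_model * model = llama_load_model_from_file("stablelm-3b-4e1t.Q8_0.gguf", mparams);
    if (model == NULL) {
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```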

b1513

13 Nov 15:56
bd90eca
llava : fix regression for square images in #3613 (#4056)

b1512

13 Nov 15:41
3d68f36
ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060)

ggml-ci
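
For context on the im2col part of this sync: im2col unrolls every convolution input patch into one row of a matrix, so the convolution reduces to a single matrix multiply, which is what makes GPU convolution efficient. A minimal single-channel sketch (not ggml's implementation):

```cpp
#include <cstddef>
#include <vector>

// Unroll every k x k patch of an H x W image into one row of `cols`;
// convolving the image then becomes multiplying `cols` by the flattened kernel.
std::vector<float> im2col(const std::vector<float> & img, int H, int W, int k) {
    const int oh = H - k + 1; // output height (no padding, stride 1)
    const int ow = W - k + 1; // output width
    std::vector<float> cols((size_t) oh * ow * k * k);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            for (int dy = 0; dy < k; ++dy)
                for (int dx = 0; dx < k; ++dx)
                    cols[(((size_t)(y*ow + x))*k + dy)*k + dx] = img[(size_t)(y + dy)*W + (x + dx)];
    return cols;
}
```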

b1510

13 Nov 12:38
4760e7c
sync : ggml (backend v2) (#3912)

* sync : ggml (backend v2) (wip)

* sync : migrate examples and llama.cpp to dynamic graphs (wip)

* sync : update tests + fix max op params to 64

ggml-ci

* sync : ggml-cuda

ggml-ci

* llama : fix save/load state context size

ggml-ci

* sync : try to fix build on tvOS

* sync : pass custom graph sizes in training examples (see the sketch after this list)

* sync : update graph copies to new ggml API

* sync : update sync-ggml.sh with new files

* scripts : fix header in sync script

* train : fix context size calculations

* llama : increase inference graph size up to 4096 nodes

* train : allocate grads for backward graphs

* train : allocate grads for gb_tmp
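
A sketch of the dynamic-graph allocation this sync migrates llama.cpp to (see the custom-graph-size items above): graphs are created at a caller-chosen node capacity instead of a fixed-size struct. The buffer size and the 4096-node capacity here are illustrative, the latter echoing the inference limit noted above:

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024, // scratch for graph metadata only
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,         // no tensor data, just bookkeeping
    };
    struct ggml_context * ctx = ggml_init(params);

    // size = max node count, grads = allocate gradient slots (for training)
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, 4096, /*grads =*/ false);
    (void) gf;

    ggml_free(ctx);
    return 0;
}
```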