Releases · ggerganov/llama.cpp
b1521
llama : fix data units (#4101)
* llama : fix data units ggml-ci
* Revert "llama : fix data units" (this reverts commit f5feac831fe225ed7f3db938d115732a49dccfc4)
* llama : disambiguate data units ggml-ci
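The ambiguity addressed here is decimal versus binary size units when reporting memory and model sizes: GB (10^9 bytes) versus GiB (2^30 bytes). A minimal illustration of the arithmetic being disambiguated (the byte count below is a made-up example, not a value from the patch):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t bytes = 7'365'835'776; // hypothetical model file size in bytes

    const double gb  = bytes / 1e9;                        // decimal gigabytes
    const double gib = bytes / (1024.0 * 1024.0 * 1024.0); // binary gibibytes

    printf("%.2f GB vs %.2f GiB\n", gb, gib); // 7.37 GB vs 6.86 GiB
    return 0;
}
```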
b1520
Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)
* gguf-py: gguf-dump: respect the --no-tensor flag in JSON mode
* Respect the add_bos_token GGUF metadata value
* gguf-py: try to fix SpecialVocab giving up too easily for the Nth time
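A rough sketch of the behaviour this entry describes: if the GGUF metadata carries a `tokenizer.ggml.add_bos_token` key, that value decides whether a BOS token is prepended, and the old model-family default applies only when the key is absent. The metadata map, token id, and helper below are hypothetical, not the llama.cpp or gguf-py API:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical key/value map read from a GGUF file's metadata section.
using gguf_kv = std::map<std::string, std::string>;

// Decide whether to prepend BOS, honoring the GGUF override when present.
// `default_add_bos` stands in for the model-family default (true for LLaMA-style vocabularies).
bool should_add_bos(const gguf_kv & kv, bool default_add_bos) {
    auto it = kv.find("tokenizer.ggml.add_bos_token");
    if (it == kv.end()) {
        return default_add_bos;  // key absent: keep the previous behaviour
    }
    return it->second == "true"; // key present: it wins over the default
}

int main() {
    gguf_kv kv = { {"tokenizer.ggml.add_bos_token", "false"} };

    std::vector<int> tokens;
    if (should_add_bos(kv, /*default_add_bos=*/true)) {
        tokens.push_back(1);     // placeholder BOS id
    }
    // ... tokenize the prompt and append the resulting ids ...
    return 0;
}
```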
b1519
gguf : fix potential infinite loops while parsing (#4100)
Co-authored-by: Bernhard Gstrein <[email protected]>
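The class of bug being closed here is a malformed or truncated GGUF file whose declared counts keep the parser looping past the end of the data. A minimal sketch of that kind of guard (illustrative only, not the actual patch): validate declared counts against the file size and treat short reads as fatal rather than retrying.

```cpp
#include <cstdint>
#include <cstdio>

// Validate a declared element count against the file size before looping, so a
// corrupted header cannot drive an unbounded or wildly over-long read loop.
bool read_u32_array(FILE * f, uint64_t n_declared, long file_size) {
    // each element occupies at least sizeof(uint32_t) bytes on disk, so a
    // well-formed file can never declare more elements than this
    if (n_declared > (uint64_t) file_size / sizeof(uint32_t)) {
        fprintf(stderr, "invalid element count: %llu\n", (unsigned long long) n_declared);
        return false;
    }
    for (uint64_t i = 0; i < n_declared; ++i) {
        uint32_t v;
        if (fread(&v, sizeof(v), 1, f) != 1) {
            return false; // short read: bail out instead of spinning on EOF
        }
        // ... use v ...
    }
    return true;
}
```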
b1518
llama : restore prefix space in llama tokenizer (#4081)
b1517
ggml-cuda : increase max graph size (#4084)
b1516
Fix macOS Sonoma model quantization (#4052)
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
b1515
stablelm : StableLM support (#3586)
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
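As a usage sketch for the offloading noted above: with the llama.h C API of this period, the number of layers placed on the GPU is set through `llama_model_params.n_gpu_layers` (the same knob the CLI exposes as `--n-gpu-layers`/`-ngl`). The model filename and the 32-layer assumption below are illustrative:

```cpp
// Minimal sketch, assuming the llama.h C API around this release.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init(false); // no NUMA

    llama_model_params mparams = llama_model_default_params();
    // offload all but one layer, as described in the release notes;
    // assuming a 32-layer model here, n-1 = 31
    mparams.n_gpu_layers = 31;

    llama_model * model = llama_load_model_from_file("stablelm-3b-4e1t.Q8_0.gguf", mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```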
b1513
llava : fix regression for square images in #3613 (#4056)
b1512
ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) ggml-ci
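im2col is the standard lowering behind the synced GPU convolution path: every receptive-field patch of the input is copied into a column of a matrix, so the convolution becomes one matrix multiplication against the flattened kernel. A self-contained sketch of the idea for a single-channel image with stride 1 and no padding (illustrative, not the ggml kernel):

```cpp
#include <cstdio>
#include <vector>

// Unroll a single-channel H x W image into a (kh*kw) x (oh*ow) matrix, where
// each column holds one kh x kw patch (stride 1, no padding). Convolution then
// reduces to: output = flattened_kernel_row * im2col_matrix.
std::vector<float> im2col(const std::vector<float> & img, int H, int W, int kh, int kw) {
    const int oh = H - kh + 1;
    const int ow = W - kw + 1;
    std::vector<float> cols((size_t)(kh * kw) * oh * ow);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            const int col = y * ow + x;               // which output position
            for (int ky = 0; ky < kh; ++ky) {
                for (int kx = 0; kx < kw; ++kx) {
                    const int row = ky * kw + kx;     // which kernel element
                    cols[(size_t)row * oh * ow + col] = img[(size_t)(y + ky) * W + (x + kx)];
                }
            }
        }
    }
    return cols;
}

int main() {
    // 3x3 image, 2x2 kernel -> 2x2 output, so im2col yields a 4 x 4 matrix
    std::vector<float> img = {1, 2, 3,
                              4, 5, 6,
                              7, 8, 9};
    auto cols = im2col(img, 3, 3, 2, 2);
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            printf("%4.0f", cols[(size_t)r * 4 + c]);
        }
        printf("\n");
    }
    return 0;
}
```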
b1510
sync : ggml (backend v2) (#3912)
* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64 ggml-ci
* sync : ggml-cuda ggml-ci
* llama : fix save/load state context size ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
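The practical effect of the dynamic-graph change is that a ggml compute graph's node capacity is chosen when the graph is created rather than being fixed at compile time, which is how llama.cpp can raise its inference graph to 4096 nodes. A minimal sketch, assuming the ggml API after this sync; the context size, tensor shapes, and node budget are illustrative:

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // with dynamic graphs, the node capacity is a runtime argument instead of
    // a fixed-size array baked into ggml_cgraph
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, 4096, /*grads=*/false);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * c = ggml_mul(ctx, a, b);

    ggml_build_forward_expand(gf, c);
    // ... fill a and b, then run ggml_graph_compute_with_ctx(ctx, gf, n_threads) ...

    ggml_free(ctx);
    return 0;
}
```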