Convert `llama.cpp` to PyTorch

The llama.cpp library is a cornerstone of language-model inference and offers a variety of quantization techniques, but these are largely used within its own ecosystem. This repo aims to make them more accessible to the PyTorch community.

This repo provides an example of converting GGUF files into a PyTorch state dict, so that you can run inference purely in PyTorch. Currently supported models:

LLaMA / Mistral
Mixtral
Qwen / Qwen2
InternLM2
StarCoder2
Orion
MiniCPM
Xverse
Command-r-v01
StableLM
Gemma
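
To make the idea of GGUF-quantized weights more concrete, here is a minimal pure-Python sketch of dequantizing a single Q4_0 block, one of the simpler llama.cpp quantization formats (a block holds one fp16 scale followed by 32 packed 4-bit codes). It is an illustration only, not the code used by this repo; `dequantize_q4_0_block` and `pack_demo_block` are hypothetical names introduced for the example.

```python
import struct
from typing import List

QK4_0 = 32                     # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale (2 bytes) + 16 bytes of packed nibbles


def dequantize_q4_0_block(block: bytes) -> List[float]:
    """Expand one 18-byte Q4_0 block into 32 floats."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    (scale,) = struct.unpack_from("<e", block, 0)  # little-endian fp16 scale
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j, byte in enumerate(block[2:]):
        # Low nibble holds element j, high nibble holds element j + 16.
        # Codes are stored with an offset of 8, so subtract it before scaling.
        out[j] = ((byte & 0x0F) - 8) * scale
        out[j + half] = ((byte >> 4) - 8) * scale
    return out


def pack_demo_block(scale: float, codes: List[int]) -> bytes:
    """Hypothetical helper for the demo: pack 32 codes (0..15) into one block."""
    half = QK4_0 // 2
    packed = bytes(codes[j] | (codes[j + half] << 4) for j in range(half))
    return struct.pack("<e", scale) + packed
```

Formats such as Q4_K_M use larger super-blocks with per-sub-block scales, so they need more bookkeeping than this sketch shows.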

The code is largely inspired by the original llama.cpp and GPT-Fast.

Getting Started

Install the CUDA extension

python setup.py install

Convert GGUF file to torch state dict

python convert.py --input tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --output TinyLlama-Q4_K_M
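
Before converting, it can be useful to check that the input really is a GGUF file. The sketch below reads only the fixed-size header (magic bytes, version, tensor count, metadata key/value count). It assumes a little-endian GGUF file of version 2 or later and is independent of this repo's own reader; `parse_gguf_header` and `read_gguf_header` are hypothetical names used for illustration.

```python
import struct
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
HEADER_BYTES = 4 + 4 + 8 + 8   # magic, uint32 version, two uint64 counts


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the first 24 bytes of a little-endian GGUF (v2+) file."""
    if len(data) < HEADER_BYTES:
        raise ValueError("file too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    (version,) = struct.unpack_from("<I", data, 4)
    if version < 2:
        # Assumption of this sketch: v1 files (32-bit counts) are not handled.
        raise ValueError("GGUF v1 is not handled by this sketch")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return GGUFHeader(version, tensor_count, kv_count)


def read_gguf_header(path: str) -> GGUFHeader:
    """Read and parse the header of the GGUF file at `path`."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_BYTES))
```

For example, `read_gguf_header("tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")` returns the version and the number of tensors and metadata entries without loading any weights.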

Running inference

python generate.py --checkpoint_path TinyLlama-Q4_K_M --interactive --compile

torch.compile takes several minutes to finish. To skip it, run in eager mode by omitting the --compile flag.

Todo

Add support for more models
Support partitioned models
Support the new MoE breaking change



chu-tianxiang/llama-cpp-torch
