vpareek2/cuLLM

LLM written from scratch in CUDA, for inference. It follows the architecture presented in the LLaMA 3.1 paper, "The Llama 3 Herd of Models". It is meant to be used with pretrained weights: navigate to main.cu and set your weights path on line 88. This repository also contains a tokenizer written from scratch in C++, kept as close to the Llama tokenizer as possible, using the GPT-4o (o200k_base) regex pattern with BPE.
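For a flavor of the hand-written kernels an inference stack like this needs, below is a minimal sketch of an RMSNorm kernel, the normalization the LLaMA 3 architecture applies before each attention and feed-forward block. The kernel name, signature, and launch configuration here are illustrative assumptions, not the actual code in main.cu.

```cuda
#include <cuda_runtime.h>

// Illustrative RMSNorm kernel (not the repository's actual code):
// one thread block per row of a (rows x dim) activation matrix.
// For each row: out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
// Assumes a power-of-two block size of 256 threads (see launch below).
__global__ void rmsnorm_kernel(float* out, const float* x,
                               const float* weight, int dim, float eps) {
    const float* row_x = x + blockIdx.x * dim;
    float* row_out = out + blockIdx.x * dim;

    // Each thread accumulates a partial sum of squares over a strided slice.
    float partial = 0.0f;
    for (int i = threadIdx.x; i < dim; i += blockDim.x)
        partial += row_x[i] * row_x[i];

    // Tree reduction in shared memory to get the row's total sum of squares.
    __shared__ float sums[256];
    sums[threadIdx.x] = partial;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) sums[threadIdx.x] += sums[threadIdx.x + stride];
        __syncthreads();
    }

    // Normalize and scale by the learned weight.
    float inv_rms = rsqrtf(sums[0] / dim + eps);
    for (int i = threadIdx.x; i < dim; i += blockDim.x)
        row_out[i] = row_x[i] * inv_rms * weight[i];
}

// Example launch: one block per row, 256 threads per block.
// rmsnorm_kernel<<<rows, 256>>>(d_out, d_x, d_weight, dim, 1e-5f);
```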
About
Large language model (LLaMA 3.1 architecture) and tokenizer (GPT-4o, o200k_base) written from scratch in CUDA and C++.
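As background on the tokenizer side: after the o200k_base regex splits text into chunks, each chunk's tokens are merged bottom-up by BPE. Below is a minimal, hedged sketch of that greedy merge loop as host-side C++ in the same CUDA codebase; the names and data structures (bpe_encode, merge_ranks, pair_to_id) are illustrative assumptions, not the repository's actual interfaces.

```cuda
// Host-side BPE sketch; nothing here runs on the GPU.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Pair = std::pair<uint32_t, uint32_t>;

// merge_ranks maps an adjacent token pair to its merge priority
// (lower rank = learned earlier in training, so merged first here);
// pair_to_id maps that pair to the ID of the merged token.
std::vector<uint32_t> bpe_encode(std::vector<uint32_t> tokens,
                                 const std::map<Pair, uint32_t>& merge_ranks,
                                 const std::map<Pair, uint32_t>& pair_to_id) {
    while (tokens.size() >= 2) {
        // Find the adjacent pair with the lowest merge rank.
        size_t best_i = 0;
        uint32_t best_rank = UINT32_MAX;
        for (size_t i = 0; i + 1 < tokens.size(); ++i) {
            auto it = merge_ranks.find({tokens[i], tokens[i + 1]});
            if (it != merge_ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i = i;
            }
        }
        if (best_rank == UINT32_MAX) break;  // no mergeable pair left

        // Replace the winning pair with its merged token ID.
        tokens[best_i] = pair_to_id.at({tokens[best_i], tokens[best_i + 1]});
        tokens.erase(tokens.begin() + best_i + 1);
    }
    return tokens;
}
```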