cuLLM

cuLLM is an LLM written from scratch in CUDA, for inference. It follows the architecture presented in the LLaMA 3.1 paper, "The Llama 3 Herd of Models". It is meant to be used with pretrained weights: open main.cu and set your weights path on line 88. The repository also contains a tokenizer written from scratch in C++, kept as close to the LLaMA tokenizer as possible, using the GPT-4o (o200k_base) regex pattern with BPE.
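For illustration, the weight-loading line might look something like the sketch below; the identifier `checkpoint_path` and the binary-checkpoint format are assumptions here, not necessarily what main.cu actually contains:

```cpp
// main.cu, around line 88 -- hypothetical sketch. The actual variable
// name and expected file format may differ; check the source.
const char *checkpoint_path = "/path/to/llama3.1_weights.bin";
```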
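A tiktoken-style tokenizer of this kind first splits text into chunks with the regex pattern, then runs BPE within each chunk by repeatedly merging the adjacent pair with the lowest merge rank. Below is a minimal C++ sketch of that merge loop, assuming a rank table keyed by token pairs; the identifiers (`MergeRanks`, `bpe_merge`) are illustrative, not the repository's actual names:

```cpp
#include <cstdio>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rank table: pair of adjacent tokens -> merge priority (lower merges first).
using MergeRanks = std::map<std::pair<std::string, std::string>, int>;

// Greedy BPE: start from single characters, repeatedly merge the
// adjacent pair with the best (lowest) rank until no merge applies.
std::vector<std::string> bpe_merge(const std::string &chunk,
                                   const MergeRanks &ranks) {
    std::vector<std::string> parts;
    for (char c : chunk) parts.emplace_back(1, c);

    while (parts.size() > 1) {
        int best_rank = std::numeric_limits<int>::max();
        size_t best_i = 0;
        for (size_t i = 0; i + 1 < parts.size(); ++i) {
            auto it = ranks.find({parts[i], parts[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i = i;
            }
        }
        if (best_rank == std::numeric_limits<int>::max()) break; // no merge applies

        // Merge the best-ranked pair into one token and continue.
        parts[best_i] += parts[best_i + 1];
        parts.erase(parts.begin() + best_i + 1);
    }
    return parts;
}

int main() {
    // Toy rank table: "l"+"o" merges first, then "lo"+"w".
    MergeRanks ranks = {{{"l", "o"}, 0}, {{"lo", "w"}, 1}};
    for (const auto &tok : bpe_merge("lower", ranks))
        std::printf("[%s] ", tok.c_str());
    std::printf("\n"); // prints: [low] [e] [r]
}
```

A production tokenizer would operate on raw bytes and map the final merged pieces to token IDs through the o200k_base vocabulary; this sketch only shows the merge order.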
