vpareek2/cuLLM

LLM written from scratch in CUDA, for inference. It follows the architecture presented in the LLaMA 3.1 paper, "The Llama 3 Herd of Models". It is meant to be used with pretrained weights: navigate to main.cu and set your weights path on line 88. This repository also contains a tokenizer written from scratch in C++, kept as close to the Llama tokenizer as possible, using the GPT-4o (o200k_base) regex pattern with BPE.
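For a flavor of the hand-written kernels an inference stack like this needs, below is a minimal sketch of an RMSNorm kernel, the normalization the LLaMA 3 architecture applies before each attention and feed-forward block. The kernel name, signature, and launch configuration here are illustrative assumptions, not the actual code in main.cu.

```cuda
#include <cuda_runtime.h>

// Illustrative RMSNorm kernel (not the repository's actual code):
// one thread block per row of a (rows x dim) activation matrix.
// For each row: out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
// Assumes a power-of-two block size of 256 threads (see launch below).
__global__ void rmsnorm_kernel(float* out, const float* x,
                               const float* weight, int dim, float eps) {
    const float* row_x = x + blockIdx.x * dim;
    float* row_out = out + blockIdx.x * dim;

    // Each thread accumulates a partial sum of squares over a strided slice.
    float partial = 0.0f;
    for (int i = threadIdx.x; i < dim; i += blockDim.x)
        partial += row_x[i] * row_x[i];

    // Tree reduction in shared memory to get the row's total sum of squares.
    __shared__ float sums[256];
    sums[threadIdx.x] = partial;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) sums[threadIdx.x] += sums[threadIdx.x + stride];
        __syncthreads();
    }

    // Normalize and scale by the learned weight.
    float inv_rms = rsqrtf(sums[0] / dim + eps);
    for (int i = threadIdx.x; i < dim; i += blockDim.x)
        row_out[i] = row_x[i] * inv_rms * weight[i];
}

// Example launch: one block per row, 256 threads per block.
// rmsnorm_kernel<<<rows, 256>>>(d_out, d_x, d_weight, dim, 1e-5f);
```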
About
Large language model (LLaMA 3.1 architecture) and tokenizer (GPT-4o, o200k_base) written from scratch in CUDA and C++.
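As background on the tokenizer side: after the o200k_base regex splits text into chunks, each chunk's tokens are merged bottom-up by BPE. Below is a minimal, hedged sketch of that greedy merge loop as host-side C++ in the same CUDA codebase; the names and data structures (bpe_encode, merge_ranks, pair_to_id) are illustrative assumptions, not the repository's actual interfaces.

```cuda
// Host-side BPE sketch; nothing here runs on the GPU.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Pair = std::pair<uint32_t, uint32_t>;

// merge_ranks maps an adjacent token pair to its merge priority
// (lower rank = learned earlier in training, so merged first here);
// pair_to_id maps that pair to the ID of the merged token.
std::vector<uint32_t> bpe_encode(std::vector<uint32_t> tokens,
                                 const std::map<Pair, uint32_t>& merge_ranks,
                                 const std::map<Pair, uint32_t>& pair_to_id) {
    while (tokens.size() >= 2) {
        // Find the adjacent pair with the lowest merge rank.
        size_t best_i = 0;
        uint32_t best_rank = UINT32_MAX;
        for (size_t i = 0; i + 1 < tokens.size(); ++i) {
            auto it = merge_ranks.find({tokens[i], tokens[i + 1]});
            if (it != merge_ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i = i;
            }
        }
        if (best_rank == UINT32_MAX) break;  // no mergeable pair left

        // Replace the winning pair with its merged token ID.
        tokens[best_i] = pair_to_id.at({tokens[best_i], tokens[best_i + 1]});
        tokens.erase(tokens.begin() + best_i + 1);
    }
    return tokens;
}
```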