Simple implementation of Speculative Sampling in NumPy for GPT-2.

jaymody/speculative-sampling

Speculative Sampling

A simple NumPy implementation of the paper "Accelerating Large Language Model Decoding with Speculative Sampling" for GPT-2. The code is in main.py. I also wrote a blog post about this implementation.
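
For readers new to the technique, the core of speculative sampling is a modified rejection-sampling step: a token proposed by the small draft model is accepted with probability min(1, target probability / draft probability), and on rejection a replacement is drawn from the normalized positive part of (target - draft). The following is a minimal sketch of that single step, not the code in main.py; the function name `accept_or_resample` and its arguments are assumptions made for illustration.

```python
import numpy as np

def accept_or_resample(draft_token, draft_probs, target_probs, rng):
    """One accept/reject step (illustrative helper, not part of main.py).

    draft_token  : token id that was sampled from draft_probs
    draft_probs  : draft model's distribution over the vocabulary at this position
    target_probs : target model's distribution over the vocabulary at this position
    Returns (token, accepted).
    """
    # draft_probs[draft_token] > 0 because the token was sampled from draft_probs.
    ratio = target_probs[draft_token] / draft_probs[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True

    # Rejected: resample from the part of the target distribution
    # that the draft distribution under-covers.
    residual = np.maximum(target_probs - draft_probs, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual)), False

# Usage on a toy 3-token vocabulary.
rng = np.random.default_rng(0)
draft_probs = np.array([0.5, 0.3, 0.2])
target_probs = np.array([0.2, 0.3, 0.5])
proposed = int(rng.choice(3, p=draft_probs))
token, accepted = accept_or_resample(proposed, draft_probs, target_probs, rng)
print(token, accepted)
```

Repeating this step many times yields tokens distributed according to `target_probs`, which is why the draft model can speed up decoding without changing the target model's output distribution.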

Install Dependencies:

pip install -r picoGPT/requirements.txt

Tested on Python 3.9.10.

Usage:

python main.py \
    --prompt "Alan Turing theorized that computers would one day become" \
    --n_tokens_to_generate 40 \
    --draft_model_size "124M" \
    --target_model_size "1558M" \
    --K 4 \
    --temperature 0 # 0 for greedy sampling

This produces the following output:

Autoregressive Decode
---------------------
Time = 60.64s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think like humans.

In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T

Speculative Decode
------------------
Time = 27.15s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think like humans.

In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T
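
In the run above, both decodes produce the same text, and the speculative decode is roughly 2.2x faster (60.64s vs. 27.15s). Matching text is what one would expect with temperature 0: under greedy decoding a draft token is kept only when it equals the target model's argmax, and is otherwise replaced by that argmax. The toy sketch below illustrates this under stated assumptions; `make_toy_model`, `greedy_autoregressive`, and `greedy_speculative` are hypothetical names, and the "models" are random lookup tables rather than GPT-2.

```python
import numpy as np

VOCAB = 16

def make_toy_model(seed):
    """Hypothetical stand-in for a language model: logits depend only on the last token."""
    table = np.random.default_rng(seed).normal(size=(VOCAB, VOCAB))
    return lambda tokens: table[tokens[-1]]

def greedy_autoregressive(model, tokens, n_new):
    tokens = list(tokens)
    for _ in range(n_new):
        tokens.append(int(np.argmax(model(tokens))))
    return tokens

def greedy_speculative(draft, target, tokens, n_new, K):
    tokens = list(tokens)
    goal = len(tokens) + n_new
    while len(tokens) < goal:
        # The draft model proposes K tokens autoregressively.
        proposal = list(tokens)
        for _ in range(K):
            proposal.append(int(np.argmax(draft(proposal))))
        # The target model checks each proposed position, plus one extra.
        # (A real implementation scores all K + 1 positions in one forward
        # pass, which is where the speedup comes from; this toy loops instead.)
        for i in range(len(tokens), len(proposal) + 1):
            if len(tokens) == goal:
                break
            target_tok = int(np.argmax(target(proposal[:i])))
            tokens.append(target_tok)
            # Stop at the first disagreement, or after the extra token.
            if i == len(proposal) or proposal[i] != target_tok:
                break
    return tokens

draft, target = make_toy_model(0), make_toy_model(1)
spec = greedy_speculative(draft, target, [1], n_new=40, K=4)
auto = greedy_autoregressive(target, [1], n_new=40)
print(spec == auto)
```

Every token appended in `greedy_speculative` is the target model's argmax for the current prefix, so the result matches plain greedy decoding with the target model regardless of how good the draft model is; a better draft only changes how many tokens are accepted per round.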
