This repository is now obsolete and has been replaced by:

Model running:
https://github.com/arrmansa/Basic-UI-for-GPT-Neo-with-low-vram
Benchmarking:
https://github.com/arrmansa/Gpu-bandwidth-benchmark

The new notebooks are significantly (>2x) faster, use less VRAM (3 GB instead of 6 GB), are tunable, and include an actual UI.

Gpt-Neo-Limited-Vram-Cuda

A notebook that runs GPT-Neo with low VRAM (6 GB) and CUDA acceleration by loading it into GPU memory in smaller parts.
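For illustration, here is a minimal sketch of the part-by-part approach using the Hugging Face transformers API; the hook structure and names below are assumptions for illustration, not taken from the notebook. Forward hooks pull each transformer block into VRAM just before it runs and evict it afterwards, so only one block plus the small shared layers are resident at any time.

```python
import torch
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
model.eval()

# Small, always-needed pieces stay resident in VRAM.
model.transformer.wte.to("cuda")
model.transformer.wpe.to("cuda")
model.transformer.ln_f.to("cuda")
model.lm_head.to("cuda")

def load_pre_hook(module, inputs):
    module.to("cuda")      # pull this block into VRAM just in time

def evict_hook(module, inputs, output):
    module.to("cpu")       # push it back to RAM once its output exists
    return output

# model.transformer.h is the list of identically shaped GPT-Neo blocks.
for block in model.transformer.h:
    block.register_forward_pre_hook(load_pre_hook)
    block.register_forward_hook(evict_hook)

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    logits = model(ids).logits   # peak VRAM ~ one block + shared layers
```

The notebook itself splits the model into a configurable number of parts rather than strictly one block at a time (note 1 below compares 32 parts versus 2), but the effect is the same: peak VRAM is bounded by the largest resident part rather than the full model.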

Why?

This method could provide a much more significant speedup when used with much larger models and GPUs with more VRAM (and more RAM-to-VRAM bandwidth), making it possible to run large models (similar to GPT-3) on high-end consumer hardware.

Notes/Findings

  1. The lack of any significant speed difference between splitting the model into 32 parts (105-second runtime) and 2 parts (85-second runtime) indicates that RAM-to-VRAM transfer is the major bottleneck for this process (as of version 1.1).
  2. It may be possible to transfer blocks to the GPU with multiprocessing, since the CPU load appears to be single-core and the tensors do not have to be transferred to the GPU serially (see the stream-based sketch after this list).
  3. GPU bandwidth, tested using the debug cell at the end, shows a rate of 5.3 GB/s (approximately 490 GB in 93 seconds), which is significantly lower than both the maximum theoretical bandwidth of my dual-channel 2666 MHz DDR4 RAM (40 GB/s) and the PCI Express x16 Gen 3 bandwidth (16 GB/s); a minimal benchmark sketch follows this list.
  4. It is possible to exploit the fact that any two blocks are identical in shape and differ only in their values to make this process faster by avoiding the creation of new tensors (see the preallocation sketch after this list).
  5. PyTorch only supporting in-place transfers of modules from GPU to CPU made this more complex than it should be.
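On note 2: a side CUDA stream with pinned host memory can achieve the same overlap without multiprocessing, letting block i+1 upload while block i computes. A minimal sketch, with function names and structure of my own invention rather than the notebook's:

```python
import torch

# Side stream for weight uploads; pinned (page-locked) host memory is
# required for host-to-device copies to be truly asynchronous.
copy_stream = torch.cuda.Stream()

def pin_block(block):
    # One-time setup: move the block's weights into pinned RAM.
    for p in block.parameters():
        p.data = p.data.pin_memory()

def prefetch_block(block):
    # Queue the block's CPU->GPU copies on the side stream; returns at once.
    with torch.cuda.stream(copy_stream):
        for p in block.parameters():
            p.data = p.data.to("cuda", non_blocking=True)

def wait_for_prefetch():
    # Make the compute (default) stream wait for the queued copies,
    # so compute on block i overlaps with the upload of block i+1.
    torch.cuda.current_stream().wait_stream(copy_stream)
```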
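On note 3: a micro-benchmark in the spirit of the debug cell can be sketched as follows. The buffer size and repeat count here are arbitrary assumptions, not the notebook's exact values.

```python
import time
import torch

# Time repeated CPU->GPU copies of a pinned 1 GiB buffer to estimate
# effective RAM-to-VRAM bandwidth.
n = 1 << 30                                            # 1 GiB of bytes
src = torch.empty(n, dtype=torch.uint8, pin_memory=True)
dst = torch.empty(n, dtype=torch.uint8, device="cuda")

torch.cuda.synchronize()
t0 = time.perf_counter()
repeats = 32
for _ in range(repeats):
    dst.copy_(src, non_blocking=True)
torch.cuda.synchronize()                               # wait for all copies
elapsed = time.perf_counter() - t0
print(f"{repeats * n / elapsed / 1e9:.2f} GB/s")       # bytes -> GB/s
```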
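On note 4: because every transformer block has identically shaped parameters, a single GPU-resident block can be allocated once and refilled in place for each layer. A hedged sketch of the idea, where cpu_blocks is a hypothetical stand-in for the model's list of blocks kept in RAM:

```python
import copy
import torch

# `cpu_blocks` is a stand-in for the model's list of blocks in RAM,
# e.g. model.transformer.h for GPT-Neo.
gpu_block = copy.deepcopy(cpu_blocks[0]).to("cuda")    # one-time allocation

@torch.no_grad()
def run_block(i, hidden_states):
    # Overwrite the preallocated GPU weights with layer i's values in
    # place, instead of allocating fresh GPU tensors for every layer.
    pairs = zip(gpu_block.parameters(), cpu_blocks[i].parameters())
    for gpu_p, cpu_p in pairs:
        gpu_p.copy_(cpu_p, non_blocking=True)
    return gpu_block(hidden_states)
```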
