I did another iteration of this. Currently, running LLaMA 7B with the params on the GPU requires 16 GiB of memory. Params on the CPU + lazy transfers require 15.12 GiB, which is an almost negligible saving, and given that it adds something like 4x inference latency, I don't think it's worth mentioning anymore. Side note: lazy transfers don't really change anything here, which is what I would expect, since generation loops over the whole model and therefore all params need to be on the GPU. I'm not sure how keeping the params off the GPU makes any difference at all, since they can't be garbage collected early either, but the difference is very tiny anyway.
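For reference, here is a minimal sketch of the two placements being compared, written with plain PyTorch forward pre-hooks (the actual lazy-transfer mechanism in this PR may work differently); the small `nn.Sequential` is a hypothetical stand-in for the 7B model:

```python
import torch
import torch.nn as nn


def lazy_to_gpu(model: nn.Module, device: torch.device) -> nn.Module:
    """Keep params on the CPU and transfer each submodule to `device` on first use."""

    def pre_hook(mod, args):
        mod.to(device)  # effectively a no-op once the submodule already lives on `device`
        return tuple(a.to(device) if torch.is_tensor(a) else a for a in args)

    for sub in model.children():
        sub.register_forward_pre_hook(pre_hook)
    return model


# Hypothetical stand-in for LLaMA 7B.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))

if torch.cuda.is_available():
    # Variant A: params resident on the GPU up front (the 16 GiB number above).
    # model = model.to("cuda")

    # Variant B: params stay on the CPU and are transferred lazily (15.12 GiB, ~4x slower).
    # Generation loops over the whole model, so every param ends up on the GPU anyway.
    model = lazy_to_gpu(model, torch.device("cuda"))

with torch.no_grad():
    out = model(torch.randn(1, 4096))
```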
Note that for Stable Diffusion, params on the CPU + lazy transfers have more impact, because the pipeline uses several models: once one model finishes, its params can be garbage collected and the next model's params can be loaded lazily, so there it does make sense.
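Rough illustration of that hand-off, again with hypothetical stand-in modules in plain PyTorch (the real pipeline has a text encoder, a UNet, and a VAE decoder, and this PR's machinery may differ); only one stage's params occupy the GPU at a time:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def run_stage(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Transfer a stage's params to the GPU only when it runs, then release them."""
    model.to(device)        # lazy transfer: params hit the GPU only now
    with torch.no_grad():
        out = model(x.to(device))
    model.to("cpu")         # this stage's GPU copies can now be freed
    torch.cuda.empty_cache()
    return out


# Hypothetical stand-ins for the pipeline stages.
text_encoder = nn.Linear(77, 768)
unet = nn.Linear(768, 768)
vae_decoder = nn.Linear(768, 3 * 64 * 64)

h = run_stage(text_encoder, torch.randn(1, 77))
h = run_stage(unet, h)
image = run_stage(vae_decoder, h)
```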
I also added an example with Mistral.