
Downloading and Running Llama 2 and Other LLMs Is Very Slow #437

@AfamO

Description


Describe the bug
I noticed that downloading and running LLMs such as Llama 2 is very slow. On my local system, downloading the model takes a long time before any completions are generated. The same workflow is typically much faster on Colab when a quantization technique is used.

To Reproduce
Steps to reproduce the behavior:

  1. Write any valid LLM-VM completion-generation code.
  2. Select 'llama2' as the 'big_model' parameter.
  3. Run or execute your code.
  4. Observe how slowly the model downloads and the completion is generated.
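For reference, the steps above can be sketched as a minimal script. This is a hedged sketch: the `Client(big_model=...)` constructor and `client.complete(prompt=..., context=...)` call are assumed to follow the LLM-VM README, and the `timed` helper is a hypothetical addition included only to quantify where the time goes.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) so the slowness can be measured."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main():
    # Assumption: the import path and Client API follow the LLM-VM README;
    # adjust if your installed version differs.
    from llm_vm.client import Client

    # Step 2: select 'llama2' as the big_model parameter
    # (this triggers the model download on first use).
    client, setup_s = timed(Client, big_model="llama2")
    print(f"Model setup/download took {setup_s:.1f}s")

    # Step 3: run a completion and time it separately from the download.
    response, gen_s = timed(client.complete, prompt="Hello", context="")
    print(f"Completion took {gen_s:.1f}s")
    print(response)


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("llm_vm is not installed; install it first to reproduce the timings")
```

Timing the constructor and the completion call separately helps distinguish download slowness from inference slowness, which are likely to have different causes (network bandwidth vs. CPU-only inference on 16 GB of RAM).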

Expected behavior
The whole process of downloading the model (shards, checkpoints, etc.) and generating completions should be faster, ideally finishing in 3-6 minutes. On Colab it takes 3-4 minutes on average.

Screenshots
(Screenshot: llama-running-slowly)

Desktop (please complete the following information):

  • OS: Windows 10 Pro
  • Python version: 3.11.2
  • RAM: 16 GB

Additional context
I have not yet tried the code in a Jupyter notebook; I am running it from the command prompt that comes with the PyCharm IDE.
