
Downloading and Running Llama 2 and Other LLMs Is Very Slow #437

@AfamO

Description


Describe the bug
I noticed that downloading and running LLMs such as Llama 2 is very slow. On my local system, downloading the model takes a long time before any completions are generated. The same workflow is typically much faster on Colab when a quantization technique is used.

To Reproduce
Steps to reproduce the behavior:

  1. Write any valid LLM-VM completion-generation code.
  2. Select 'llama2' as the 'big_model' parameter.
  3. Run or execute your code.
  4. Observe how slowly the model downloads and the completion is generated.
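For reference, the steps above can be sketched as a minimal script. This is a hedged sketch: the `Client(big_model=...)` constructor and `client.complete(prompt=..., context=...)` call are assumed to follow the LLM-VM README, and the `timed` helper is a hypothetical addition included only to quantify where the time goes.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) so the slowness can be measured."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main():
    # Assumption: the import path and Client API follow the LLM-VM README;
    # adjust if your installed version differs.
    from llm_vm.client import Client

    # Step 2: select 'llama2' as the big_model parameter
    # (this triggers the model download on first use).
    client, setup_s = timed(Client, big_model="llama2")
    print(f"Model setup/download took {setup_s:.1f}s")

    # Step 3: run a completion and time it separately from the download.
    response, gen_s = timed(client.complete, prompt="Hello", context="")
    print(f"Completion took {gen_s:.1f}s")
    print(response)


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("llm_vm is not installed; install it first to reproduce the timings")
```

Timing the constructor and the completion call separately helps distinguish download slowness from inference slowness, which are likely to have different causes (network bandwidth vs. CPU-only inference on 16 GB of RAM).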

Expected behavior
The whole process of downloading the model (shards, checkpoints, etc.) and generating completions should be faster, ideally finishing in 3-6 minutes. On Colab it takes 3-4 minutes on average.

Screenshots
(Screenshot: llama-running-slowly)

Desktop (please complete the following information):

  • OS: Windows 10 Pro
  • Python version: 3.11.2
  • RAM: 16 GB

Additional context
I have not yet tried the code in a Jupyter notebook; I am running it from the command prompt that comes with the PyCharm IDE.
