Continually pretrained Llama2-7B-hf model inference is not working on 16GB GPU machine #1423
Comments
Sorry, I am not super familiar with HF, and this might be more of a question for the HF forum. But in the line …
@rasbt device_map={"cuda": 0} gives the same result. I have tried that as well.
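For reference, a minimal sketch of how device_map is usually specified in transformers: the dictionary keys are module names rather than device strings, and the model path here is a placeholder.

```python
from transformers import AutoModelForCausalLM

# An empty-string key ("") maps the whole model to one device;
# a key like "cuda" does not correspond to any submodule name.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/continually-pretrained-llama2",  # placeholder path
    device_map={"": 0},  # entire model on GPU 0
)
```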
As a workaround, would you be able to load the model on CPU using the approach above, save it via …
I tried this approach, but the model size reduces from 26 GB to 3 GB and the results are not as expected. It is returning blank output.
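For context, a minimal sketch of the CPU-load-then-save workaround being discussed, assuming the checkpoint was first converted to an HF-compatible state dict; all paths are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the converted weights entirely on CPU to avoid GPU out-of-memory errors.
state_dict = torch.load("out/converted/model.pth", map_location="cpu")

# Build the HF architecture and inject the continually pretrained weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", state_dict=state_dict
)

# Persist in standard HF format so it can later be reloaded with from_pretrained.
model.save_pretrained("llama2-7b-continued-hf")  # placeholder output dir
```

For a 7B model, fp16 weights alone are roughly 13 GB (fp32 roughly 26 GB), so a 3 GB output directory suggests most weights may not have been materialized before saving, which could also explain the blank outputs.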
Hi,
I am trying to load my continually pretrained Llama-2-7B model on a 16 GB GPU machine. Since we cannot load the model directly using AutoModelForCausalLM.from_pretrained, I am using the approach below, as mentioned in the repo.
I am getting out-of-memory errors when I try to load it on the GPU as well as on the CPU. I have applied quantization as well, which should work on a 16 GB machine, but the process gets killed abruptly.
Please find below the scripts for both.
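The original scripts do not appear in this thread; as a rough sketch of the kind of loading flow described, assuming a conversion step that produces an HF-compatible state dict (all paths are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM

# State dict produced by converting the custom checkpoint (placeholder path).
state_dict = torch.load("out/converted/model.pth", map_location="cpu")

# CPU attempt: build the base Llama-2-7B architecture and load the weights in RAM.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", state_dict=state_dict
)

# GPU attempt: fp32 weights are ~26 GB (7B params * 4 bytes), far beyond a
# 16 GB card; even fp16 (~13 GB) leaves little headroom for activations.
model = model.to("cuda", dtype=torch.float16)
```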
Is there a way to load this model on a 16 GB GPU machine with 64 GB RAM? Please suggest a solution.
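Since the author mentions having tried quantization, here is a minimal sketch of one common way to fit a 7B model into 16 GB: 4-bit loading via bitsandbytes, once the model has been saved in HF format. The exact quantization setup used in the issue is not shown, and the model directory is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization: roughly 4 GB of weights for a 7B model, well within 16 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "llama2-7b-continued-hf",  # placeholder: directory saved via save_pretrained
    quantization_config=bnb_config,
    device_map={"": 0},
)
tokenizer = AutoTokenizer.from_pretrained("llama2-7b-continued-hf")
```

This requires the bitsandbytes package and a CUDA GPU; the compute dtype can be changed to float16 on cards without bfloat16 support.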