[BUG] genie-t2t-run Fails to run llama v2 7B quantized on Galaxy S23 Ultra #101

Open
taeyeonlee opened this issue Oct 6, 2024 · 3 comments
Labels
question Please ask any questions on Slack. This issue will be closed once responded to.

Comments

@taeyeonlee

When running quantized Llama v2 7B on the QNN HTP backend of a Snapdragon Gen 2 device, genie-t2t-run fails with the error below.

What does this error mean?
"Could not create context from binary for context index = 0 : err 1009"

dm3q:/ $ export LD_LIBRARY_PATH=/data/local/tmp   
dm3q:/data/local/tmp $ ./genie-t2t-run -c htp-model-config-llama2-7b.json -p "<<SYS>>\nYou are a helpful AI assistant.<</SYS>>\n\n[INST] have we been to Mars? [/INST]"
Using libGenie.so version 1.0.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
1|dm3q:/data/local/tmp $ 

The .bin files (llama2_0.serialized.bin, llama2_1.serialized.bin, llama2_2.serialized.bin, llama2_3.serialized.bin) were generated with --target-gen snapdragon-gen2, using the command below:
python gen_ondevice_llama.py --hub-model-id m1q8lpygn,mrmdjx4km,mkngj646n,mknjj0gxn,meq2dy80m,mzmx5gykn,m6qejgw7m,mwn0p5d8m --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android

@nepro012

nepro012 commented Oct 7, 2024

Hi. My configuration and error were different from yours, but I succeeded in running llama2-7b with the gen2-windows config.
I had to download the latest SDK, 2.27.0.240926, which bumps libGenie.so to version 1.1.0. With 1.0.0, I was only able to run the gen3-android config.
gen_ondevice_llama.py should supposedly be rerun with the latest SDK, but what worked for me was simply replacing the SDK-related files with the latest ones from $SDK_PATH/bin and $SDK_PATH/lib and editing the JSON file: in htp-model-config-llama2-7b.json, I had to reduce "spill-fill-bufsize" to 0.
Before that, I intermittently got err 1008, err 5005, or the binary was killed silently. I see you're using libGenie.so version 1.0.0, so it may be worth a try.
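
The config edit described above could be scripted roughly as follows. This is only a sketch under assumptions: the thread does not show where "spill-fill-bufsize" sits inside htp-model-config-llama2-7b.json, so the code searches the whole JSON tree for that key. The helper names `set_key_everywhere` and `patch_config` are hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value):
    """Recursively set every occurrence of `key` in nested dicts/lists.

    Returns the number of occurrences that were updated.
    """
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                # Assigning to an existing key does not resize the dict,
                # so it is safe to do while iterating over it.
                node[k] = value
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def patch_config(path, key="spill-fill-bufsize", value=0):
    """Rewrite the JSON file at `path` with every `key` set to `value`.

    The nesting of the key is assumed unknown, hence the recursive search.
    The file is only rewritten if at least one occurrence was found.
    """
    path = Path(path)
    config = json.loads(path.read_text())
    changed = set_key_everywhere(config, key, value)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n")
    return changed
```

Calling `patch_config("htp-model-config-llama2-7b.json")` on a copy of the config returns how many occurrences were changed; a return value of 0 would mean the key was not present under that exact name.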

@taeyeonlee taeyeonlee changed the title [BUG] Fail to run llama v2 7B quantized on Galaxy S23 Ultra [BUG] genie-t2t-run Fails to run llama v2 7B quantized on Galaxy S23 Ultra Oct 11, 2024
@holylong

> Hi, while my configuration and error was different from yours, i've succeeded in running llama2-7b in gen2-windows config.
> I had to download the latest SDK, 2.27.0.240926 which upped libGenie.so version to 1.1.0. With 1.0.0, I was only able to run gen3-android config.
> gen_ondevice_llama.py should supposedly be run again using the latest SDK, but what worked for me was just to replace SDK-related files to the latest ones from $SDK_PATH/bin and $SDK_PATH/lib, and modify json file: in htp-model-config-llama2-7b.json, I had to reduce "spill-fill-bufsize" to 0.
> Previously, I randomly had err 1008, 5005, and the binary being just killed silently. I see you're using libGenie.so version 1.0.0 so it's maybe worth a try.

Does it work for you? Why am I still getting this error after upgrading to the latest version of the SDK and setting 'spill-fill-bufsize' to 0?
[Attached image: error]

@LLIKKE

LLIKKE commented Oct 21, 2024

I hit a similar error. I have updated the QNN version, and the device is an Honor phone (8gen3).

libGenie.so version 1.1.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 2 : err 4000"
[ERROR] "Create From Binary FAILED!"
[ERROR] "Failed to free device: 14003"
[ERROR] "Device Free failure"
Failure to initialize model
Failed to create the dialog.
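
To compare reports like the ones in this thread, the libGenie.so version and the failing context index and error code can be pulled out of the genie-t2t-run output. The sketch below only matches the log line formats quoted in this issue; it does not interpret the codes (the meaning of 1009 or 4000 is not established here), and `summarize_genie_log` is a hypothetical helper name.

```python
import re

# Patterns modeled on the log lines quoted in this thread.
# The version pattern matches on "Genie.so" so that a pasted line with a
# truncated prefix (e.g. "bGenie.so version 1.1.0") is still recognized.
VERSION_RE = re.compile(r"Genie\.so version (\d+(?:\.\d+)*)")
CONTEXT_ERR_RE = re.compile(r"context index = (\d+) : err (\d+)")


def summarize_genie_log(text):
    """Extract the libGenie.so version and context-creation errors from a log.

    Returns a dict with the version string (or None if absent) and a list of
    (context_index, error_code) tuples in the order they appear.
    """
    version_match = VERSION_RE.search(text)
    version = version_match.group(1) if version_match else None
    errors = [(int(idx), int(code)) for idx, code in CONTEXT_ERR_RE.findall(text)]
    return {"version": version, "context_errors": errors}


sample = '''Using libGenie.so version 1.0.0
[INFO]  "Using create From Binary"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
'''
print(summarize_genie_log(sample))
# prints {'version': '1.0.0', 'context_errors': [(0, 1009)]}
```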

@mestrona-3 mestrona-3 added the question Please ask any questions on Slack. This issue will be closed once responded to. label Oct 28, 2024