[BUG] genie-t2t-run Fails to run llama v2 7B quantized on Galaxy S23 Ultra #101

taeyeonlee · 2024-10-06T05:26:05Z

When I run llama v2 7B quantized on the QNN HTP backend of a Snapdragon 8 Gen 2 device, I get the following error.

What does it mean?
"Could not create context from binary for context index = 0 : err 1009"

dm3q:/ $ export LD_LIBRARY_PATH=/data/local/tmp   
dm3q:/data/local/tmp $ ./genie-t2t-run -c htp-model-config-llama2-7b.json -p "<<SYS>>\nYou are a helpful AI assistant.<</SYS>>\n\n[INST] have we been to Mars? [/INST]"
Using libGenie.so version 1.0.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
1|dm3q:/data/local/tmp $

The bin files (llama2_0.serialized.bin, llama2_1.serialized.bin, llama2_2.serialized.bin, llama2_3.serialized.bin) were generated with --target-gen snapdragon-gen2, as shown below:
python gen_ondevice_llama.py --hub-model-id m1q8lpygn,mrmdjx4km,mkngj646n,mknjj0gxn,meq2dy80m,mzmx5gykn,m6qejgw7m,mwn0p5d8m --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android


nepro012 · 2024-10-07T05:48:45Z

Hi, my configuration and error were different from yours, but I succeeded in running llama2-7b with the gen2-windows config.
I had to download the latest SDK, 2.27.0.240926, which bumps libGenie.so to version 1.1.0. With 1.0.0, I was only able to run the gen3-android config.
gen_ondevice_llama.py should supposedly be run again with the latest SDK, but what worked for me was simply replacing the SDK-related files with the latest ones from $SDK_PATH/bin and $SDK_PATH/lib and editing the JSON file: in htp-model-config-llama2-7b.json, I had to reduce "spill-fill-bufsize" to 0.
Before that, I intermittently got err 1008, err 5005, or the binary was killed silently. I see you're using libGenie.so version 1.0.0, so it may be worth a try.
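
The JSON edit described above can be scripted. This is a minimal sketch, assuming only that the key is named "spill-fill-bufsize" as quoted above. Where the key sits inside htp-model-config-llama2-7b.json is not stated in this thread, so the hypothetical helpers below (`set_key_everywhere`, `zero_spill_fill`) set it wherever it appears rather than assuming a path.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value):
    """Recursively set `key` to `value` in nested dicts/lists.

    Returns the number of occurrences that were overwritten.
    """
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def zero_spill_fill(config_path):
    """Rewrite the config in place with every "spill-fill-bufsize" set to 0.

    The key name comes from the comment above; its nesting level is unknown,
    which is why the search is recursive.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    changed = set_key_everywhere(config, "spill-fill-bufsize", 0)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return changed
```

Calling `zero_spill_fill("htp-model-config-llama2-7b.json")` returns how many entries were changed. A return value of 0 means the key was not found, so the file was not modified in any meaningful way. Back up the original config before running it.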

holylong · 2024-10-16T05:36:09Z

> Hi, my configuration and error were different from yours, but I succeeded in running llama2-7b with the gen2-windows config. I had to download the latest SDK, 2.27.0.240926, which bumps libGenie.so to version 1.1.0. With 1.0.0, I was only able to run the gen3-android config. gen_ondevice_llama.py should supposedly be run again with the latest SDK, but what worked for me was simply replacing the SDK-related files with the latest ones from $SDK_PATH/bin and $SDK_PATH/lib and editing the JSON file: in htp-model-config-llama2-7b.json, I had to reduce "spill-fill-bufsize" to 0. Before that, I intermittently got err 1008, err 5005, or the binary was killed silently. I see you're using libGenie.so version 1.0.0, so it may be worth a try.

Does it work for you? Why am I still getting this error after upgrading to the latest version of the SDK and setting 'spill-fill-bufsize' to 0?

LLIKKE · 2024-10-21T11:46:13Z

I hit a similar error. I have updated the QNN version, and the device is an Honor phone (Snapdragon 8 Gen 3).

Using libGenie.so version 1.1.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 2 : err 4000"
[ERROR] "Create From Binary FAILED!"
[ERROR] "Failed to free device: 14003"
[ERROR] "Device Free failure"
Failure to initialize model
Failed to create the dialog.
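
To compare failures like the ones in this thread, it can help to pull the context index and error code out of the captured output. This is a rough sketch, assuming the log lines keep the exact wording shown above. The helper name `context_errors` is made up for illustration, and the code does not interpret what the error numbers mean.

```python
import re

# Matches the fragment 'context index = 0 : err 1009' as printed in the logs above.
CONTEXT_ERR = re.compile(r"context index = (\d+) : err (\d+)")


def context_errors(log_text):
    """Return a list of (context_index, error_code) pairs found in the log text."""
    return [(int(index), int(code)) for index, code in CONTEXT_ERR.findall(log_text)]
```

Feeding it the saved output of genie-t2t-run gives one pair per failing context, which makes it easier to see whether a change moved the failure to a different context binary or a different code.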

taeyeonlee changed the title ~~[BUG] Fail to run llama v2 7B quantized on Galaxy S23 Ultra~~ [BUG] genie-t2t-run Fails to run llama v2 7B quantized on Galaxy S23 Ultra Oct 11, 2024

mestrona-3 added the question Please ask any questions on Slack. This issue will be closed once responded to. label Oct 28, 2024
