Fixed LoRA fails with "list index out of range" error when the prompts and adapters passed are fewer than the full batch size #251
Comments
This was referenced Jan 29, 2025

quic-rishinr pushed a commit that referenced this issue on Jan 31, 2025:

…v1.19 (#250)

Same as [PR#242](#242). This is regarding the issue reported in [issue#251](#251). The finite lorax feature failed to execute when the number of prompts provided is less than the full batch size. The solution involves applying the same adjustment strategy for `prompt_to_lora_id_mapping` as used for `prompt` in the `fix_prompts()` function located in `QEfficient/generation/text_generation_inference.py`.

Signed-off-by: Jou-An Chen <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>
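The adjustment strategy described in the commit message can be illustrated with a minimal, self-contained sketch. The helper name `fix_prompt_to_lora_id_mapping` and the exact padding rule (cycling the provided entries until the batch is full) are assumptions for illustration only, not the library's actual implementation; the real logic lives in `QEfficient/generation/text_generation_inference.py`.

```python
from typing import List


def fix_prompts(prompts: List[str], full_batch_size: int) -> List[str]:
    # Sketch of the existing behavior: when fewer prompts than the full
    # batch size are given, repeat prompts until the batch is full.
    if len(prompts) < full_batch_size:
        extra = full_batch_size - len(prompts)
        prompts = prompts + [prompts[i % len(prompts)] for i in range(extra)]
    return prompts


def fix_prompt_to_lora_id_mapping(mapping: List[int], full_batch_size: int) -> List[int]:
    # Hypothetical helper applying the same strategy to the adapter mapping,
    # so indexing mapping[i] for every batch slot no longer goes out of range.
    if len(mapping) < full_batch_size:
        extra = full_batch_size - len(mapping)
        mapping = mapping + [mapping[i % len(mapping)] for i in range(extra)]
    return mapping


if __name__ == "__main__":
    prompts = ["Hello", "How are you?"]
    lora_ids = [0, 1]  # one adapter id per prompt
    full_batch_size = 4

    print(fix_prompts(prompts, full_batch_size))
    # ['Hello', 'How are you?', 'Hello', 'How are you?']
    print(fix_prompt_to_lora_id_mapping(lora_ids, full_batch_size))
    # [0, 1, 0, 1]
```

With both lists padded to the same full batch size, every batch slot has a matching adapter id, which is the point of the fix.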
quic-rishinr pushed a commit that referenced this issue on Feb 19, 2025:

This is regarding the issue reported in [issue#251](#251). The finite lorax feature failed to execute when the number of prompts provided is less than the full batch size. The solution involves applying the same adjustment strategy for `prompt_to_lora_id_mapping` as used for `prompt` in the `fix_prompts()` function located in `QEfficient/generation/text_generation_inference.py`.

Signed-off-by: Jou-An Chen <[email protected]>
Fix merged in mainline.
quic-hemagnih pushed a commit to quic-hemagnih/efficient-transformers that referenced this issue on Mar 12, 2025:

This is regarding the issue reported in [issue#251](quic#251). The finite lorax feature failed to execute when the number of prompts provided is less than the full batch size. The solution involves applying the same adjustment strategy for `prompt_to_lora_id_mapping` as used for `prompt` in the `fix_prompts()` function located in `QEfficient/generation/text_generation_inference.py`.

Signed-off-by: Jou-An Chen <[email protected]>
Signed-off-by: Hem Agnihotri <[email protected]>
Describe the bug
Fixed LoRA (finite lorax) inference fails with a list index out of range error when the number of prompts and adapters passed is less than the full batch size: `prompt_to_lora_id_mapping` is not adjusted to the full batch size the way `prompt` is.
To Reproduce
Steps to reproduce the behavior:
Command used:
Error log:
Expected behavior
When the prompts and adapters passed are fewer than the full batch size, fixed LoRA inference should either work like base-model execution (with prompts duplicated to fill the full batch size) or error out.
Screenshots
N/A
Environment (please complete the following information):
Additional context
N/A