
Does it have a strong dependence on multi-response replies from LLMs? #6

Open
lollipopwsy opened this issue Mar 3, 2025 · 1 comment

Comments

@lollipopwsy

Hello, authors!
I currently want to use Qwen for testing. To call Qwen, I changed 'n=5' to 'n=1' in the initialization function of llm_policy.py, but the results are very poor. Does the method depend strongly on sampling multiple responses from the LLM? What do I need to do to call Qwen2.5 properly and get good output? Please help me! Thank you, I look forward to your reply.

@1989Ryan
Owner

1989Ryan commented Mar 3, 2025

Hi there,

The initial decision to sample multiple LLM responses was made to obtain a rough approximation of the distribution over actions, since we did not have access to the log probs of GPT-4 at the time. If you do have access to log probs, you can compute the distribution over candidate actions directly.
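To make the two options concrete, here is a minimal sketch of both ways to build a distribution over candidate actions. The function names and action strings are illustrative assumptions, not the repository's actual API: `dist_from_samples` mirrors the original n=5 sampling approach, and `dist_from_logprobs` shows the direct computation when the API exposes per-action log probs.

```python
# Hedged sketch: two ways to approximate a distribution over actions.
# Names and action strings are hypothetical, not from llm_policy.py.
import math
from collections import Counter

def dist_from_samples(sampled_actions):
    """Empirical distribution from n sampled LLM replies (the n=5 approach).

    With n=1 this collapses to a point mass on a single action, which is
    one reason reducing n degrades the approximation."""
    counts = Counter(sampled_actions)
    n = len(sampled_actions)
    return {a: c / n for a, c in counts.items()}

def dist_from_logprobs(action_logprobs):
    """Distribution computed directly from per-action log probs
    (a numerically stable softmax over log probabilities)."""
    m = max(action_logprobs.values())
    exps = {a: math.exp(lp - m) for a, lp in action_logprobs.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Five hypothetical sampled replies: 'pick(block)' gets probability 0.6.
samples = ["pick(block)", "pick(block)", "place(block)",
           "pick(block)", "wait()"]
print(dist_from_samples(samples))

# Hypothetical per-action log probs returned by the API.
print(dist_from_logprobs({"pick(block)": -0.2, "place(block)": -1.8}))
```

The log-prob route gives the exact model distribution over the listed actions in one call, whereas the sampling route only converges to it as n grows, which is consistent with n=1 performing poorly.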

However, please note that this project relies on the model's in-context learning. Even though Qwen performs exceptionally on many benchmarks, I am concerned about its in-context learning capacity, especially for the smaller model sizes. A drop in performance is foreseeable if the model is limited to in-context learning.
