Hello, Authors!
I currently want to use Qwen for testing. I changed `n=5` to `n=1` in the initialization function of `llm_policy.py` in order to call Qwen, but the results are very poor. Does the method depend strongly on multiple sampled responses from the LLM? What do I need to do to call Qwen2.5 properly and get good output? Please help me! Thank you; I look forward to your reply.
The initial decision to take multiple samples of the LLM's response was to get a rough approximation of the distribution over actions, since we did not have access to GPT-4's log probs at the time. If you do have access to log probs, you can compute the distribution over candidate actions directly.
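The two routes above can be sketched as follows: an empirical distribution from `n` sampled replies (what `n=5` approximates, and what collapses to a single point when `n=1`), versus a softmax over per-action log probs when the API exposes them. This is a minimal illustration; the function names and action labels are hypothetical, not from the repo:

```python
import math
from collections import Counter

def dist_from_samples(sampled_actions):
    """Empirical distribution over actions from n sampled LLM replies.

    This is the rough approximation that the n=5 setting provides;
    with n=1 it degenerates to a single action with probability 1.
    """
    counts = Counter(sampled_actions)
    total = sum(counts.values())
    return {action: c / total for action, c in counts.items()}

def dist_from_logprobs(action_logprobs):
    """Distribution over actions via softmax of their log probabilities.

    Usable directly when the model API returns log probs, avoiding
    the need for multiple samples.
    """
    m = max(action_logprobs.values())  # subtract max for numerical stability
    exps = {a: math.exp(lp - m) for a, lp in action_logprobs.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}
```

With `n=1` the first function can only ever return a one-hot distribution, which is why collapsing the sample count loses the uncertainty information the policy relies on.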
However, please note that this project relies on the model's in-context learning. Even though Qwen performs well on many benchmarks, I am concerned about its in-context learning capacity, especially for the smaller variants. A drop in performance is foreseeable if the model is limited to in-context learning.