
Does it have a strong dependence on multi-response replies from LLMs? #6

Open
lollipopwsy opened this issue Mar 3, 2025 · 1 comment

Comments

@lollipopwsy

Hello, authors!
I currently want to use Qwen for testing. To call Qwen, I changed 'n=5' to 'n=1' in the initialization function of llm_policy.py, but the results are very poor. Does the method depend strongly on sampling multiple responses from the LLM? What do I need to do to call Qwen2.5 properly and get good output? Please help me! Thank you, I look forward to your reply.

@1989Ryan
Owner

1989Ryan commented Mar 3, 2025

Hi there,

The initial decision to sample multiple LLM responses was made to obtain a rough approximation of the distribution over actions, since we did not have access to the log probs of GPT-4 at the time. If you do have access to log probs, you can compute the distribution over candidate actions directly.
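To make the two options concrete, here is a minimal sketch of both ways to build a distribution over candidate actions. The function names and action strings are illustrative assumptions, not the repository's actual API: `dist_from_samples` mirrors the original n=5 sampling approach, and `dist_from_logprobs` shows the direct computation when the API exposes per-action log probs.

```python
# Hedged sketch: two ways to approximate a distribution over actions.
# Names and action strings are hypothetical, not from llm_policy.py.
import math
from collections import Counter

def dist_from_samples(sampled_actions):
    """Empirical distribution from n sampled LLM replies (the n=5 approach).

    With n=1 this collapses to a point mass on a single action, which is
    one reason reducing n degrades the approximation."""
    counts = Counter(sampled_actions)
    n = len(sampled_actions)
    return {a: c / n for a, c in counts.items()}

def dist_from_logprobs(action_logprobs):
    """Distribution computed directly from per-action log probs
    (a numerically stable softmax over log probabilities)."""
    m = max(action_logprobs.values())
    exps = {a: math.exp(lp - m) for a, lp in action_logprobs.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Five hypothetical sampled replies: 'pick(block)' gets probability 0.6.
samples = ["pick(block)", "pick(block)", "place(block)",
           "pick(block)", "wait()"]
print(dist_from_samples(samples))

# Hypothetical per-action log probs returned by the API.
print(dist_from_logprobs({"pick(block)": -0.2, "place(block)": -1.8}))
```

The log-prob route gives the exact model distribution over the listed actions in one call, whereas the sampling route only converges to it as n grows, which is consistent with n=1 performing poorly.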

However, please note that this project relies on the model's in-context learning. Even though Qwen performs exceptionally on many benchmarks, I am concerned about its in-context learning capacity, especially for the smaller model sizes. A drop in performance is foreseeable if the model is limited to in-context learning.
