
simmonssong/efficient-agentic-llm

Efficient Agentic LLM

Value Function Estimation

  1. Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference. ICLR 2025. Paper

    Qining Zhang, Lei Ying

    Motivation: Reward function construction is a bottleneck (RLHF -> DPO -> GRPO).

    Design: Apply policy gradient directly, using zeroth-order (ZO) value function estimation.
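
To make the "ZO-based" idea in the entry above concrete, here is a minimal sketch of a generic two-point zeroth-order gradient estimator driving gradient ascent. It is not the paper's algorithm: the names `zo_gradient`, `zo_ascent`, and `toy_value`, the quadratic objective, and all hyperparameters are assumptions chosen for illustration. The point it shows is that the gradient is estimated from evaluations of a black-box value function alone, with no analytic gradient and no learned reward model.

```python
import math
import random


def zo_gradient(value_fn, theta, mu=1e-3, num_directions=64, rng=None):
    """Two-point zeroth-order estimate of the gradient of value_fn at theta.

    value_fn is treated as a black box: only its outputs are used.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_directions):
        # Sample a random direction uniformly on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        # Evaluate the black box at two symmetric perturbations of theta.
        plus = value_fn([t + mu * x for t, x in zip(theta, u)])
        minus = value_fn([t - mu * x for t, x in zip(theta, u)])
        # Central difference along u; the factor dim compensates for
        # E[u u^T] = I / dim on the unit sphere.
        scale = dim * (plus - minus) / (2.0 * mu)
        for i in range(dim):
            grad[i] += scale * u[i] / num_directions
    return grad


def toy_value(theta):
    # Hypothetical stand-in for a policy's value; maximized at theta = (1, -2).
    return -((theta[0] - 1.0) ** 2) - ((theta[1] + 2.0) ** 2)


def zo_ascent(value_fn, theta, steps=200, lr=0.05):
    """Gradient ascent that only ever queries value_fn, never its gradient."""
    rng = random.Random(0)
    for _ in range(steps):
        g = zo_gradient(value_fn, theta, rng=rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    print(zo_ascent(toy_value, [0.0, 0.0]))
```

In this toy setting the iterates approach the maximizer (1, -2). In an RLHF setting, the black-box evaluations would have to come from feedback on perturbed policies rather than from a closed-form function, which this sketch does not model.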
