PhD in Reinforcement Learning, LLM Alignment, RLHF
- University of Cambridge
- https://holarissun.github.io/
- @HolarisSun
Pinned repositories
- Prompt-OIRL (Public): code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
- RewardModelingBeyondBradleyTerry (Public): official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
- RewardShifting (Public): code for the NeurIPS 2022 paper "Exploiting Reward Shifting in Value-Based Deep RL"
- embedding-based-llm-alignment (Public): codebase for the paper "Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs"
- Accountable-Offline-RL (Public): code for the NeurIPS 2023 paper "Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples"