Heyaaaaa!
I would like to take this. I've contributed to llmstudio before, so I'm slightly familiar with the code base (#683). I was a bit occupied with life lately, but I'm ready to start contributing again to h2o and other open source projects, and I think this could be a good point to get back into the open source landscape.
I've read a bit about GRPO and DeepSeek, but I might need some support to pull this through :)
Some reading materials or sample code implementations would be great to begin with.
🚀 Feature
Add GRPO Support
Motivation
With the release of DeepSeek's R1 model, GRPO has been shown to be a powerful way to instill reasoning capabilities in models for cases where there is either labeled data or a verifier. This request is to add support to train a model with GRPO, perhaps with a focus on building reasoning abilities.
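For anyone picking this up, here is a rough sketch of the group-relative part of GRPO as I understand it: sample several completions for the same prompt, score each one with a verifier, and normalize each reward against the group's mean and standard deviation. Everything in it is an assumption for illustration only: `verifier_reward` is a hypothetical string-match checker, the completions are made up, and the use of population standard deviation and the `eps` value are my own choices. None of this is existing llmstudio code.

```python
from statistics import mean, pstdev


def verifier_reward(completion: str, expected: str) -> float:
    """Hypothetical verifier: 1.0 if the completion ends with the expected answer, else 0.0."""
    return 1.0 if completion.strip().endswith(expected) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group.

    Assumption: population standard deviation, with eps added to avoid
    division by zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A made-up group of completions sampled for one prompt ("What is 2 + 2?").
completions = [
    "2 + 2 means adding two twice, so the answer is 4",
    "Adding them gives 5",
    "The answer is 4",
    "I think it is 3",
]

rewards = [verifier_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f} | {completion}")
```

Completions that the verifier accepts end up with a positive advantage and the rest with a negative one, without needing a separate value model. The policy-gradient loss, clipping, and KL penalty against a reference model are left out here and would still need to be designed for the actual feature.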