Machine Learning Engineer | NLP & LLM
Economist | Empirical & Behavioral
PhD | Decision Science & Managerial Economics
Pinned
-
Logic-RL-Lite (Public)
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Python 68
-
DeepEnlighten (Public)
Pure RL post-training of base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with the Social IQa dataset.
Python 22