Sea AI Lab

98 repositories

oat
Public
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
thompson-sampling alignment reasoning distributed-training ppo dueling-bandits dpo distributed-rl llm online-rl
Python
•
Apache License 2.0
•50•584•4•2•Updated Dec 23, 2025
jrystal
Public
A JAX-based Differentiable Density Functional Theory Framework for Materials
density-functional-theory quantum-chemistry material-science solid-state jax differentiable-programming electron-structure
Python
•
Apache License 2.0
•2•42•5•0•Updated Dec 5, 2025
d4ft
Public
A JAX library for Density Functional Theory.
Python
•
Apache License 2.0
•5•54•16•0•Updated Nov 25, 2025
Precision-RL
Public
Defeating the Training-Inference Mismatch via FP16
Python
•
MIT License
•14•165•3•0•Updated Nov 14, 2025
Precision-RL-verl
Public
Defeating the Training-Inference Mismatch via FP16
Python
•
Apache License 2.0
•2.9k•4•0•0•Updated Nov 14, 2025
NDA
Public
Code for "Nonparametric Data Attribution for Diffusion Models"
Jupyter Notebook
•0•3•1•0•Updated Nov 11, 2025
SkyLadder
Public
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Python
•
Apache License 2.0
•582•40•1•0•Updated Oct 15, 2025
tty-use
Public
C
•0•12•0•0•Updated Oct 13, 2025
imperceptible-jailbreaks
Public
[ArXiv 2025] Imperceptible Jailbreaking against Large Language Models
Python
•5•22•0•0•Updated Oct 7, 2025
feedback-conditional-policy
Public
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
Python
•1•55•0•0•Updated Sep 29, 2025
variational-reasoning
Public
Code for "Variational Reasoning for Language Models"
Python
•1•53•1•0•Updated Sep 29, 2025
LifelongSafetyAlignment
Public
Python
•0•10•1•0•Updated Sep 20, 2025
autofd
Public
Automatic Functional Differentiation in JAX
automatic-differentiation jax neural-operator variational-calculus
Python
•
Apache License 2.0
•1•80•6•0•Updated Sep 18, 2025
BanditSpec
Public
Python
•2•5•0•0•Updated Sep 2, 2025
understand-r1-zero
Public
Understanding R1-Zero-Like Training: A Critical Perspective
rl reasoning llm r1-zero
Python
•
MIT License
•53•1.2k•9•0•Updated Aug 27, 2025
Video-Next-Event-Prediction
Public
Python
•
MIT License
•1•19•3•0•Updated Aug 9, 2025
AnytimeReasoner
Public
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Python
•
Apache License 2.0
•3•50•0•0•Updated Jul 15, 2025
LongSpec
Public
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
Python
•
MIT License
•3•69•0•0•Updated Jul 14, 2025
Attention-Sink
Public
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
language-model attention-mechanism large-language-models attention-sink
Python
•
MIT License
•5•150•1•0•Updated Jul 8, 2025
VeriFree
Public
Reinforcing General Reasoning without Verifiers
Python
•6•93•7•0•Updated Jun 24, 2025
Adan
Public
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
deep-learning optimizer pytorch artificial-intelligence moe resnet vit diffusion mae fairseq
Python
•
Apache License 2.0
•70•805•4•0•Updated Jun 8, 2025
TreeMeshGPT
Public
[CVPR 2025] TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Python
•
MIT License
•11•174•3•0•Updated May 22, 2025
ContinualBench
Public
Python
•
MIT License
•1•16•0•0•Updated May 20, 2025
zero-bubble-pipeline-parallelism
Public
Zero Bubble Pipeline Parallelism
Python
•
Other
•3.4k•442•29•0•Updated May 7, 2025
FlowReasoner
Public
Python
•7•142•1•0•Updated May 6, 2025
Meta-Unlearning
Public
Python
•2•33•1•0•Updated Apr 22, 2025
LightTrans
Public
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
Python
•0•22•0•0•Updated Apr 22, 2025
ActivePRM
Public
Jupyter Notebook
•0•19•0•0•Updated Apr 16, 2025
dice
Public
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
alignment preference-learning large-language-models rlhf
Python
•
MIT License
•3•46•0•0•Updated Apr 15, 2025
oat-zero
Public
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Python
•
MIT License
•10•249•0•0•Updated Apr 15, 2025