    Repositories list

    • data-juicer

      Public
      Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
      Python
      Apache License 2.0
      Updated Mar 6, 2026
    • 🤖 Your Intelligent Copilot for Data Exploration and Processing Pipeline
      Python
      Apache License 2.0
      Updated Feb 26, 2026
    • Community-driven data-juicer recipes and best practices for various pre-training/fine-tuning tasks.
      Apache License 2.0
      Updated Feb 12, 2026
    • data-juicer-sphinx

      Public
      Apache License 2.0
      Updated Feb 12, 2026
    • data-juicer-sandbox

      Public
      A Feedback-Driven Suite for Multimodal Data-Model Co-development.
      Python
      Apache License 2.0
      Updated Jan 15, 2026
    • recognize-anything

      Public
      Open-source and strong foundation image recognition models. Self-modified version.
      Jupyter Notebook
      Apache License 2.0
      Updated Jan 12, 2026
    • transformers-stream-generator

      Public
      This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/Transformers. Self-modi…
      Python
      MIT License
      Updated Nov 15, 2025
    • .github

      Public
      Updated Nov 5, 2025