    Repositories list

    • datahub-gma

      Public
      General Metadata Architecture
      Java
      601331320 Updated Nov 5, 2025
    • Liger-Kernel

      Public
      Efficient Triton Kernels for LLM Training
      Python
      4265.8k7430 Updated Nov 5, 2025
    • Hoptimator

      Public
      Multi-hop declarative data pipelines
      Java
      1412210 Updated Nov 5, 2025
    • venice

      Public
      Venice, Derived Data Platform for Planet-Scale Workloads.
      Java
      1065711720 Updated Nov 4, 2025
    • rest.li

      Public
      Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs.
      Java
      5562.5k5055 Updated Nov 4, 2025
    • ambry

      Public
      Distributed object store
      Java
      2831.8k13118 Updated Nov 4, 2025
    • iceberg

      Public
      A temporary home for LinkedIn's changes to Apache Iceberg (incubating)
      Java
      3363024 Updated Nov 3, 2025
    • brooklin

      Public
      An extensible distributed system for reliable nearline data streaming at scale
      Java
      1409471716 Updated Nov 3, 2025
    • gobblin-elr

      Public
      This is a read-only mirror of apache/gobblin
      Java
      4600 Updated Nov 3, 2025
    • helix

      Public
      Mirror of Apache Helix
      Java
      241108 Updated Nov 3, 2025
    • openhouse

      Public
      Open Control Plane for Tables in Data Lakehouse
      Java
      623701022 Updated Nov 3, 2025
    • ghc25-ds-workshop

      Public
      This repo is specifically for the Grace Hopper 2025 DS Workshop
      Jupyter Notebook
      1000 Updated Nov 1, 2025
    • Listing of all our public GitHub projects.
      JavaScript
      4764162 Updated Nov 1, 2025
    • transport

      Public
      A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
      Java
      743022411 Updated Oct 30, 2025
    • zookeeper

      Public
      Mirror of Apache Hadoop ZooKeeper
      Java
      7.3k639 Updated Oct 30, 2025
    • avro-util

      Public
      Collection of utilities to allow writing java code that operates across a wide range of avro versions.
      Java
      67855713 Updated Oct 29, 2025
    • cruise-control

      Public
      Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
      Java
      6372.9k21135 Updated Oct 28, 2025
    • Repo for talent-solutions-java-sdk project
      Java
      1100 Updated Oct 27, 2025
    • fmchisel

      Public
      fmchisel: Efficient Compression and Training Algorithms for Foundation Models
      Python
      87100 Updated Oct 23, 2025
    • goavro

      Public
      Goavro is a library that encodes and decodes Avro data.
      Go
      2301k6121 Updated Oct 22, 2025
    • coral

      Public
      Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
      Java
      2008685930 Updated Oct 20, 2025
    • Burrow

      Public
      Kafka Consumer Lag Checking
      Go
      8123.9k22418 Updated Oct 3, 2025
    • Shake to send feedback for Android.
      Java
      55161105 Updated Sep 17, 2025
    • diderot

      Public
      A fast and flexible implementation of the xDS protocol
      Go
      31800 Updated Sep 17, 2025
    • forthic

      Public
      Python
      72800 Updated Sep 16, 2025
    • DuaLip

      Public
      DuaLip: Dual Decomposition based Linear Program Solver
      Scala
      106510 Updated Sep 8, 2025
    • isolation-forest

      Public
      A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.
      Scala
      5124931 Updated Aug 30, 2025
    • test

      Public archive
      Apache Pinot - A realtime distributed OLAP datastore
      Java
      1.4k000 Updated Aug 29, 2025
    • robustInfer

      Public
      Repo for robustInfer
      Jupyter Notebook
      1000 Updated Aug 28, 2025
    • luminol

      Public
      Anomaly Detection and Correlation library
      Python
      2191.2k289 Updated Aug 22, 2025