Feature Description
Summary
This feature introduces table-level lineage metadata in Apache Hudi. Lineage records the direct upstream source tables from which a Hudi table is derived and stores this information as versioned table metadata.
Today, table lineage is often tracked externally or inferred heuristically, leading to inconsistency and loss of historical context. This proposal adds a simple, declarative, and deterministic lineage primitive directly to Hudi.
What is added
- A new table metadata property recording upstream source tables
- Lineage represented as a list of catalog.database.table identifiers
- Lineage versioned implicitly with table metadata evolution
Example:
hoodie.table.lineage.sources = [
  "hive.rawdata.kafka_events",
  "hive.rawdata.users"
]
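To make the identifier format concrete, here is a minimal sketch of how a writer might check lineage entries before storing them. The class `LineageIdentifiers` and its methods are hypothetical and not part of Hudi. The allowed character set (letters, digits, underscores) and the rejection of duplicates are assumptions; the proposal only states that each entry is a catalog.database.table identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: checks lineage source identifiers.
final class LineageIdentifiers {
    // Assumption: each of the three parts is a non-empty run of
    // letters, digits, or underscores, separated by single dots.
    private static final Pattern THREE_PART =
        Pattern.compile("[A-Za-z0-9_]+\\.[A-Za-z0-9_]+\\.[A-Za-z0-9_]+");

    private LineageIdentifiers() {}

    /** Returns true if the identifier has the catalog.database.table shape. */
    static boolean isValid(String identifier) {
        return identifier != null && THREE_PART.matcher(identifier).matches();
    }

    /** Returns a copy of the list, rejecting malformed or duplicate entries. */
    static List<String> validate(List<String> sources) {
        List<String> result = new ArrayList<>();
        for (String source : sources) {
            if (!isValid(source)) {
                throw new IllegalArgumentException(
                    "Not a catalog.database.table identifier: " + source);
            }
            if (result.contains(source)) {
                throw new IllegalArgumentException(
                    "Duplicate lineage source: " + source);
            }
            result.add(source);
        }
        return result;
    }
}
```

Validating at declaration time keeps the stored list deterministic, which fits the goal of a declarative primitive with no inference step.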
Key design points
- Table-level only (no partition or column lineage)
- Previous-layer only (one hop)
- Declared explicitly by writers
- No inference or query engine dependency
User Experience
How users use this feature
- Opt-in: existing tables and pipelines are unchanged
- Writers declare lineage during table creation or initial ingestion
- Normal incremental writes do not modify lineage
Usage examples
Declare lineage when creating or rebuilding a table:
setLineageSources(Arrays.asList(
    "hive.rawdata.kafka_events",
    "hive.rawdata.users"
));
Read lineage:
metaClient.getTableConfig().getLineageSources();
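The declare and read calls above could behave roughly as in the following sketch. It uses `java.util.Properties` as a stand-in for Hudi's table config so that it runs on its own; the class `LineageConfigSketch` is hypothetical, and the comma-separated encoding of the property value is an assumption, since the proposal does not fix a serialization format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for Hudi's table config.
final class LineageConfigSketch {
    static final String LINEAGE_SOURCES_KEY = "hoodie.table.lineage.sources";

    private LineageConfigSketch() {}

    /** Stores the sources as one comma-separated value (assumed encoding). */
    static void setLineageSources(Properties tableConfig, List<String> sources) {
        tableConfig.setProperty(LINEAGE_SOURCES_KEY, String.join(",", sources));
    }

    /** Reads the sources back; a table without declared lineage yields an empty list. */
    static List<String> getLineageSources(Properties tableConfig) {
        String raw = tableConfig.getProperty(LINEAGE_SOURCES_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> sources = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sources.add(trimmed);
            }
        }
        return Collections.unmodifiableList(sources);
    }

    public static void main(String[] args) {
        Properties tableConfig = new Properties();
        setLineageSources(tableConfig,
            Arrays.asList("hive.rawdata.kafka_events", "hive.rawdata.users"));
        System.out.println(getLineageSources(tableConfig));
    }
}
```

Returning an empty list for a table with no declared lineage reflects the opt-in behavior: readers do not need a separate check for tables created before the feature existed.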
What users do NOT need to do
- No schema changes
- No SQL or query changes
- No engine upgrades
- No new runtime dependencies
Hudi RFC Requirements
Non-Goals
- Column-level lineage
- Record-level lineage
- Automatic inference
- DAG management
- Query planner changes
Backward Compatibility
- Metadata is additive
- Existing tables unaffected
- No commit or file-format changes
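The additive claim can be illustrated with a small sketch. It parses properties-file text that has no lineage key, as an existing table's config would, and then adds the key without touching the other entries. The class `AdditiveLineageSketch` is hypothetical, the sample table name `events` is made up for illustration, and treating table config as plain key-value properties is a simplifying assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not Hudi code: shows the lineage key as a purely additive property.
final class AdditiveLineageSketch {
    static final String LINEAGE_SOURCES_KEY = "hoodie.table.lineage.sources";

    private AdditiveLineageSketch() {}

    /** Parses properties-file text into a Properties object. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True when the table has opted in by declaring lineage. */
    static boolean hasLineage(Properties props) {
        return props.containsKey(LINEAGE_SOURCES_KEY);
    }

    public static void main(String[] args) {
        // An existing table's config: no lineage key present.
        Properties existing =
            load("hoodie.table.name=events\nhoodie.table.type=COPY_ON_WRITE\n");
        System.out.println(hasLineage(existing));

        // Declaring lineage adds one key and leaves the others as they were.
        existing.setProperty(LINEAGE_SOURCES_KEY, "hive.rawdata.kafka_events");
        System.out.println(existing.getProperty("hoodie.table.name"));
    }
}
```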
Alternatives Considered
- Commit-level lineage (rejected)
- Engine-side inference (rejected)
- External-only lineage systems (rejected)
Future Work
- SQL / metadata table exposure
- Visualization tooling
- Integration with governance platforms