OpenLineage/ol-diff

Ol-diff

Use Case

Imagine a scenario where you need to upgrade the OpenLineage connector with custom changes, and you have a selected set of jobs that can be run multiple times to produce OpenLineage events.

In this case, you can execute each job using both the previous and the new version of the connector and gather the events through either FileTransport or the logs generated by ConsoleTransport.

Once the events are collected, you can use ol-diff to compare their contents.

Features of ol-diff

  • Extracting lineage events from logs: ol-diff can read events from log files when ConsoleTransport is configured to log them.
  • Multiple runs per file: ol-diff handles a single events file that contains multiple runs, such as a Spark job running several Spark actions.
  • Cumulative Comparison: ol-diff compares lineage events cumulatively.
  • Facet Verification: For each facet in the events, it triggers a separate test that checks whether the facet exists in the new version.
  • Field Presence Check: It ensures that the fields from the previous version are present in the new version.
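The first two features can be pictured with a minimal Python sketch. It is not ol-diff's actual implementation: the function names `extract_events` and `group_by_run` are hypothetical, and it assumes each event is logged as a complete JSON object on one line and carries the standard `eventType` and `run.runId` fields.

```python
import json
from collections import defaultdict


def extract_events(lines):
    """Yield JSON objects embedded in log lines that look like OpenLineage run events.

    Assumption: an event is a JSON object with "eventType" and "run" keys,
    written in full on a single log line (e.g. by ConsoleTransport).
    """
    decoder = json.JSONDecoder()
    for line in lines:
        start = line.find("{")
        while start != -1:
            try:
                # raw_decode parses one JSON value starting at `start`
                # and returns it together with the index where it ends.
                obj, end = decoder.raw_decode(line, start)
            except json.JSONDecodeError:
                # Not valid JSON here; try the next opening brace.
                start = line.find("{", start + 1)
                continue
            if isinstance(obj, dict) and "eventType" in obj and "run" in obj:
                yield obj
            start = line.find("{", end)


def group_by_run(events):
    """Group events by run id, so one file may hold several runs."""
    runs = defaultdict(list)
    for event in events:
        runs[event["run"]["runId"]].append(event)
    return dict(runs)
```

Calling `group_by_run(extract_events(open("prev.txt")))` on each of the two collected files would give the per-run event lists that a comparison step could then work on.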

Simple example usage:

./ol-diff.sh

This starts a Gradle Docker container and runs the tests.

Verification

  • Job verification
    • Verifies the job name and namespace.
    • Verifies that all facets in the previous version are present in the next version.
    • For each facet in the previous version, verifies that its fields are present in the next version. New facets or fields are accepted.
  • Run verification
    • Verifies that all facets in the previous version are present in the next version.
    • For each facet in the previous version, verifies that its fields are present in the next version. New facets or fields are accepted.
  • Dataset verification
    • Verifies that both connectors detect exactly the same set of datasets.
    • For each dataset, verifies input facets, output facets and facets. For each facet in the previous version, verifies that its fields are present in the next version. New facets or fields are accepted.
  • Application events verification
    • Currently, a flag is added to ignore application events.
    • Otherwise, application events are compared in the same way as jobs and runs, although this behaviour has not been tested.
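The facet and field rules above amount to a one-directional presence check: everything in the previous version must exist in the next one, and additions are tolerated. A minimal Python sketch of that idea follows. The helper names `missing_in_next` and `dataset_ids` are hypothetical, the recursion into nested fields is an assumption, and values are deliberately not compared; ol-diff's own tests may differ.

```python
def missing_in_next(prev, nxt, path=""):
    """Return dotted paths of facets/fields present in `prev` but absent in `nxt`.

    Only presence is checked, not values; extra keys in `nxt` are accepted.
    Recursing into nested objects is an assumption of this sketch.
    """
    missing = []
    for key, prev_value in prev.items():
        key_path = f"{path}.{key}" if path else key
        if key not in nxt:
            missing.append(key_path)
        elif isinstance(prev_value, dict) and isinstance(nxt[key], dict):
            missing.extend(missing_in_next(prev_value, nxt[key], key_path))
    return missing


def dataset_ids(datasets):
    """Identify datasets by (namespace, name) so the two sets can be compared exactly."""
    return {(d["namespace"], d["name"]) for d in datasets}
```

With these helpers, a job or run passes when `missing_in_next(prev_facets, next_facets)` is empty, and the dataset check passes when `dataset_ids(prev_datasets) == dataset_ids(next_datasets)`.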

Future work

TODO:

  • Check application events and their facets.

Development

As a developer, you can run the tests with an extra parameter that includes internal tests, i.e. tests of the tool itself.

docker run --rm -u gradle -v "$PWD":/home/gradle/project -w /home/gradle/project gradle:jdk17-ubi gradle clean test -Pprev.path=examples/success/prev.txt -Pnext.path=examples/success/next.txt -Pinternal.tests=true
open build/reports/tests/test/index.html

Tests should then include extra classes like JobHelpersTest or RunHelperTest.

About

Tool to compare OpenLineage events for the same job run across different producer versions
