Imagine you need to upgrade the OpenLineage connector with custom changes, and there is a selected set of jobs that can be run multiple times to produce OpenLineage events.
In this case, you can run each job with both the previous and the new version of the connector and gather the events through either FileTransport or the logs generated by ConsoleTransport.
Once the events are collected, you can use ol-diff to compare their contents.
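To make the collection step concrete, here is a minimal Python sketch for loading events written by FileTransport. It assumes the file holds one JSON event per line; that matches our understanding of FileTransport's output, but check it against your client version. `parse_events` and `load_events` are hypothetical names, not part of ol-diff, which runs its comparison as Gradle tests.

```python
import json
from pathlib import Path


def parse_events(lines):
    """Parse OpenLineage events from an iterable of lines, one JSON object per line.

    Blank lines are skipped; every other line must be valid JSON.
    """
    return [json.loads(line) for line in lines if line.strip()]


def load_events(path):
    """Load all events from a file assumed to contain one event per line."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_events(handle)
```

For example, `load_events("prev.jsonl")` and `load_events("next.jsonl")` (hypothetical file names) would give you the two event lists to compare.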
- Extracting lineage events from logs: ol-diff works with log files when ConsoleTransport is configured to log events.
- Multiple runs per file: ol-diff can handle a single file containing events from multiple runs, such as a Spark job that runs several Spark actions.
- Cumulative comparison: ol-diff compares lineage events cumulatively.
- Facet verification: for each facet found in the events, it runs a separate test that checks whether the facet exists in the new version.
- Field presence check: it ensures that the fields from the previous version are present in the new version.
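The log extraction, per-run grouping, and cumulative comparison listed above can be sketched in Python as follows. This is not ol-diff's implementation. It assumes that ConsoleTransport writes each event as a single JSON object on one log line, that the JSON starts at the first `{` on that line, that runs are identified by `run.runId`, and that "cumulative" means all events of a run are merged into one view before comparison, with later values winning. `extract_events`, `deep_merge`, and `accumulate_by_run` are hypothetical names.

```python
import json
from collections import defaultdict

_DECODER = json.JSONDecoder()


def extract_events(log_lines):
    """Pull OpenLineage events out of log lines.

    Assumes one event per line, starting at the first '{'. Text after the JSON
    object is ignored; lines whose first brace is not valid JSON are skipped.
    """
    events = []
    for line in log_lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON on this line
        try:
            payload, _ = _DECODER.raw_decode(line, start)
        except json.JSONDecodeError:
            continue  # a brace in ordinary log text, not an event
        if isinstance(payload, dict) and "run" in payload:
            events.append(payload)
    return events


def deep_merge(base, update):
    """Return a new dict with `update` merged into `base`; later values win.

    Nested dicts are merged recursively; lists and scalars are replaced.
    """
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def accumulate_by_run(events):
    """Group events by run.runId and merge each run's events into one view."""
    runs = defaultdict(dict)
    for event in events:
        run_id = event["run"]["runId"]
        runs[run_id] = deep_merge(runs[run_id], event)
    return dict(runs)
```

Merging recursively keeps facets that appear only in a START event alongside those added by a later COMPLETE event. Lists such as `inputs` and `outputs` are simply replaced by the later event in this sketch, which a real tool would likely need to handle more carefully.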
Simple example usage:
./ol-diff.sh
This starts a Gradle Docker container and runs the tests.
- Job verification
  - Verifies the job name and namespace.
  - Verifies that all facets from the previous version are present in the next version.
  - For each facet in the previous version, verifies that its fields from the previous version are present in the next version. New facets or fields are accepted.
- Run verification
  - Verifies that all facets from the previous version are present in the next version.
  - For each facet in the previous version, verifies that its fields from the previous version are present in the next version. New facets or fields are accepted.
- Dataset verification
  - Verifies that both connector versions detect exactly the same set of datasets.
  - For each dataset, verifies its input facets, output facets, and facets: for each facet in the previous version, it checks that the fields from the previous version are present in the next version. New facets or fields are accepted.
- Application events verification
  - Currently, application events are ignored by means of a flag.
  - Otherwise, application events are compared in the same way as jobs and runs, although this behaviour has not been tested.
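The job, run, and dataset rules above boil down to a recursive "is every key from the previous version still there?" check plus an exact comparison of dataset identities. The Python sketch below illustrates that under stated assumptions: events are plain dictionaries shaped like OpenLineage run events, datasets are identified by `(namespace, name)` and compared separately for `inputs` and `outputs`, and only key presence is checked, not values. `missing_fields`, `index_datasets`, and `verify_pair` are hypothetical helpers; ol-diff itself reports each facet as a separate test.

```python
def missing_fields(prev, nxt, path=""):
    """Return dotted paths of keys present in `prev` but absent from `nxt`.

    Extra keys in `nxt` (new facets or fields) are accepted. Only key presence
    is checked; values and types are not compared.
    """
    missing = []
    for key, prev_value in prev.items():
        current = f"{path}.{key}" if path else key
        if key not in nxt:
            missing.append(current)
        elif isinstance(prev_value, dict) and isinstance(nxt[key], dict):
            missing.extend(missing_fields(prev_value, nxt[key], current))
    return missing


def index_datasets(event, side):
    """Map (namespace, name) -> dataset for the 'inputs' or 'outputs' list."""
    return {(d["namespace"], d["name"]): d for d in event.get(side, [])}


def verify_pair(prev, nxt):
    """Return a list of problems found when comparing two matched events."""
    problems = []
    # Job identity must match exactly.
    for attr in ("namespace", "name"):
        if prev["job"][attr] != nxt["job"][attr]:
            problems.append(f"job.{attr} differs")
    # Job and run facets: everything in prev must still exist in next.
    for section in ("job", "run"):
        for p in missing_fields(prev[section].get("facets", {}),
                                nxt[section].get("facets", {})):
            problems.append(f"{section}.facets.{p}")
    # Datasets: identical set per side, then the same facet check per dataset.
    for side, side_facets in (("inputs", "inputFacets"), ("outputs", "outputFacets")):
        prev_ds, next_ds = index_datasets(prev, side), index_datasets(nxt, side)
        if prev_ds.keys() != next_ds.keys():
            problems.append(f"{side}: dataset sets differ")
            continue
        for key, dataset in prev_ds.items():
            for field in ("facets", side_facets):
                for p in missing_fields(dataset.get(field, {}),
                                        next_ds[key].get(field, {})):
                    problems.append(f"{side}[{key[0]}:{key[1]}].{field}.{p}")
    return problems
```

Because the check walks only the keys of the previous version, anything that exists only in the next version is ignored, which is how new facets or fields end up being accepted.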
TODO:
- Check application events and their facets.
As a developer, you can run the tests with an extra parameter, -Pinternal.tests=true, to include internal tests, that is, tests of the tool itself.
docker run --rm -u gradle -v "$PWD":/home/gradle/project -w /home/gradle/project gradle:jdk17-ubi gradle clean test -Pprev.path=examples/success/prev.txt -Pnext.path=examples/success/next.txt -Pinternal.tests=true
open build/reports/tests/test/index.html
The tests should then include extra classes such as JobHelpersTest or RunHelperTest.