Releases: GoogleCloudPlatform/cloud-data-quality

v0.5.2

04 Feb 22:40
09b313e

What's Changed

Full Changelog: v0.5.1...v0.5.2

v0.5.1

26 Jan 17:50
a8161e0

What's Changed

  • Validation error of CUSTOM_SQL_STATEMENT without custom parameter by @pbalm in #121
  • allow parametrization of CUSTOM_SQL_EXPR rules by @thinhha in #124
  • update user-agent for attribution by @thinhha in #125
  • Fixing an issue with conflicting names: column_id=data with the CTE data alias by @hejnal in #127
  • Setting up documentation structure by @pbalm in #120
  • bq client supports different regions by @thinhha in #129
  • update to v0.5.1 by @thinhha in #130
  • Small fix to README and linting by @pbalm in #131
  • add throttling for dataplex client by @thinhha in #132

Full Changelog: v0.5.0...v0.5.1

v0.5.0

16 Dec 20:17
37e0c19

What's Changed

This release fixes the following bugs and includes breaking changes to the way CloudDQ reports summary statistics to the BigQuery summary table:

  1. Incremental rule-bindings with 'incremental_time_filter_column_id' failed when executed for the first time on a BQ dataset in which the dq_summary table had not been created yet. In the new behaviour, CloudDQ checks whether the dq_summary table exists and runs the high-watermark query only if it does; otherwise it executes a full-table scan, as it would if 'incremental_time_filter_column_id' were not set.
  2. Rule-bindings with 'incremental_time_filter_column_id' returned 0 records to dq_summary if no new data had arrived since the last high-watermark. In the new behaviour, CloudDQ returns 1 record with 0 rows_validated and all other statistics set to NULL.
  3. CUSTOM_SQL_STATEMENT returned 0 records to dq_summary if the custom SQL returned 0 records (i.e. everything succeeded). In the new behaviour, CloudDQ returns 1 record with 0 rows_validated and all other statistics set to NULL.
  4. NOT_NULL rules returned the same count in both failed_count and null_count, causing the sum of success_count + failed_count + null_count to exceed rows_validated. In the new behaviour, if the rule is NOT_NULL, null_count is set to NULL.
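
The first three fixes can be sketched roughly as follows. This is only an illustration of the behaviour described above, not CloudDQ's actual implementation: the function names, the `rule_binding_id` and `rule_id` keys, and the shape of the generated filter are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


def choose_row_filter(
    summary_table_exists: bool,
    incremental_column: Optional[str],
    high_watermark: Optional[str],
) -> str:
    """Return a SQL predicate selecting the rows to validate (hypothetical helper)."""
    # No incremental column configured: always scan the full table.
    if incremental_column is None:
        return "TRUE"
    # First run: dq_summary does not exist yet, so there is no
    # high-watermark to query. Fall back to a full-table scan.
    if not summary_table_exists or high_watermark is None:
        return "TRUE"
    # Otherwise validate only rows newer than the last high-watermark.
    return f"{incremental_column} > TIMESTAMP '{high_watermark}'"


def empty_summary_record(rule_binding_id: str, rule_id: str) -> Dict[str, Any]:
    """Record written when nothing was validated: 0 rows, other statistics NULL."""
    return {
        "rule_binding_id": rule_binding_id,  # assumed key name
        "rule_id": rule_id,  # assumed key name
        "rows_validated": 0,
        "success_count": None,
        "failed_count": None,
        "null_count": None,
    }
```

The point of the second helper is that a rule-binding with no new data still produces exactly one summary record, so downstream queries can tell "ran and found nothing to validate" apart from "did not run".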

Going forward:

  • success_count + failed_count + null_count should always be equal to rows_validated
  • CloudDQ always returns 1 record for each rule-binding/rule into dq_summary
  • null_count will be NULL for the NOT_NULL rule type
  • CUSTOM_SQL_STATEMENT rules will have NULL values in success_count, failed_count, and null_count. They populate only the existing column 'complex_rule_validation_errors_count' with the number of rows returned by the custom SQL, plus a new column 'complex_rule_validation_success_flag', which is TRUE if 'complex_rule_validation_errors_count' is 0, FALSE if 'complex_rule_validation_errors_count' is greater than 0, and NULL if the rule_type is not CUSTOM_SQL_STATEMENT.
  • CUSTOM_SQL_STATEMENT is no longer recommended for record-level validation. Use CUSTOM_SQL_EXPR to implement custom record-level requirements.
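
As a rough aid for consumers of dq_summary, the rules above could be checked against a single result row like this. It is a sketch under assumptions: the row is represented as a plain dict keyed by column name, the helper names are hypothetical, and the sum check is skipped for CUSTOM_SQL_STATEMENT rows because their three counts are NULL.

```python
from typing import Any, Dict, List, Optional

COUNT_COLUMNS = ("success_count", "failed_count", "null_count")


def complex_rule_success_flag(
    rule_type: str, errors_count: Optional[int]
) -> Optional[bool]:
    """Derive complex_rule_validation_success_flag as described above."""
    if rule_type != "CUSTOM_SQL_STATEMENT" or errors_count is None:
        return None
    return errors_count == 0


def check_summary_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations found in one dq_summary row."""
    problems: List[str] = []
    rule_type = row["rule_type"]

    if rule_type == "CUSTOM_SQL_STATEMENT":
        # Record-level counts must be NULL for this rule type.
        for col in COUNT_COLUMNS:
            if row.get(col) is not None:
                problems.append(f"{col} should be NULL for CUSTOM_SQL_STATEMENT")
        expected = complex_rule_success_flag(
            rule_type, row.get("complex_rule_validation_errors_count")
        )
        if row.get("complex_rule_validation_success_flag") != expected:
            problems.append("complex_rule_validation_success_flag is inconsistent")
        return problems

    if rule_type == "NOT_NULL" and row.get("null_count") is not None:
        problems.append("null_count should be NULL for NOT_NULL")

    # NULL counts are treated as 0 when summing.
    total = sum(row.get(col) or 0 for col in COUNT_COLUMNS)
    if total != row["rows_validated"]:
        problems.append("success_count + failed_count + null_count != rows_validated")
    return problems
```

A row for a NOT_NULL rule with 10 rows validated, 8 successes, 2 failures and a NULL null_count passes with no problems; the same row with null_count set to 2 is reported twice, once for the non-NULL null_count and once for the sum exceeding rows_validated.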

This constitutes a breaking change to the dq_summary results. We hope the change will reduce confusion about how the different summary statistics are calculated.

The change is illustrated in the following example:
Given the YAML provided in this example on the sample contact_details data, CloudDQ will now generate the following results

Full Changelog: v0.4.1...v0.5.0

v0.4.1

07 Dec 23:13
b327690

What's Changed

Full Changelog: v0.4.0...v0.4.1

v0.4.1-rc1

03 Dec 18:00
9988b3d

What's Changed

Full Changelog: v0.4.0...v0.4.1-rc1

v0.4.0

01 Dec 13:50
d51857a

What's Changed

Full Changelog: v0.3.1...v0.4.0

v0.4.0-rc2

26 Nov 15:23
5bc2608

Patch release to avoid duplication of log records.

v0.4.0-rc1

25 Nov 15:31
5a1eac2

What's Changed

  • Log dq_summary table to stdout by @pbalm in #92

Full Changelog: v0.3.1...v0.4.0-rc1

v0.3.2-rc2

22 Nov 09:41

What's Changed

  • Addition of a column last_modified to the dq_summary table that indicates the last modification date of the data being checked.

Full Changelog: v0.3.1...v0.3.2-rc2

v0.3.2-rc1

12 Nov 14:56
2942476

What's Changed

Full Changelog: v0.3.1...v0.3.2-rc1