
Releases: edanalytics/edu_edfi_airflow

edu_edfi_airflow v0.4.3

31 Oct 20:20
65a2e98

What's Changed

  • Make dependency between EM-to-S3 and file-removal mandatory in Earthb… by @jayckaiser in #85
  • Feature/optimized bulk copy by @jayckaiser in #84

Full Changelog: v0.4.2...v0.4.3

edu_edfi_airflow v0.4.2

15 Oct 20:42
f3a8f64

New features

  • Add boolean pull_all_deletes argument to EdFiResourceDAG to re-pull all deletes for a resource when any are added (resolves deletes-skipping bug).
  • Allow SNOWFLAKE_TENANT_CODE to be overridden in earthmover_kwargs in EarthbeamDAG.
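To illustrate the intent of pull_all_deletes, here is a minimal sketch of how a /deletes pull might choose its starting change version. The helper deletes_min_change_version and its arguments are hypothetical and are not this package's API. The sketch also assumes that minChangeVersion is inclusive, as in Ed-Fi change queries.

```python
from typing import Optional


def deletes_min_change_version(
    last_max_change_version: Optional[int],
    new_deletes_in_window: int,
    pull_all_deletes: bool,
) -> int:
    """Pick the minChangeVersion for a /deletes pull (hypothetical helper)."""
    # First run for this endpoint: nothing ingested yet, so pull from the start.
    if last_max_change_version is None:
        return 0
    # pull_all_deletes: any new delete in the window triggers a full re-pull,
    # so deletes that an incremental window might skip are recovered.
    if pull_all_deletes and new_deletes_in_window > 0:
        return 0
    # Default: continue incrementally just past the last ingested version
    # (assumes minChangeVersion is inclusive, as in Ed-Fi change queries).
    return last_max_change_version + 1
```

For example, with a last ingested change version of 500 and three new deletes, the sketch re-pulls from 0 when pull_all_deletes is set and resumes at 501 otherwise.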

Under the hood

  • Simplify taskgroup declaration in EarthbeamDAG.

Fixes

  • Fix bug where singleton filepaths in EarthbeamDAG were not converted to lists upon initialization.
  • Add dependency between Lightbeam and file-deletion in EarthbeamDAG.
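The singleton-filepath fix addresses a common Python pitfall: a bare string is itself iterable, so code that expects a list ends up looping over its characters. The following is a minimal sketch of the kind of normalization involved. The helper normalize_filepaths is hypothetical, and treating None as "no files" is an assumption.

```python
from typing import Iterable, List, Optional, Union


def normalize_filepaths(filepaths: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Coerce a single path or an iterable of paths to a list (hypothetical helper)."""
    # Assumption: None means "no files".
    if filepaths is None:
        return []
    # Wrap a bare string so it is not iterated character by character.
    if isinstance(filepaths, str):
        return [filepaths]
    return list(filepaths)
```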

Full Changelog: v0.4.1...v0.4.2

edu_edfi_airflow v0.4.1

19 Sep 22:41
27126fc

Under the hood

  • Wrap Snowflake stage with single quotes to support filepaths with special characters
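Snowflake accepts a single-quoted stage location in COPY INTO, which allows paths that contain spaces or other special characters. Below is a minimal sketch of building such a statement. The helper names, table name, and JSON file format are assumptions. Embedded single quotes are doubled, following Snowflake's string-literal escaping.

```python
def quoted_stage_path(stage: str, path: str) -> str:
    """Return a single-quoted Snowflake stage reference (hypothetical helper)."""
    location = f"@{stage}/{path.lstrip('/')}"
    # Double any embedded single quotes so the literal stays well-formed.
    escaped = location.replace("'", "''")
    return f"'{escaped}'"


def copy_into_sql(table: str, stage: str, path: str) -> str:
    """Build a COPY INTO statement from a quoted stage path (illustrative only)."""
    return f"COPY INTO {table} FROM {quoted_stage_path(stage, path)} FILE_FORMAT = (TYPE = JSON)"
```

For example, quoted_stage_path("raw_stage", "tenant a/2024/file.json") yields '@raw_stage/tenant a/2024/file.json', and the space no longer breaks the statement.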

Fixes

  • Fix bugs where files written to S3 could be overwritten in EarthbeamDAG
  • Fix bug where optional files fail to upload to S3

Full Changelog: v0.4.0...v0.4.1

edu_edfi_airflow v0.4.0

05 Sep 20:42
278a915

New features

  • Add EarthbeamDAG.partition_on_tenant_and_year(), a preprocessing function to shard data to parquet on disk. This is useful when a single input file contains multiple years and/or tenants.
  • Add EarthbeamDAG.build_dynamic_tenant_year_task_group() to build dynamic Earthbeam task groups for each file to process in a source folder
  • Add ID matching sub-taskgroup and arguments to EarthbeamDAG taskgroups, in order to retrieve an assessment's identity columns from Snowflake
  • Add optional postprocess Python callable to EarthbeamDAG taskgroups
  • Add optional Lightbeam validation to EarthbeamDAG taskgroups
  • Add option to log Python preprocess and postprocess outputs to Snowflake
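The core idea behind partition_on_tenant_and_year() is to split one mixed input into one shard per (tenant, year) pair. Here is a minimal pure-Python sketch of that grouping step. It is not the library's implementation: the real function writes parquet, while this sketch only groups rows in memory. The column names and the hive-style directory layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def group_by_tenant_and_year(
    rows: Iterable[Row], tenant_col: str, year_col: str
) -> Dict[Tuple[str, str], List[Row]]:
    """Group rows into one bucket per (tenant, year) pair (hypothetical helper)."""
    partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        partitions[(row[tenant_col], row[year_col])].append(row)
    return dict(partitions)


def partition_dir(base_dir: str, tenant: str, year: str) -> str:
    """Hive-style output directory for one shard (assumed layout)."""
    return f"{base_dir}/tenant_code={tenant}/api_year={year}"
```

Each bucket could then be written to its own directory, so that downstream Earthbeam task groups process a single tenant and year at a time.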

Under the hood

  • Make accessing the Total-Count of the Ed-Fi /deletes endpoints optional using argument get_deletes_cv_with_deltas (necessary for generic Ed-Fi 5.3 ODSes)
  • Refactor EarthbeamDAG to use Airflow TaskFlow syntax and simplify Earthbeam task groups
  • Deprecate EarthbeamDAG.build_tenant_year_task_group() argument raw_dir

Full Changelog: v0.3.1...v0.4.0

edu_edfi_airflow v0.3.1

07 Aug 22:35
b188156

Fixes

  • Fix bug where updates to query-parameters persisted across every EdFiResourceDAG
  • Add logging of failed endpoints on EdFiResourceDAG task failed_total_counts
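Query-parameter updates that persist across every DAG usually point to a shared mutable object, such as a class-level or module-level dict. The following sketch shows that general class of bug and a per-instance fix. Both classes are hypothetical and do not reproduce this package's code.

```python
class BuggyDAGConfig:
    # Class attribute: a single dict shared by every instance.
    query_parameters: dict = {}

    def __init__(self, **params):
        # BUG: mutates the shared dict, so params leak into later instances.
        self.query_parameters.update(params)


class FixedDAGConfig:
    default_query_parameters = {"limit": 500}

    def __init__(self, **params):
        # Build a fresh per-instance dict (a shallow copy is enough for flat values).
        self.query_parameters = {**self.default_query_parameters, **params}
```

With the buggy version, a second instance created with no arguments still sees the first instance's parameters. With the fixed version, each instance starts from the defaults.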

Full Changelog: v0.3.0...v0.3.1

edu_edfi_airflow v0.3.0

11 Jun 19:44
e3a39dc

New features

  • Add /keyChanges ingestion for resource endpoints
  • Add new method for EdFiResourceDAG endpoint instantiation using resource_configs and descriptor_configs arguments in init
    • The prior methods EdFiResourceDAG.{add_resource, add_descriptor, add_resource_deletes} are deprecated in favor of this more performant approach.
  • Refactor EdFiToS3Operator taskgroup into three options (determined by run_type argument):
    • "default": One EdFiToS3Operator task per resource/deletes/keyChanges endpoint
    • "bulk": One BulkEdFiToS3Operator task in which all endpoints are looped over in one callable
    • "dynamic": One dynamically-mapped EdFiToS3Operator task per resource with deltas to ingest
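The three run_type options differ mainly in how endpoints map to tasks. The sketch below models only that mapping. The function plan_extract_tasks is hypothetical. The sketch also assumes that "default" declares one task per configured endpoint up front, while "bulk" and "dynamic" receive only endpoints with a positive delta count.

```python
from typing import Dict, List


def plan_extract_tasks(delta_counts: Dict[str, int], run_type: str = "default") -> List[List[str]]:
    """Return a list of tasks, each listing the endpoints it pulls (hypothetical)."""
    endpoints = sorted(delta_counts)
    to_ingest = [ep for ep in endpoints if delta_counts[ep] > 0]
    if run_type == "default":
        # One statically declared task per endpoint, regardless of deltas.
        return [[ep] for ep in endpoints]
    if run_type == "bulk":
        # A single task that loops over every endpoint with rows to ingest.
        return [to_ingest] if to_ingest else []
    if run_type == "dynamic":
        # One mapped task per endpoint with rows to ingest.
        return [[ep] for ep in to_ingest]
    raise ValueError(f"Unknown run_type: {run_type!r}")
```

For example, with deltas {"students": 10, "schools": 0, "students/deletes": 2}, "dynamic" produces two mapped tasks and "bulk" produces one task covering both endpoints.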

Under the hood

  • Copies from S3 to Snowflake in EdFiResourceDAG are now completed in a single bulk task (instead of one per endpoint)
  • EdFiResourceDAG and EarthbeamDAG now inherit from ea_airflow_util DAG factory EACustomDAG
  • Streamline XCom passing between tasks in EdFiResourceDAG
  • Change-version window delta counts are now computed when checking change versions in Snowflake.
    • Only resources with rows-to-ingest are passed to the Ed-Fi operator.

Full Changelog: v0.2.5...v0.3.0

edu_edfi_airflow v0.2.5

12 Apr 15:18
816e043
Compare
Choose a tag to compare

What's Changed

  • Add optional argument schedule_interval_full_refresh to specify a CRON syntax for full-refresh Ed-Fi DAG runs by @jayckaiser in #29
  • Update Earthbeam DAG logging copy statement to prevent character-escaping issues during copy by @jayckaiser in #31

Full Changelog: v0.2.4...v0.2.5

edu_edfi_airflow v0.2.4

09 Feb 19:12
049f93a

What's Changed

  • Add alternative arguments for setting s3_destination_key in S3ToSnowflakeOperator: s3_destination_dir and s3_destination_filename by @rlittle08 in #30
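The new arguments let a caller supply the key as a directory plus a filename instead of a single string. Below is a minimal sketch of how the two forms might resolve to one key. The helper is hypothetical. Treating the two forms as mutually exclusive is an assumption rather than documented operator behavior.

```python
import posixpath
from typing import Optional


def resolve_s3_destination_key(
    s3_destination_key: Optional[str] = None,
    s3_destination_dir: Optional[str] = None,
    s3_destination_filename: Optional[str] = None,
) -> str:
    """Resolve a full S3 key from either form of arguments (hypothetical helper)."""
    if s3_destination_key:
        # Assumption: mixing both forms is treated as a configuration error.
        if s3_destination_dir or s3_destination_filename:
            raise ValueError("Pass s3_destination_key or dir/filename, not both.")
        return s3_destination_key
    if s3_destination_dir and s3_destination_filename:
        # S3 keys use forward slashes, so join with posixpath rather than os.path.
        return posixpath.join(s3_destination_dir, s3_destination_filename)
    raise ValueError("Provide s3_destination_key, or both s3_destination_dir and s3_destination_filename.")
```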

Full Changelog: v0.2.3...v0.2.4

edu_edfi_airflow v0.2.3

21 Dec 23:23

What's Changed

  • Move min_change_version fix from init to execute by @jayckaiser in #28

Full Changelog: v0.2.2...v0.2.3

edu_edfi_airflow v0.2.2

01 Dec 18:50
8c33589

What's Changed

  • Refactor task-group ordering to branch EM/LB logging outside of main task group in EarthbeamDAG by @jayckaiser in #21
  • Add optional pool argument when initializing an Ed-Fi task group that overrides default DAG pool by @jayckaiser in #24

Full Changelog: v0.2.1...v0.2.2