- Add boolean `pull_all_deletes` argument to `EdFiResourceDAG` to re-pull all deletes for a resource when any are added (resolves deletes-skipping bug).
- Allow `SNOWFLAKE_TENANT_CODE` to be overridden in `earthmover_kwargs` in `EarthbeamDAG`.
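For illustration, a minimal sketch of the override, assuming the import path and surrounding constructor arguments (only `earthmover_kwargs` and the `SNOWFLAKE_TENANT_CODE` key come from the entry above):

```python
# Minimal sketch: `earthmover_kwargs` and the SNOWFLAKE_TENANT_CODE override
# come from the changelog; the import path, the other constructor argument,
# and the nesting under "parameters" are assumptions.
from edu_edfi_airflow import EarthbeamDAG  # assumed import path

earthbeam_dag = EarthbeamDAG(
    tenant_code="demo_tenant",  # assumed argument
    earthmover_kwargs={
        "parameters": {"SNOWFLAKE_TENANT_CODE": "alt_tenant"},  # assumed nesting
    },
)
```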
- Simplify taskgroup declaration in `EarthbeamDAG`.
- Fix bug where singleton filepaths in `EarthbeamDAG` were not converted to lists upon initialization.
- Add dependency between Lightbeam and file-deletion in `EarthbeamDAG`.
- Wrap Snowflake stage with single quotes to support filepaths with special characters
- Fix bugs where files written to S3 could be overwritten in `EarthbeamDAG`
- Fix bug where optional files fail upload to S3
- Add `EarthbeamDAG.partition_on_tenant_and_year()`, a preprocessing function to shard data to parquet on disk. This is useful when a single input file contains multiple years and/or tenants.
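Conceptually, the sharding behaves like this standalone pandas sketch (the column names and file layout are illustrative assumptions, not the method's actual contract):

```python
# Standalone sketch of tenant/year partitioning to parquet on disk.
# Column names ("tenant_code", "school_year") and the output layout
# are illustrative assumptions, not the method's actual contract.
import os

import pandas as pd

def partition_on_tenant_and_year(csv_path: str, output_dir: str) -> None:
    df = pd.read_csv(csv_path)
    # Write one parquet shard per (tenant, year) pair so each shard can be
    # processed by its own downstream Earthbeam task group.
    for (tenant, year), shard in df.groupby(["tenant_code", "school_year"]):
        shard_dir = os.path.join(output_dir, f"tenant={tenant}", f"year={year}")
        os.makedirs(shard_dir, exist_ok=True)
        shard.to_parquet(os.path.join(shard_dir, "part-0.parquet"), index=False)
```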
- Add `EarthbeamDAG.build_dynamic_tenant_year_task_group()` to build dynamic Earthbeam task groups for each file to process in a source folder (see the sketch below)
- Add ID matching sub-taskgroup and arguments to `EarthbeamDAG` taskgroups, in order to retrieve an assessment's identity columns from Snowflake
- Add optional postprocess Python callable to `EarthbeamDAG` taskgroups
- Add optional Lightbeam validation to `EarthbeamDAG` taskgroups
- Add option to log Python preprocess and postprocess outputs to Snowflake
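A hypothetical call shape tying the dynamic task group, postprocess hook, and Lightbeam validation together (only the method name comes from these entries; every argument name below is an assumption):

```python
# Hypothetical usage: `build_dynamic_tenant_year_task_group` is named in the
# changelog; all argument names below are illustrative assumptions.
def log_completion(filepath: str) -> None:
    """Illustrative postprocess callable; its output can be logged to Snowflake."""
    print(f"Finished Earthbeam run for {filepath}")

earthbeam_dag.build_dynamic_tenant_year_task_group(
    source_dir="/efs/input/assessments",          # folder scanned for files to process
    python_postprocess_callable=log_completion,   # optional postprocess hook
    validate_with_lightbeam=True,                 # optional Lightbeam validation step
)
```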
- Make accessing the `Total-Count` of the Ed-Fi `/deletes` endpoints optional using argument `get_deletes_cv_with_deltas` (necessary for generic Ed-Fi 5.3 ODSes)
- Refactor `EarthbeamDAG` to use Airflow TaskFlow syntax and simplify Earthbeam task groups
- Deprecate `EarthbeamDAG.build_tenant_year_task_group()` argument `raw_dir`
- Fix bug where updates to query-parameters persisted across every `EdFiResourceDAG`
- Add logging of failed endpoints on `EdFiResourceDAG` task `failed_total_counts`
- Add `/keyChanges` ingestion for resource endpoints
- Add new method for `EdFiResourceDAG` endpoint instantiation using `resource_configs` and `descriptor_configs` arguments in init
  - The prior methods `EdFiResourceDAG.{add_resource, add_descriptor, add_resource_deletes}` are deprecated in favor of this more performant approach.
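An illustrative shape for the init-based declaration (the argument names come from the entry above; the import path, config schema, and other constructor arguments are assumptions):

```python
# `resource_configs` and `descriptor_configs` are named in the changelog;
# the import path, config schema, and other arguments are assumptions.
from edu_edfi_airflow import EdFiResourceDAG  # assumed import path

edfi_dag = EdFiResourceDAG(
    tenant_code="demo_tenant",  # assumed argument
    api_year=2024,              # assumed argument
    resource_configs={
        "students": {"enabled": True},
        "studentSchoolAssociations": {"enabled": True, "fetch_deletes": True},
    },
    descriptor_configs={
        "gradeLevelDescriptors": {"enabled": True},
    },
)
```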
- Refactor `EdFiToS3Operator` taskgroup into three options (determined by `run_type` argument; see the sketch after this list):
  - "default": One `EdFiToS3Operator` task per resource/deletes/keyChanges endpoint
  - "bulk": One `BulkEdFiToS3Operator` task in which all endpoints are looped over in one callable
  - "dynamic": One dynamically-mapped `EdFiToS3Operator` task per resource with deltas to ingest
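A hedged sketch of selecting a strategy (the `run_type` name and its option strings come from the list above; passing it at DAG construction, and the other arguments, are assumptions):

```python
# `run_type` and its three option strings come from the changelog; the
# constructor context and remaining arguments are assumptions.
edfi_dag = EdFiResourceDAG(
    tenant_code="demo_tenant",  # assumed argument
    api_year=2024,              # assumed argument
    run_type="dynamic",  # one mapped EdFiToS3Operator task per resource with deltas
)
```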
- Copies from S3 to Snowflake in `EdFiResourceDAG` are now completed in a single bulk task (instead of one per endpoint)
- `EdFiResourceDAG` and `EarthbeamDAG` now inherit from `ea_airflow_util` DAG factory `EACustomDAG`
- Streamline XCom passing between tasks in `EdFiResourceDAG`
  - Change-version window delta counts are made when checking change versions in Snowflake.
  - Only resources with rows-to-ingest are passed to the Ed-Fi operator.
- Add optional argument `schedule_interval_full_refresh` to specify a CRON syntax for full-refresh Ed-Fi DAG runs.
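For example (only `schedule_interval_full_refresh` is from the entry above; the incremental schedule argument and constructor context are assumptions):

```python
# Hypothetical: daily incremental pulls, with a full refresh every Sunday.
# Only `schedule_interval_full_refresh` comes from the changelog; the rest
# of the constructor call is an assumption.
edfi_dag = EdFiResourceDAG(
    schedule_interval="0 5 * * *",               # assumed: daily incremental runs
    schedule_interval_full_refresh="0 5 * * 0",  # CRON for weekly full-refresh runs
)
```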
- Update Earthbeam DAG logging copy statement to prevent character-escaping issues during copy.
- Add alternative arguments for setting `s3_destination_key` in `S3ToSnowflakeOperator`: `s3_destination_dir` and `s3_destination_filename`.
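A hedged example of the alternative key construction (the two argument names come from the entry above; the import path is an assumption and other required operator arguments are omitted):

```python
# `s3_destination_dir` and `s3_destination_filename` come from the changelog;
# the import path and remaining operator arguments are assumptions.
from edu_edfi_airflow import S3ToSnowflakeOperator  # assumed import path

copy_task = S3ToSnowflakeOperator(
    task_id="copy_students_to_snowflake",
    s3_destination_dir="edfi/demo_tenant/2024/students",
    s3_destination_filename="students.jsonl",
    # Together these resolve to:
    # s3_destination_key="edfi/demo_tenant/2024/students/students.jsonl"
)
```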
- Fix parsing error in full-refresh runs
- Add optional argument `pool` to `EdFiResourceDAG.build_edfi_to_snowflake_task_group()` to override DAG-level pool when ingesting high-impact resources.
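For example, assuming a pre-created Airflow pool (only the method name and `pool` argument are from the entry above; the endpoint argument shape is an assumption):

```python
# Hypothetical usage: route a high-volume resource through a dedicated
# Airflow pool instead of the DAG-level default. The endpoint argument
# shape is an assumption.
edfi_dag.build_edfi_to_snowflake_task_group(
    "studentSectionAttendanceEvents",  # illustrative high-impact resource
    pool="edfi_high_impact",           # overrides the DAG-level pool
)
```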
- Fix task-ordering in `EarthbeamDAG` to make the success of Snowflake-logging independent of downstream tasks.
- Add optional argument `database_conn_id` to `EarthmoverOperator` and `EarthbeamDAG` to pass database connection information without exposing it in Airflow Task Instance details.
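A sketch of the connection-based approach (only `database_conn_id` is from the entry above; the import path and remaining operator arguments are assumptions):

```python
# `database_conn_id` comes from the changelog; the import path and other
# arguments are assumptions. Credentials are resolved from the named
# Airflow connection at runtime instead of being rendered into
# task-instance details.
from edu_edfi_airflow import EarthmoverOperator  # assumed import path

run_earthmover = EarthmoverOperator(
    task_id="run_earthmover",
    database_conn_id="snowflake_edu",  # Airflow connection ID, not raw credentials
)
```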
- Remove `provide_context` from default kwarg arguments in `EarthbeamDAG.build_bash_preprocessing_operator()`.
- Add `EarthbeamDAG` for sending raw data into Ed-Fi ODS or Stadium data warehouse directly
- Add `EarthmoverOperator` and `LightbeamOperator`
- Add optional operator to `EdFiResourceDAG` to increment Airflow variable at run-completion (for selective triggering of downstream DBT DAG)
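The variable-increment pattern itself can be shown with standard Airflow APIs (this callable is illustrative, not the package's actual operator):

```python
# Illustrative version of the increment pattern using standard Airflow APIs;
# this is not the package's implementation. In practice the task would be
# declared inside the DAG and ordered after the final ingestion task.
from airflow.models import Variable
from airflow.operators.python import PythonOperator

def increment_dbt_trigger(var_name: str = "edfi_dbt_trigger") -> None:
    # Read-modify-write of an Airflow Variable; a downstream DBT DAG can
    # watch this counter to decide whether new data has landed.
    count = int(Variable.get(var_name, default_var=0))
    Variable.set(var_name, count + 1)

increment_variable = PythonOperator(
    task_id="increment_dbt_trigger",
    python_callable=increment_dbt_trigger,
)
```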
- Refactor `EdFiResourceDAG` to bundle tasks that interface with the meta-change-version Snowflake table
- Refactor `EdFiResourceDAG` to streamline declaration and bundling of Ed-Fi endpoint task-groups
- Use Airflow Params for defining DAG-level configs
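The Params pattern uses the standard Airflow API; the specific params shown below are illustrative, not the factory's actual configuration surface:

```python
# Standard Airflow Params usage; the specific params are illustrative
# assumptions, not the factory's actual set of DAG-level configs.
from airflow import DAG
from airflow.models.param import Param

with DAG(
    dag_id="edfi_resource_demo",
    params={
        "full_refresh": Param(False, type="boolean"),
        "endpoints": Param([], type="array"),  # optional subset of endpoints to run
    },
) as dag:
    ...  # tasks read these values from the params context at runtime
```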
- Extend `S3ToSnowflakeOperator` to manually specify Ed-Fi-specific metadata in SQL copy-statement
- Fix bug where DAG-kwargs are not passed to `EdFiResourceDAG` init
- Fix bug where running a full-refresh on a subset of endpoints reset all endpoints in the meta-change-version Snowflake table