Update metadata files for version release; minor cleanup in EarthbeamDAG. (#74)
jayckaiser authored Sep 19, 2024
1 parent 71550d7 commit 27126fc
Showing 3 changed files with 16 additions and 8 deletions.
13 changes: 12 additions & 1 deletion — CHANGELOG.md

```diff
@@ -1,4 +1,14 @@
-## edu_edfi_airflow v0.4.0
+# edu_edfi_airflow v0.4.1
+## Under the hood
+- Wrap Snowflake stage with single quotes to support filepaths with special characters
+
+## Fixes
+- Fix bugs where files written to S3 could be overwritten in `EarthbeamDAG`
+- Fix bug where optional files fail upload to S3
+
+
+
+# edu_edfi_airflow v0.4.0
 ## New features
 - Add `EarthbeamDAG.partition_on_tenant_and_year()`, a preprocessing function to shard data to parquet on disk. This is useful when a single input file contains multiple years and/or tenants.
 - Add `EarthbeamDAG.build_dynamic_tenant_year_task_group()` to build dynamic Earthbeam task groups for each file to process in a source folder
@@ -12,6 +22,7 @@
 - Refactor `EarthbeamDAG` to use Airflow TaskFlow syntax and simplify Earthbeam task groups
 - Deprecate `EarthbeamDAG.build_tenant_year_task_group()` argument `raw_dir`
 
+
 # edu_edfi_airflow v0.3.1
 ## Fixes
 - Fix bug where updates to query-parameters persisted across every `EdFiResourceDAG`
```
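The "Wrap Snowflake stage with single quotes" changelog entry can be sketched as follows. This is a minimal illustration of the quoting idea only; the helper name and COPY statement shape are assumptions, not the repository's actual code:

```python
def build_copy_statement(table: str, stage_path: str) -> str:
    """Build a Snowflake COPY INTO statement for a named stage path.

    Wrapping the stage location in single quotes lets filepaths with
    special characters (spaces, '=', etc.) parse correctly in SQL.
    """
    return f"COPY INTO {table} FROM '@{stage_path}'"

# Partitioned paths like 'tenant=abc/year=2024' contain '=' and need the quotes.
print(build_copy_statement("raw.students", "my_stage/tenant=abc/year=2024"))
# COPY INTO raw.students FROM '@my_stage/tenant=abc/year=2024'
```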
9 changes: 3 additions & 6 deletions — edu_edfi_airflow/dags/earthbeam_dag.py

```diff
@@ -739,11 +739,11 @@ def upload_to_s3(filepaths: Union[str, List[str]], subdirectory: str, s3_file_su
                 raise ValueError(
                     "Argument `s3_filepath` must be defined to upload transformed Earthmover files to S3."
                 )
 
-
-            filepaths = [filepaths] if isinstance(filepaths, str) else filepaths
-            file_basename = self.get_filename(filepaths[0])
+            filepaths = [filepaths] if isinstance(filepaths, str) else filepaths  # Data-dir is passed as a singleton
+            s3_file_subdirs = [None] * len(filepaths) if not s3_file_subdirs else s3_file_subdirs
 
+            file_basename = self.get_filename(filepaths[0])
             s3_full_filepath = edfi_api_client.url_join(
                 s3_filepath, subdirectory,
                 tenant_code, self.run_type, api_year, grain_update,
@@ -752,9 +752,6 @@
             )
             s3_full_filepath = context['task'].render_template(s3_full_filepath, context)
 
-            filepaths = [filepaths] if isinstance(filepaths, str) else filepaths  # Data-dir is passed as a singleton
-            s3_file_subdirs = [None] * len(filepaths) if not s3_file_subdirs else s3_file_subdirs
-
             # Zip optional subdirectories if specified; make secondary file-uploads optional
             for idx, (filepath, file_subdir) in enumerate(zip(filepaths, s3_file_subdirs)):
                 filepath = context['task'].render_template(filepath, context)
```
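The `earthbeam_dag.py` change moves input normalization ahead of key construction: `filepaths` and `s3_file_subdirs` are normalized once, before any S3 key is derived, so `zip()` pairs every file with its subdirectory and each upload lands at a distinct key. A self-contained sketch of that ordering (function and parameter names here are illustrative, not the actual `EarthbeamDAG` internals):

```python
import os
from typing import List, Optional, Union

def build_s3_keys(
    filepaths: Union[str, List[str]],
    s3_prefix: str,
    s3_file_subdirs: Optional[List[Optional[str]]] = None,
) -> List[str]:
    """Derive one distinct S3 key per local file."""
    # A single data-dir may be passed as a bare string; normalize to a list first.
    filepaths = [filepaths] if isinstance(filepaths, str) else filepaths
    # Default the subdir list AFTER normalizing, so lengths match when zipped;
    # a mismatched zip() silently truncates, and files could then share a key.
    s3_file_subdirs = s3_file_subdirs or [None] * len(filepaths)

    keys = []
    for filepath, subdir in zip(filepaths, s3_file_subdirs):
        parts = [s3_prefix, subdir, os.path.basename(filepath)]
        keys.append("/".join(part for part in parts if part))  # drop empty segments
    return keys

print(build_s3_keys(["out/data.csv", "out/errors.log"], "runs/2024", [None, "logs"]))
# ['runs/2024/data.csv', 'runs/2024/logs/errors.log']
```

Routing the optional second file into its own `logs/` subdirectory is what keeps it from colliding with (or overwriting) the primary output.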
2 changes: 1 addition & 1 deletion — setup.py

```diff
@@ -6,7 +6,7 @@
 
 setuptools.setup(
     name='edu_edfi_airflow',
-    version='0.4.0',
+    version='0.4.1',
 
     description='EDU Airflow tools for Ed-Fi',
     license_files=['LICENSE.md'],
```
