Commit 351a6db

Merge pull request #148 from edanalytics/feature/docs_update
New year, new docs!
2 parents 0fadd05 + 7a5523c commit 351a6db

20 files changed: +2121 -1600 lines changed

.github/workflows/ci.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: ci
+on:
+  push:
+    branches:
+      - main
+permissions:
+  contents: write
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Configure Git Credentials
+        run: |
+          git config user.name github-actions[bot]
+          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
+      - uses: actions/setup-python@v5
+        with:
+          python-version: 3.x
+      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
+      - uses: actions/cache@v4
+        with:
+          key: mkdocs-material-${{ env.cache_id }}
+          path: .cache
+          restore-keys: |
+            mkdocs-material-
+      - run: pip install mkdocs-material
+      - run: mkdocs gh-deploy --force
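
Note on the cache key: the workflow derives `cache_id` from `date --utc '+%V'`, which is the ISO 8601 week number, so the `mkdocs-material-` cache key changes once per week and `restore-keys` falls back to any older cache with that prefix. A minimal Python sketch of the same key derivation follows; the function name `weekly_cache_key` is hypothetical and exists only for illustration.

```python
from datetime import datetime, timezone


def weekly_cache_key(now: datetime, prefix: str = "mkdocs-material-") -> str:
    """Build a key that changes once per ISO week, mirroring `date --utc '+%V'`."""
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    week = now.astimezone(timezone.utc).isocalendar()[1]
    # `%V` is zero-padded to two digits, so pad the same way here.
    return f"{prefix}{week:02d}"


print(weekly_cache_key(datetime(2025, 3, 6, tzinfo=timezone.utc)))
# mkdocs-material-10
```

Any two runs in the same ISO week produce the same key, so the `.cache` directory is reused until the week rolls over.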

CHANGELOG.md

Lines changed: 1 addition & 345 deletions
Original file line numberDiff line numberDiff line change
@@ -1,345 +1 @@
-### v0.4.4
-<details>
-<summary>Released 2025-03-06</summary>
-
-* bugfix: [improve exception-handling when loading a `SQLSource` using SQLAlchemy 2.x](https://github.com/edanalytics/earthmover/pull/153)
-
-</details>
-
-
-### v0.4.3
-<details>
-<summary>Released 2025-01-23</summary>
-
-* feature: [allow a `colspec_file` config with column info for `fixedwidth` inputs](https://github.com/edanalytics/earthmover/pull/139)
-* feature: [error messages for `keep_columns` and `drop_columns` now specify the columns](https://github.com/edanalytics/earthmover/pull/150)
-
-</details>
-
-
-### v0.4.2
-<details>
-<summary>Released 2024-11-15</summary>
-
-* feature: interpolate params into destination templates by @tomreitz in https://github.com/edanalytics/earthmover/pull/141
-* feature: lowercase columns by @jayckaiser in https://github.com/edanalytics/earthmover/pull/143
-* fix: optional fields recursion by @rlittle08 in https://github.com/edanalytics/earthmover/pull/142
-* fix: `earthmover deps` fails when not all params are passed by @johncmerfeld in https://github.com/edanalytics/earthmover/pull/140
-* fix: make all pandas/dask config conditional on >3.10 by @johncmerfeld in https://github.com/edanalytics/earthmover/pull/146
-
-</details>
-
-
-### v0.4.1
-<details>
-<summary>Released 2024-11-15</summary>
-
-* feature: [allow specifying `colspecs` for fixed-width files](https://github.com/edanalytics/earthmover/pull/133)
-* feature: [allow `config` params to be passable at the CLI and have a `parameter_default`](https://github.com/edanalytics/earthmover/pull/130)
-* feature: [refactor Source columns list logic as a select instead of a rename](https://github.com/edanalytics/earthmover/pull/137)
-* bugfix: [`earthmover deps` failed to find nested local packages](https://github.com/edanalytics/earthmover/pull/134)
-* bugfix: [relative paths not resolved correctly when using project composition](https://github.com/edanalytics/earthmover/pull/134)
-* bugfix: `--results-file` required a directory prefix
-* bugfix: [some functionality was broken for Python versions < 3.10](https://github.com/edanalytics/earthmover/pull/136)
-
-</details>
-
-
-### v0.4.0
-<details>
-<summary>Released 2024-10-16</summary>
-
-* feature: add support for Python 3.12, with corresponding updates to core dataframe dependencies
-* feature: add `--set` flag for overriding values within `earthmover.yml` from the command line
-
-</details>
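
Note on the v0.4.0 `--set` flag: the entry says only that values inside `earthmover.yml` can be overridden from the command line. As a rough sketch of how such an override could be applied to a parsed config, the snippet below assumes a dotted key path such as `sources.schools.file`; both that path syntax and the helper name `set_by_path` are assumptions made for illustration, not earthmover's confirmed behavior.

```python
from typing import Any


def set_by_path(config: dict, dotted_key: str, value: Any) -> dict:
    """Set a nested value, e.g. "a.b.c" -> config["a"]["b"]["c"] = value."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate mappings when the path does not exist yet.
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Hypothetical parsed config and override, for illustration only.
config = {"sources": {"schools": {"file": "./schools.csv"}}}
set_by_path(config, "sources.schools.file", "./my_schools.csv")
print(config)
```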
-
-
-### v0.3.8
-<details>
-<summary>Released 2024-09-06</summary>
-
-* bugfix: Jinja in destination `header` failed if dataframe is empty
-
-</details>
-
-
-### v0.3.7
-<details>
-<summary>Released 2024-09-04</summary>
-
-* feature: implementing a limit_rows operation
-* feature: add support for a `require_rows` boolean or non-negative int on any node
-* feature: add support for Jinja in a destination node header and footer
-* bugfix: union fails with duplicate columns
-
-</details>
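
Note on `require_rows` in v0.3.7: the entry states that the setting accepts a boolean or a non-negative int. One plausible reading, sketched below, is that `True` demands at least one row and an integer demands at least that many. That interpretation and the function name `meets_row_requirement` are assumptions, not taken from the earthmover source.

```python
def meets_row_requirement(num_rows: int, require_rows) -> bool:
    """Check a row count against a bool-or-int requirement (assumed semantics)."""
    # bool is a subclass of int in Python, so test for it first.
    if isinstance(require_rows, bool):
        minimum = 1 if require_rows else 0
    elif isinstance(require_rows, int) and require_rows >= 0:
        minimum = require_rows
    else:
        raise ValueError("require_rows must be a boolean or a non-negative int")
    return num_rows >= minimum


print(meets_row_requirement(0, True))   # False
print(meets_row_requirement(5, 3))      # True
```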
-
-
-### v0.3.6
-<details>
-<summary>Released 2024-08-07</summary>
-
-* feature: add `json_array_agg` function to `group_by` operation
-* feature: select all columns using "*" in `modify_columns` operation
-* internal: set working directory to the location of the `earthmover.yaml` file
-* documentation: add information on `earthmover init` and `earthmover clean` to the README
-* bugfix: fix bug with `earthmover clean` that could have removed earthmover.yaml files
-
-</details>
-
-
-### v0.3.5
-<details>
-<summary>Released 2024-07-12</summary>
-
-* feature: add `earthmover init` command to initialize a new sample project in the expected bundle structure
-* internal: expand test run to include the new `debug` and `flatten` operations, as well as a nested JSON source file
-* internal: improve customization in write behavior in new file destinations
-* bugfix: Fix bug when writing null values in `FileDestination`
-
-</details>
-
-
-### v0.3.4
-<details>
-<summary>Released 2024-06-26</summary>
-
-* hotfix: Fix bug when writing out JSON in `FileDestination`
-
-</details>
-
-
-### v0.3.3
-<details>
-<summary>Released 2024-06-18</summary>
-
-* hotfix: Resolve incompatible package dependencies
-* hotfix: Fix type casting of nested JSON for destination templates
-
-</details>
-
-### v0.3.2
-<details>
-
-<summary>Released 2024-06-14</summary>
-
-* feature: Add `DebugOperation` for logging data head, tail, columns, or metadata midrun
-* feature: Add `FlattenOperation` for splitting and exploding string columns into values
-* feature: Add optional 'fill_missing_columns' field to `UnionOperation` to fill disjunct columns with nulls, instead of raising an error (default `False`)
-* feature: Add `git_auth_timeout` config when entering Git credentials during package composition
-* feature: [Add `earthmover clean` command that removes local project artifacts](https://github.com/edanalytics/earthmover/pull/87)
-* feature: only output compiled template during `earthmover compile`
-* feature: Render full row into JSON lines when `template` is undefined in `FileDestination`
-* internal: Move `FileSource` size-checking and `FtpSource` FTP-connecting from compile to execute
-* internal: Move template-file check from compile to execute in `FileDestination`
-* internal: Allow filepaths to be passed to an optional `FileSource`, and check for file before creating empty dataframe
-* internal: Build an empty dataframe if an empty folder is passed to an optional `FileSource`
-* internal: fix some examples in README
-* internal: remove GitPython dependency
-* bugfix: fix bug in `FileDestination` where `linearize: False` resulted in BOM characters
-* bugfix: fix bug where nested JSON would be loaded as a stringified Python dictionary
-* bugfix: [Ensure command list in help menu and log output is always consistent](https://github.com/edanalytics/earthmover/pull/87)
-* bugfix: fix bug in `ModifyColumnsOperation` where `__row_data__` was not exposed in Jinja templating
-
-</details>
-
-
-### v0.3.1
-<details>
-
-<summary>Released 2024-04-26</summary>
-
-* internal: allow any ordering of Transformations during graph-building in compile
-* internal: only create a `/packages` dir when `earthmover deps` succeeds
-
-</details>
-
-
-### v0.3.0
-<details>
-
-<summary>Released 2024-04-17</summary>
-
-* feature: add project composition using `packages` keyword in template file (see README)
-* feature: add installation extras for optional libraries, and improve error logging to notify which is missing
-* feature: `GroupByWithRankOperation` cumulatively sums record counts by group-by columns
-* feature: setting `log_level: DEBUG` in template configs or setting `debug: True` for a node displays the head of the node mid-run
-* feature: add `optional_fields` key to all Sources to add optional empty columns when missing from schema
-* feature: add optional `ignore_errors` and `exact_match` boolean flags to `DateFormatOperation`
-* internal: force-cast a dataframe to string-type before writing as a Destination
-* internal: remove attempted directory-hashing when a source is a directory (i.e., Parquet)
-* internal: refactor project to standardize import paths for Node and Operation
-* internal: add `Node.full_name` attribute and `Node.set_upstream_source()` method
-* internal: unify graph-building into compilation
-* internal: refactor compilation and execution code for cleanliness
-* internal: unify `Node.compile()` into initialization to ease Node development
-* internal: Remove unused `group_by_with_count` and `group_by_with_agg` operations
-
-</details>
-
-
-### v0.2.1
-<details>
-<summary>Released 2024-04-08</summary>
-
-* feature: [adding fromjson() function to Jinja](https://github.com/edanalytics/earthmover/pull/75)
-* feature: [fix docs typos](https://github.com/edanalytics/earthmover/pull/68)
-* feature: [`SortRowsOperation` sorts the dataset by `columns`](https://github.com/edanalytics/earthmover/pull/56)
-
-</details>
-
-### v0.2.0
-<details>
-<summary>Released 2023-09-11</summary>
-
-* breaking change: remove `source` as Operation config and move to Transformation; this simplifies templates and reduces memory usage
-* breaking change: `version: 2` required in Earthmover YAML files
-* feature: `SnakeCaseColumnsOperation` converts all columns to snake_case
-* feature: `show_progress` can be turned on globally in `config` or locally in any Source, Transformation, or Destination to display a progress bar
-* feature: `repartition` can be turned on in any applicable `Node` to alter Dask partition-sizes post-execute
-* feature: improve performance when writing Destination files
-* feature: improved Earthmover YAML-parsing and config-retrieval
-* internal: rename `YamlEnvironmentJinjaLoader` to `JinjaEnvironmentYamlLoader` for better transparency of use
-* internal: simplify Earthmover.build_graph()
-* internal: unify Jinja rendering into a single util function, instead of redeclaring across project
-* internal: unify `Node.verify()` into `Node.execute()` for improved code legibility
-* internal: improve attribute declarations across project
-* internal: improve type-hinting and doc-strings across project
-* bugfix: refactor SqlSource to be compatible with SQLAlchemy 2.x
-
-</details>
-
-### v0.1.6
-<details>
-<summary>Released 2023-07-11</summary>
-
-* bugfix: [fixing a bug to create the results_file directory if needed](https://github.com/edanalytics/earthmover/pull/40)
-* bugfix: [process a copy of each node's data at each step, to avoid modifying original node data which downstream nodes may rely on](https://github.com/edanalytics/earthmover/pull/41)
-
-</details>
-
-### v0.1.5
-<details>
-<summary>Released 2023-06-13</summary>
-
-* bugfix: [fixing a bug to skip hashing missing optional source files](https://github.com/edanalytics/earthmover/pull/34)
-* feature: [adding a tmp_dir config so we can tell Dask where to store data it spills to disk](https://github.com/edanalytics/earthmover/pull/37)
-* feature: [adding a `--results-file` option to produce structured run metadata](https://github.com/edanalytics/earthmover/pull/35)
-* feature: [adding a skip exit code](https://github.com/edanalytics/earthmover/pull/36)
-
-</details>
-
-### v0.1.4
-<details>
-<summary>Released 2023-05-12</summary>
-
-* bugfix: `config.state_file` was being ignored when specified
-* bugfix: further issues with multi-line `config.macros` - the resolution here (hopefully the last one!) is to pre-load macros (so they can be injected into run-time Jinja contexts) and then just allow the Jinja to render any macro definitions down to nothing in the config YAML... you do have to be careful with Jinja linebreak suppression, i.e.
-```yaml
-config:
-  macros: > # this is a macro!
-    {%- macro test() -%}
-      testing!
-    {%- endmacro -%}
-sources:
-  ...
-```
-could render down to
-```yaml
-config:
-  macros: > # this is a macro!sources:
-  ...
-```
-which will fail with an error about no sources defined.
-
-* bugfix: charset issues when reading / writing non-UTF8 files - this should be resolved by enforcing every file read/write to specify UTF8 encoding
-
-</details>
-
-### v0.1.3
-<details>
-<summary>Released 2023-05-05</summary>
-
-* feature: implement ability to call `{{ md5(column) }}` in Jinja throughout earthmover, with a framework for other Python functions to be added in the future
-* bugfix: fix multi-line macros issue
-
-</details>
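
Note on the v0.1.3 `md5` feature: the entry describes exposing an `md5(column)` call to Jinja. A minimal sketch of the kind of Python helper that could back such a call is shown below, using only `hashlib`. The function name `md5_hex`, the string coercion, and the UTF-8 encoding are assumptions; how earthmover actually registers the function with Jinja is not shown in this diff.

```python
import hashlib


def md5_hex(value) -> str:
    """Return the hex MD5 digest of a value after coercing it to a UTF-8 string."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


print(md5_hex("abc"))
# 900150983cd24fb0d6963f7d28e17f72
```

MD5 is suitable here only as a stable fingerprint for column values, not as a security measure.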
-
-### v0.1.2
-<details>
-<summary>Released 2023-05-02</summary>
-
-* bugfix: fix continued issues with environment variable expansion under Windows by changing from `os.path.expandvars()` to native Python `string.Template` implementation
-* bugfix: change how earthmover loads `config.macros` from YAML to prevent issues with multi-line macro definitions
-
-</details>
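
Note on the v0.1.2 switch to `string.Template`: unlike `os.path.expandvars()` on Windows, `string.Template` substitutes `${NAME}` placeholders the same way on every platform, including inside single quotes. The sketch below illustrates the idea with the standard library. The helper name `expand_env` and the use of `safe_substitute` (which leaves unknown placeholders untouched) are assumptions, not necessarily what earthmover does.

```python
import os
from string import Template


def expand_env(text: str, env=None) -> str:
    """Expand $NAME / ${NAME} placeholders from a mapping (defaults to os.environ)."""
    mapping = os.environ if env is None else env
    # safe_substitute leaves placeholders with no matching key as-is.
    return Template(text).safe_substitute(mapping)


print(expand_env("file: '${DATA_DIR}/students.csv'", {"DATA_DIR": "/tmp/data"}))
# file: '/tmp/data/students.csv'
```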
-
-### v0.1.1
-<details>
-<summary>Released 2023-03-27</summary>
-
-* bugfix: a single quote in the config YAML could prevent environment variable expansion from working since `os.path.expandvars()` [does not expand variables within single quotes](https://hg.python.org/cpython/file/v2.7.3/Lib/ntpath.py#l330) in Python under Windows
-
-</details>
-
-### v0.1.0
-<details>
-<summary>Released 2023-03-23</summary>
-
-* feature: added parse-time Jinja templating to YAML configuration
-
-> :warning: **Potentially breaking change:** if your config YAML contains `add_columns` or `modify_columns` operations *with Jinja expressions*, these will now be parsed at YAML load time. To preserve the Jinja for runtime parsing, wrap the expressions with `{%raw%}...{%endraw%}`. See [YAML parsing](./README.md#yaml-parsing) for further information.
-
-* feature: removed dependency on matplotlib, which is only required if your YAML specified `config.show_graph: True`... now if you try to `show_graph` without matplotlib installed, you'll get an error prompting you to install matplotlib
-
-</details>
-
-<hr />
-
-### v0.0.7
-<details>
-<summary>Released 2023-02-23</summary>
-
-* feature: added `str_min()` and `str_max()` functions for `group by` operation
-</details>
-
-### v0.0.6
-<details>
-<summary>Released 2023-02-17</summary>
-
-* feature: pass `__row_data__` dict into Jinja templates for easier dynamic column referencing
-* bugfix: parameter / env var interpolation into YAML keys, not just values
-* refactor error handling key assertion methods
-* refactor YAML loader line number context handling
-</details>
-
-### v0.0.5
-<details>
-<summary>Released 2022-12-16</summary>
-
-* trim nodes not connected to a destination from DAG
-* ensure all source datatypes return a Dask dataframe
-* update [optional source functionality](#optional-sources) to require `columns` list, and pass an empty dataframe through the DAG
-</details>
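
Note on the v0.0.5 DAG trimming: the first entry says nodes not connected to a destination are removed from the graph. One common way to do this is to walk the edges backwards from every destination and keep only what is reached. The sketch below uses plain dictionaries and hypothetical node names; it illustrates the technique and is not earthmover's actual graph code.

```python
from collections import deque


def trim_to_destinations(edges, destinations):
    """Return the set of nodes that have a path to at least one destination."""
    # Map each node to the nodes that feed into it.
    upstream = {}
    for src, dst in edges:
        upstream.setdefault(dst, set()).add(src)

    keep = set(destinations)
    queue = deque(destinations)
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, ()):
            if parent not in keep:
                keep.add(parent)
                queue.append(parent)
    return keep


# Hypothetical graph: two sources joined into one destination, plus an unused branch.
edges = [
    ("students", "joined"),
    ("schools", "joined"),
    ("joined", "out"),
    ("unused_source", "orphan_transform"),
]
print(sorted(trim_to_destinations(edges, ["out"])))
# ['joined', 'out', 'schools', 'students']
```

Nodes outside the returned set never influence a destination, so skipping them avoids loading and transforming data that is never written.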
-
-### v0.0.4
-<details>
-<summary>Released 2022-10-27</summary>
-
-* support running in Google Colab
-</details>
-
-### v0.0.3
-<details>
-<summary>Released 2022-10-27</summary>
-
-* support for Python 3.7
-</details>
-
-### v0.0.2
-<details>
-<summary>Released 2022-09-22</summary>
-
-* initial release
-</details>
+(Changelog has moved [here](https://edanalytics.github.io/earthmover/changelog).)
