- 543 Support type hinting with standard collections
- 544 Fix Spark connect import issue on worker side
- 482 Move Fugue SQL dependencies into extra `[sql]` and functions to become soft dependencies
- 504 Create Fugue pytest fixtures and plugins
- 541 Change table temp view names to uppercase
- 540 Fix Ray 2.10+ compatibility issues
- 539 Fix compatibility issues with Dask 2024.4+
- 534 Remove ibis version cap
- 505 Deprecate `as_ibis` in FugueWorkflow
- 387 Improve test coverage on 3.10, add tests for 3.11
- 269 Optimize taking 1 row without sorting on Spark and Dask
- 488 Migrate from fs to fsspec
- 521 Add `as_dicts` to Fugue API
- 516 Use `_collect_as_arrow` for Spark `as_arrow`
- 520 Add Python 3.10 to Windows Tests
- 506 Adopt pandas `ExtensionDType`
- 504 Create Fugue pytest fixtures
- 503 Deprecate Python 3.7 support
- 501 Simplify zip/comap, remove join from the implementation
- 500 Implement all partitioning strategies for Dask
- 495 Resolve segfault on Duckdb 0.8.1
- 494 Remove the version cap of Dask
- 497 Make LocalExecutionEngine respect partition numbers
- 493 Spark Pandas UDF partitioning improvement
- 492 Made AnyDataFrame recognized by Creator, Processor and Outputter
- 490 Fixed bug when using `pa.Table` as transformer output
- 489 Added version cap to Ibis
- 485 Made Fugue compatible with Ray 2.5.0
- 486 Added py.typed to Fugue
- 481 Moved Fugue SQL dependencies into functions as soft dependencies
- 478 Removed cloudpickle from the hard dependency of Spark backend
- 477 Removed tests folder from Fugue package
- 476 Fix compatibility issues for Pandas 2+ and Spark < 3.4
- 471 Fix compatibility issues for duckdb 0.8.0+
- 466 Fix Ray 2.4.0 compatibility issue
- 464 Support for spark/databricks connect
- 459 DEPRECATION: Avro support
- 455 Make Fugue pandas 2 compatible
- 430 Support Polars DataFrames
- 434 Make Transformations data format aware
- 408 Remove SQLite support
- 444 Clean up FunctionWrapper
- 423 Add seaborn as a domain level extension for visualization
- 422 Add pandas_df.plot as the first namespace extension
- 421 Add the namespace concept to Fugue extensions
- 420 Add is_distributed to engines
- 419 Log transpiled SQL query upon error
- 384 Expanding Fugue API
- 410 Unify Fugue SQL dialect (syntax only)
- 409 Support arbitrary column names in Fugue
- 404 Ray/Dask engines guess optimal default partitions
- 403 Deprecate register_raw_df_type
- 392 Fix intermittent aggregation failures on Spark dataframes
- 398 Rework API Docs and Favicon
- 393 ExecutionEngine as_context
- 385 Remove DataFrame metadata
- 381 Change SparkExecutionEngine to use pandas udf by default
- 380 Refactor ExecutionEngine (Separate out MapEngine)
- 378 Refactor DataFrame show
- 377 Create bag
- 372 Infer execution engine from input
- 340 Migrate to plugin mode
- 369 Remove execution from FugueWorkflow context manager, remove engine from FugueWorkflow
- 373 Fixed Spark engine rename slowness when there are a lot of columns
- 362 Remove Python 3.6 Support
- 363 Create IbisDataFrame and IbisExecutionEngine
- 364 Enable Map type support
- 365 Support column names starting with numbers
- 361 Better error message for cross join
- 345: Enabled file as input/output for transform and out_transform
- 326: Added tests for Python 3.6 - 3.10 for Linux and 3.7 - 3.9 for Windows. Updated devenv and CICD to Python 3.8.
- 321: Moved Fugue SQL out to https://github.com/fugue-project/fugue-sql-antlr, removed version cap of `antlr4-python3-runtime`
- 323: Removed version cap of DuckDB
- 334: Replaced RLock with SerializableRLock
- 337: Fixed index warning in fugue_dask
- 339: Migrated execution engine parsing to triad conditional_dispatcher
- 341: Added Dask Client to DaskExecutionEngine, and fixed bugs of Dask and Duckdb
- Create a hybrid engine of DuckDB and Dask
- Save Spark-like partitioned parquet files for all engines
- Enable DaskExecutionEngine to transform dataframes with nested columns
- A smarter way to determine default npartitions in Dask
- Support even partitioning on Dask
- Add handling of nested ArrayType on Spark
- Change to plugin approach to avoid explicit import
- Fixed Click version issue
- Added version caps for antlr4-python3-runtime and duckdb as they both released new versions with breaking changes.
- Make Fugue exceptions short and useful
- Ibis integration (experimental)
- Get rid of simple assignment (not used at all)
- Improve DuckDB engine to use a real DuckDB ExecutionEngine
- YIELD LOCAL DATAFRAME (see the Fugue SQL sketch after this list)
- Add an option to transform to turn off native dataframe output
- Add callback parameter to `transform` and `out_transform`
- Support DuckDB
- Create fsql_ignore_case for convenience, make this an option in notebook setup
- Make Fugue SQL error more informative about case issue
- Enable pandas default SQL engine (QPD) to take lower case SQL
- Change pickle to cloudpickle for Flask RPC Server
- Add license to package
- Parsed arbitrary objects into execution engines
- Made Fugue SQL accept `+`, `~`, `-` in schema expression
- Fixed transform bug for Fugue DataFrames
- Fixed a very rare bug of annotation parsing
- Added Select, Aggregate, Filter, Assign interfaces
- Made compatible with Windows OS, added github actions to test on windows
- Register built-in extensions
- Accept platform dependent annotations for dataframes and execution engines
- Let SparkExecutionEngine accept empty pandas dataframes
- Move to codecov
- Let Fugue SQL take input dataframes with name such as a.b
- Dask repartitioning improvement
- Separate Dask IO to use its own APIs
- Improved Dask print function by adding back head
- Made `assert_or_throw` lazy
- Improved notebook setup handling for JupyterLab
- HOTFIX avro support
- Added built in avro support
- Fixed dask print bug
- Added Codacy and Slack channel badges, fixed pylint
- Created transform and out_transform functions (see the first sketch after this list)
- Added partition syntax sugar (see the partitioning sketch after this list)
- Fixed Fugue SQL `CONNECT` bug
- Fugueless
- Notebook experience and extension
- NativeExecutionEngine: switched to use QPD for SQL
- Spark pandas udf: migrate to applyInPandas and mapInPandas
- SparkExecutionEngine take bug
- Fugue SQL: `PRINT ROWS n` -> `PRINT n ROWS|ROW`
- Refactor yield
- Fixed Jinja templating issue
- Change `_parse_presort_exp` from a private function to a public one
- Changed the annoying "failure to delete execution temp directory" message to info level
- Limit and Limit by Partition
- Fixed README code examples so they work
- Limit was renamed to take and added to SQL interface
- RPC for Callbacks to collect information from workers in real time
- Changes in handling input dataframe determinism. This fixes a bug related to thread locks with Spark DataFrames because of a deepcopy.
- sample function
- Make csv infer schema consistent cross engine
- Make loading file more consistent cross engine
- Support `**kwargs` in interfaceless extensions
- Support `Iterable[pd.DataFrame]` as output type
- Alter column types
- RENAME in Fugue SQL
- CONNECT different SQL service in Fugue SQL
- Fixed Spark EVEN REPARTITION issue
- Add hook to print/show
- Fixed import issue with OutputTransformer
- Added fillna as a built-in transform, including SQL implementation
- Extension validation interface and interfaceless syntax
- Passing dataframes cross workflow (yield)
- OUT TRANSFORM to transform and finish a branch of execution
- Fixed a PandasDataFrame datetime issue that only happened in transformer interface approach
- Unified checkpoints and persist
- Drop columns and na implementations in both programming and sql interfaces
- Presort takes array as input
- Fixed jinja template rendering issue
- Fixed path format detection bug
- Require pandas 1.0 because of parquet schema
- Improved Fugue SQL extension parsing logic
- Doc for contributors to setup their environment
- Added set operations to programming interface: `union`, `subtract`, `intersect`
- Added `distinct` to programming interface
- Ensured partitioning follows SQL convention: groups with null keys are NOT removed
- Switched `join`, `union`, `subtract`, `intersect`, `distinct` to QPD implementations, so they follow SQL convention
- Set operations in Fugue SQL can directly operate on Fugue statements (e.g. `TRANSFORM USING t1 UNION TRANSFORM USING t2`)
- Fixed bugs
- Added onboarding document for contributors
- Main features of Fugue core and Fugue SQL
- Support backends: Pandas, Spark and Dask
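Below are a few hedged sketches of user-facing features from the list above. This first one is a minimal use of the `transform` function from the "Created transform and out_transform functions" entry; the sample data, column names, and schema string are invented for illustration, and the commented distributed call assumes the corresponding backend is installed.

```python
import pandas as pd
from fugue import transform

def add_one(df: pd.DataFrame) -> pd.DataFrame:
    # runs per partition; Fugue adapts it to the chosen backend
    return df.assign(b=df["a"] + 1)

pdf = pd.DataFrame({"a": [1, 2, 3]})

# no engine given: runs locally on pandas
local_result = transform(pdf, add_one, schema="*,b:int")
print(local_result)

# the same call can target a distributed backend without changing add_one,
# e.g. transform(pdf, add_one, schema="*,b:int", engine="spark")
```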
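A sketch of the partitioning syntax from the "Added partition syntax sugar" and presort entries: the dict form of the `partition` argument with `by` and `presort` keys. The data and column names here are hypothetical.

```python
import pandas as pd
from fugue import transform

def take_top(df: pd.DataFrame) -> pd.DataFrame:
    # each group arrives already presorted by v descending
    return df.head(1)

pdf = pd.DataFrame({"group": ["x", "x", "y"], "v": [3, 1, 2]})

res = transform(
    pdf,
    take_top,
    schema="*",
    partition={"by": "group", "presort": "v desc"},
)
print(res)
```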
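A sketch of Fugue SQL with `YIELD LOCAL DATAFRAME` and `fsql`; the query, the `pdf` variable, and the yielded name `result` are hypothetical, and the exact import path and result-access pattern can vary across Fugue versions.

```python
import pandas as pd
from fugue import fsql

pdf = pd.DataFrame({"a": [1, 2, 3]})

# run a Fugue SQL workflow on the default (pandas-based) engine and yield
# the output back as a local dataframe named "result"
dag = fsql(
    """
    SELECT a, a + 1 AS b FROM pdf
    YIELD LOCAL DATAFRAME AS result
    """,
    pdf=pdf,
)
yielded = dag.run()["result"]
print(yielded.as_pandas())
```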