Releases: pathwaycom/pathway
v0.19.0
Added
- `LLMReranker` now supports custom prompts as well as custom response parsers, allowing for ranking scales other than the default 1-5.
- `pw.io.kafka.write` and `pw.io.nats.write` now support a `ColumnReference` as a topic name. When a `ColumnReference` is provided, each message's topic is determined by the corresponding column value.
- `pw.io.python.write` accepting `ConnectorObserver` as an alternative to `pw.io.subscribe`.
- `pw.io.iceberg.read` and `pw.io.iceberg.write` now support S3 as a data backend and AWS Glue catalog implementations.
- All output connectors now support the `sort_by` field for ordering output within a single minibatch.
- A new UDF executor `pw.udfs.fully_async_executor`. It allows for the creation of non-blocking asynchronous UDFs whose results can be returned at a future processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- `pw.Table.await_futures` method to wait for results of fully asynchronous UDFs.
- `pw.io.deltalake.write` now supports partition columns specification.
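The fully asynchronous UDF model described above — start work without blocking, then collect the results at a later point — can be sketched in plain `asyncio`. This is a conceptual illustration only, not Pathway's API; `slow_udf` and `main` are hypothetical names:

```python
import asyncio

# Conceptual sketch: the "UDF" call is scheduled immediately and returns
# a future; results are gathered later, analogous to pw.Table.await_futures.
async def slow_udf(x: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a slow external call
    return x * 2

async def main() -> list[int]:
    # Schedule all calls without blocking on any single one...
    futures = [asyncio.ensure_future(slow_udf(x)) for x in range(5)]
    # ...other processing could happen here, then await all results.
    return list(await asyncio.gather(*futures))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8]
```

The point of the pattern is that no single slow call holds up the rest of the batch; each result becomes available when its future resolves.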
Changed
- BREAKING: Changed the interface of `LLMReranker`; the `use_logit_bias`, `cache_strategy`, `retry_strategy` and `kwargs` arguments are no longer supported.
- BREAKING: `LLMReranker` no longer inherits from `pw.UDF`.
- BREAKING: `pw.stdlib.utils.AsyncTransformer.output_table` now returns a table whose columns have the Future data type.
- `pw.io.deltalake.read` can now read append-only tables without requiring explicit specification of primary key fields.
v0.18.0
Added
- `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now handle serialization of `PyObjectWrapper` and `Timedelta` properly.
- New chunking options in `pathway.xpacks.llm.parsers.UnstructuredParser`.
- Now all Pathway types can be serialized into JSON and consistently deserialized back.
- `table.col.dt.to_duration` converting an integer into a `pw.Duration`.
- `pw.Json` now supports storing datetime and duration type values in ISO format.
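The ISO representations involved can be illustrated in plain Python. This is a sketch of the general ISO-8601 conventions, not `pw.Json`'s exact encoding, and `duration_to_iso` is a hypothetical helper handling non-negative durations only:

```python
from datetime import datetime, timedelta

def duration_to_iso(d: timedelta) -> str:
    # Minimal ISO-8601 duration encoder (sketch: non-negative, whole seconds).
    total = int(d.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"PT{hours}H{minutes}M{seconds}S"

dt = datetime(2024, 11, 29, 12, 30, 0)
print(dt.isoformat())                                  # 2024-11-29T12:30:00
print(duration_to_iso(timedelta(hours=1, minutes=5)))  # PT1H5M0S
```

Storing these values as ISO strings keeps the JSON human-readable and round-trippable without a separate type tag.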
Changed
- BREAKING: Changed the interface of `UnstructuredParser`.
- BREAKING: The `Pointer` type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- BREAKING: The `Bytes` type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- BREAKING: The `Array` type is now serialized and deserialized as an object with two fields: `shape`, denoting the shape of the stored multi-dimensional array, and `elements`, denoting the elements of the flattened array.
- BREAKING: Marked package as `py.typed` to indicate support for type hints.
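The `Bytes` and `Array` conventions above can be demonstrated with a plain-Python round trip. The field names come from the changelog; the rest is a sketch, and Pathway's actual wire format may differ in details:

```python
import base64
import json

# Bytes: stored as a base64-encoded string field.
payload = b"\x00\x01binary"
encoded = base64.b64encode(payload).decode("ascii")
assert base64.b64decode(encoded) == payload

# Array: an object with "shape" and the flattened "elements".
matrix = [[1, 2, 3], [4, 5, 6]]  # a 2 x 3 array
serialized = json.dumps({
    "shape": [2, 3],
    "elements": [x for row in matrix for x in row],
})

# Deserialization reshapes the flat element list back using "shape".
obj = json.loads(serialized)
rows, cols = obj["shape"]
restored = [obj["elements"][r * cols:(r + 1) * cols] for r in range(rows)]
assert restored == matrix
```

Both conventions exist because JSON has no native binary or multi-dimensional array types, so the content must be carried in string and flat-list form.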
Removed
- BREAKING: Removed undocumented `license_key` argument from `pw.run` and `pw.run_all` methods. Instead, `pw.set_license_key` should be used.
v0.17.0
Added
- `pw.io.iceberg.read` method for reading Apache Iceberg tables into Pathway.
- Methods `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now accept an additional argument `init_mode`, which allows initializing the table before writing.
- `pw.io.deltalake.read` now supports serialization and deserialization for all Pathway data types.
- New parser `pathway.xpacks.llm.parsers.DoclingParser` supporting parsing of PDFs with tables and images.
- Output connectors now include an optional `name` parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- BREAKING: `pw.io.deltalake.read` now requires explicit specification of primary key fields.
- BREAKING: `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` now returns a dictionary from the `pw_ai_answer` endpoint.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` allows optionally returning context documents from the `pw_ai_answer` endpoint.
- BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
- BREAKING: The `Pointer` type is now serialized to Delta Tables as raw bytes.
- `pw.io.kafka.write` now allows specifying `key` and `headers` for JSON and CSV data formats.
- The `persistent_id` parameter in connectors has been renamed to `name`. This new `name` parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
- Changed names of parsers to be more consistent: `ParseUnstructured` -> `UnstructuredParser`, `ParseUtf8` -> `Utf8Parser`. `ParseUnstructured` and `ParseUtf8` are now deprecated.
Fixed
- `generate_class` method in `Schema` now correctly renders columns of `UnionType` and `None` types.
- Fixed a bug in delay in temporal behavior: it was possible to emit a single entry twice in a specific situation.
- `pw.io.postgres.write_snapshot` now correctly handles tables that only have primary key columns.
Removed
- BREAKING: `pw.indexing.build_sorted_index`, `pw.indexing.retrieve_prev_next_values`, `pw.indexing.sort_from_index` and `pw.indexing.SortedIndex` are removed. Sorting is now done with `pw.Table.sort`.
- BREAKING: Removed deprecated methods `pw.Table.unsafe_promise_same_universe_as`, `pw.Table.unsafe_promise_universes_are_pairwise_disjoint`, `pw.Table.unsafe_promise_universe_is_subset_of`, `pw.Table.left_join`, `pw.Table.right_join`, `pw.Table.outer_join`, `pw.stdlib.utils.AsyncTransformer.result`.
- BREAKING: Removed deprecated column `_pw_shard` in the result of `windowby`.
- BREAKING: Removed deprecated functions `pw.debug.parse_to_table`, `pw.udf_async`, `pw.reducers.npsum`, `pw.reducers.int_sum`, `pw.stdlib.utils.col.flatten_column`.
- BREAKING: Removed deprecated module `pw.asynchronous`.
- BREAKING: Removed deprecated access to functions from `pw.io` in `pw`.
- BREAKING: Removed deprecated classes `pw.UDFSync`, `pw.UDFAsync`.
- BREAKING: Removed class `pw.xpack.llm.parsers.OpenParse`. Its functionality has been replaced with `pw.xpack.llm.parsers.DoclingParser`.
- BREAKING: Removed deprecated arguments from input connectors: `value_columns`, `primary_key`, `types`, `default_values`. Schema should be used instead.
v0.16.4
Fixed
- Google Drive connector in static mode now correctly displays in jupyter visualizations.
v0.16.3
Added
- `pw.io.iceberg.write` method for writing Pathway tables into Apache Iceberg.
Changed
- Values of non-deterministic UDFs are not stored in tables that are `append_only`.
- `pw.Table.ix` has a better runtime error message that includes the id of the missing row.
Fixed
- Temporal behaviors in temporal operators (`windowby`, `interval_join`) now consume no CPU when no data passes through them.
v0.16.2
Added
- `pw.xpacks.llm.prompts.RAGPromptTemplate`, a set of prompt utilities that enable verifying templates and creating UDFs from prompt strings or callables.
- `pw.xpacks.llm.question_answering.BaseContextProcessor` streamlines development and tuning of representing retrieved context documents to the LLM.
- `pw.io.kafka.read` now supports the `with_metadata` flag, which makes it possible to attach the metadata of the Kafka messages to the table entries.
- `pw.io.deltalake.read` can now stream tables with deletions, if no deletion vectors were used.
Changed
- `pw.io.sharepoint.read` now explicitly terminates with an error if it fails to read the data the specified number of times per row (the default is 8).
- `pw.xpacks.llm.prompts.prompt_qa` and other prompts expect 'context' and 'query' fields instead of 'docs'.
- Removed support for `short_prompt_template` and `long_prompt_template` in `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer`. These prompt variants are no longer accepted during construction or in requests.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` allows setting user-created prompts. Templates are verified to include 'context' and 'query' placeholders.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` can take a `BaseContextProcessor` that represents context documents to the LLM. Defaults to `pw.xpacks.llm.question_answering.SimpleContextProcessor`, which filters metadata fields and joins the documents with new lines.
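Verifying that a template contains the required placeholders can be sketched with the standard library's format-string parser. This is an illustrative stand-in, not Pathway's implementation; `verify_template` is a hypothetical name:

```python
import string

def verify_template(template: str, required=("context", "query")) -> None:
    # Collect the named placeholders of a str.format-style template.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = set(required) - fields
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")

verify_template("Answer {query} using:\n{context}")  # passes silently
try:
    verify_template("Answer using {docs}")
except ValueError as e:
    print(e)  # template is missing placeholders: ['context', 'query']
```

Failing fast at construction time, rather than at request time, is the point of this kind of check.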
Fixed
- The input of `pw.io.fs.read` and `pw.io.s3.read` is now correctly persisted in case deletions or modifications of already processed objects take place.
v0.16.1
Changed
- `pw.io.s3.read` now monitors object deletions and modifications in the S3 source when run in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.
- `pw.io.s3.read` now supports the `with_metadata` flag, which makes it possible to attach the metadata of the source object to the table entries.
Fixed
- `pw.xpacks.llm.document_store.DocumentStore` no longer requires the `_metadata` column in the input table.
v0.16.0 (2024-11-29)
Added
- `pw.xpacks.llm.document_store.SlidesDocumentStore`, which is a subclass of `pw.xpacks.llm.document_store.DocumentStore` customized for retrieving slides from presentations.
- `pw.temporal.inactivity_detection` and `pw.temporal.utc_now` functions, allowing for alerting and other time-dependent use cases.
Changed
- `pw.Table.concat`, `pw.Table.with_id`, `pw.Table.with_id_from` no longer perform checks if ids are unique. This improves memory usage.
- Table operations that store values (like `pw.Table.join`, `pw.Table.update_cells`) no longer store columns that are not used downstream.
- The `append_only` column property is now propagated better (there are more places where we can infer it).
- BREAKING: Unused arguments from the constructor of `pw.xpacks.llm.question_answering.DeckRetriever` are no longer accepted.
Fixed
- `query_as_of_now` of `pw.stdlib.indexing.DataIndex` and `pw.stdlib.indexing.HybridIndex` now work in constant memory for an infinite query stream (no query-related data is kept after a query is answered).
v0.15.4
Added
- `pw.io.kafka.read` now supports reading entries starting from a specified timestamp.
- `pw.io.nats.read` and `pw.io.nats.write` methods for reading from and writing Pathway tables to NATS.
Changed
- `pw.Table.diff` now supports setting the `instance` parameter, which allows computing differences for multiple groups.
- `pw.io.postgres.write_snapshot` now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
Fixed
- `pw.PyObjectWrapper` is now picklable.
v0.15.3
Added
- `pw.io.mongodb.write` connector for writing Pathway tables to MongoDB.
- `pw.io.s3.read` now supports downloading objects from an S3 bucket in parallel.
Changed
- `pw.io.fs.read` performance has been improved for directories containing a large number of files.