Releases: pathwaycom/pathway
v0.19.0
Added
- `LLMReranker` now supports custom prompts as well as custom response parsers, allowing for ranking scales other than the default 1-5.
- `pw.io.kafka.write` and `pw.io.nats.write` now support a `ColumnReference` as a topic name. When a `ColumnReference` is provided, each message's topic is determined by the corresponding column value.
- `pw.io.python.write` accepting `ConnectorObserver` as an alternative to `pw.io.subscribe`.
- `pw.io.iceberg.read` and `pw.io.iceberg.write` now support S3 as a data backend and AWS Glue catalog implementations.
- All output connectors now support the `sort_by` field for ordering output within a single minibatch.
- A new UDF executor `pw.udfs.fully_async_executor`. It allows for the creation of non-blocking asynchronous UDFs whose results can be returned at a future processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- `pw.Table.await_futures` method to wait for results of fully asynchronous UDFs.
- `pw.io.deltalake.write` now supports partition columns specification.
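The fully asynchronous UDF model described above — start work without blocking, then collect the results at a later point — can be sketched in plain `asyncio`. This is a conceptual illustration only, not Pathway's API; `slow_udf` and `main` are hypothetical names:

```python
import asyncio

# Conceptual sketch: the "UDF" call is scheduled immediately and returns
# a future; results are gathered later, analogous to pw.Table.await_futures.
async def slow_udf(x: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a slow external call
    return x * 2

async def main() -> list[int]:
    # Schedule all calls without blocking on any single one...
    futures = [asyncio.ensure_future(slow_udf(x)) for x in range(5)]
    # ...other processing could happen here, then await all results.
    return list(await asyncio.gather(*futures))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8]
```

The point of the pattern is that no single slow call holds up the rest of the batch; each result becomes available when its future resolves.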
Changed
- BREAKING: Changed the interface of `LLMReranker`; the `use_logit_bias`, `cache_strategy`, `retry_strategy` and `kwargs` arguments are no longer supported.
- BREAKING: `LLMReranker` no longer inherits from `pw.UDF`.
- BREAKING: `pw.stdlib.utils.AsyncTransformer.output_table` now returns a table whose columns have the Future data type.
- `pw.io.deltalake.read` can now read append-only tables without requiring explicit specification of primary key fields.
v0.18.0
Added
- `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now handle serialization of `PyObjectWrapper` and `Timedelta` properly.
- New chunking options in `pathway.xpacks.llm.parsers.UnstructuredParser`.
- Now all Pathway types can be serialized into JSON and consistently deserialized back.
- `table.col.dt.to_duration` converting an integer into a `pw.Duration`.
- `pw.Json` now supports storing datetime and duration type values in ISO format.
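The ISO representations involved can be illustrated in plain Python. This is a sketch of the general ISO-8601 conventions, not `pw.Json`'s exact encoding, and `duration_to_iso` is a hypothetical helper handling non-negative durations only:

```python
from datetime import datetime, timedelta

def duration_to_iso(d: timedelta) -> str:
    # Minimal ISO-8601 duration encoder (sketch: non-negative, whole seconds).
    total = int(d.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"PT{hours}H{minutes}M{seconds}S"

dt = datetime(2024, 11, 29, 12, 30, 0)
print(dt.isoformat())                                  # 2024-11-29T12:30:00
print(duration_to_iso(timedelta(hours=1, minutes=5)))  # PT1H5M0S
```

Storing these values as ISO strings keeps the JSON human-readable and round-trippable without a separate type tag.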
Changed
- BREAKING: Changed the interface of `UnstructuredParser`.
- BREAKING: The `Pointer` type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- BREAKING: The `Bytes` type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- BREAKING: The `Array` type is now serialized and deserialized as an object with two fields: `shape`, denoting the shape of the stored multi-dimensional array, and `elements`, denoting the elements of the flattened array.
- BREAKING: Marked package as `py.typed` to indicate support for type hints.
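The `Bytes` and `Array` conventions above can be demonstrated with a plain-Python round trip. The field names come from the changelog; the rest is a sketch, and Pathway's actual wire format may differ in details:

```python
import base64
import json

# Bytes: stored as a base64-encoded string field.
payload = b"\x00\x01binary"
encoded = base64.b64encode(payload).decode("ascii")
assert base64.b64decode(encoded) == payload

# Array: an object with "shape" and the flattened "elements".
matrix = [[1, 2, 3], [4, 5, 6]]  # a 2 x 3 array
serialized = json.dumps({
    "shape": [2, 3],
    "elements": [x for row in matrix for x in row],
})

# Deserialization reshapes the flat element list back using "shape".
obj = json.loads(serialized)
rows, cols = obj["shape"]
restored = [obj["elements"][r * cols:(r + 1) * cols] for r in range(rows)]
assert restored == matrix
```

Both conventions exist because JSON has no native binary or multi-dimensional array types, so the content must be carried in string and flat-list form.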
Removed
- BREAKING: Removed undocumented `license_key` argument from `pw.run` and `pw.run_all` methods. Instead, `pw.set_license_key` should be used.
v0.17.0
Added
- `pw.io.iceberg.read` method for reading Apache Iceberg tables into Pathway.
- Methods `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now accept an additional argument `init_mode`, which allows initializing the table before writing.
- `pw.io.deltalake.read` now supports serialization and deserialization for all Pathway data types.
- New parser `pathway.xpacks.llm.parsers.DoclingParser` supporting parsing of PDFs with tables and images.
- Output connectors now include an optional `name` parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- BREAKING: `pw.io.deltalake.read` now requires explicit specification of primary key fields.
- BREAKING: `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` now returns a dictionary from the `pw_ai_answer` endpoint.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` allows optionally returning context documents from the `pw_ai_answer` endpoint.
- BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
- BREAKING: The `Pointer` type is now serialized to Delta Tables as raw bytes.
- `pw.io.kafka.write` now allows specifying `key` and `headers` for JSON and CSV data formats.
- The `persistent_id` parameter in connectors has been renamed to `name`. This new `name` parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
- Changed names of parsers to be more consistent: `ParseUnstructured` -> `UnstructuredParser`, `ParseUtf8` -> `Utf8Parser`. `ParseUnstructured` and `ParseUtf8` are now deprecated.
Fixed
- `generate_class` method in `Schema` now correctly renders columns of `UnionType` and `None` types.
- Fixed a bug in delay in temporal behavior: it was possible to emit a single entry twice in a specific situation.
- `pw.io.postgres.write_snapshot` now correctly handles tables that only have primary key columns.
Removed
- BREAKING: `pw.indexing.build_sorted_index`, `pw.indexing.retrieve_prev_next_values`, `pw.indexing.sort_from_index` and `pw.indexing.SortedIndex` are removed. Sorting is now done with `pw.Table.sort`.
- BREAKING: Removed deprecated methods `pw.Table.unsafe_promise_same_universe_as`, `pw.Table.unsafe_promise_universes_are_pairwise_disjoint`, `pw.Table.unsafe_promise_universe_is_subset_of`, `pw.Table.left_join`, `pw.Table.right_join`, `pw.Table.outer_join`, `pw.stdlib.utils.AsyncTransformer.result`.
- BREAKING: Removed deprecated column `_pw_shard` in the result of `windowby`.
- BREAKING: Removed deprecated functions `pw.debug.parse_to_table`, `pw.udf_async`, `pw.reducers.npsum`, `pw.reducers.int_sum`, `pw.stdlib.utils.col.flatten_column`.
- BREAKING: Removed deprecated module `pw.asynchronous`.
- BREAKING: Removed deprecated access to functions from `pw.io` in `pw`.
- BREAKING: Removed deprecated classes `pw.UDFSync`, `pw.UDFAsync`.
- BREAKING: Removed class `pw.xpack.llm.parsers.OpenParse`. Its functionality has been replaced with `pw.xpack.llm.parsers.DoclingParser`.
- BREAKING: Removed deprecated arguments from input connectors: `value_columns`, `primary_key`, `types`, `default_values`. Schema should be used instead.
v0.16.4
Fixed
- Google Drive connector in static mode now correctly displays in jupyter visualizations.
v0.16.3
Added
- `pw.io.iceberg.write` method for writing Pathway tables into Apache Iceberg.
Changed
- Values of non-deterministic UDFs are not stored in tables that are `append_only`.
- `pw.Table.ix` has a better runtime error message that includes the id of the missing row.
Fixed
- Temporal behaviors in temporal operators (`windowby`, `interval_join`) now consume no CPU when no data passes through them.
v0.16.2
Added
- `pw.xpacks.llm.prompts.RAGPromptTemplate`, a set of prompt utilities that enable verifying templates and creating UDFs from prompt strings or callables.
- `pw.xpacks.llm.question_answering.BaseContextProcessor` streamlines development and tuning of representing retrieved context documents to the LLM.
- `pw.io.kafka.read` now supports the `with_metadata` flag, which makes it possible to attach the metadata of the Kafka messages to the table entries.
- `pw.io.deltalake.read` can now stream tables with deletions, if no deletion vectors were used.
Changed
- `pw.io.sharepoint.read` now explicitly terminates with an error if it fails to read the data the specified number of times per row (the default is 8).
- `pw.xpacks.llm.prompts.prompt_qa` and other prompts expect 'context' and 'query' fields instead of 'docs'.
- Removed support for `short_prompt_template` and `long_prompt_template` in `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer`. These prompt variants are no longer accepted during construction or in requests.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` allows setting user-created prompts. Templates are verified to include 'context' and 'query' placeholders.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` can take a `BaseContextProcessor` that represents context documents to the LLM. Defaults to `pw.xpacks.llm.question_answering.SimpleContextProcessor`, which filters metadata fields and joins the documents with new lines.
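Verifying that a template contains the required placeholders can be sketched with the standard library's format-string parser. This is an illustrative stand-in, not Pathway's implementation; `verify_template` is a hypothetical name:

```python
import string

def verify_template(template: str, required=("context", "query")) -> None:
    # Collect the named placeholders of a str.format-style template.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = set(required) - fields
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")

verify_template("Answer {query} using:\n{context}")  # passes silently
try:
    verify_template("Answer using {docs}")
except ValueError as e:
    print(e)  # template is missing placeholders: ['context', 'query']
```

Failing fast at construction time, rather than at request time, is the point of this kind of check.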
Fixed
- The input of `pw.io.fs.read` and `pw.io.s3.read` is now correctly persisted in case deletions or modifications of already processed objects take place.
v0.16.1
Changed
- `pw.io.s3.read` now monitors object deletions and modifications in the S3 source when run in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.
- `pw.io.s3.read` now supports the `with_metadata` flag, which makes it possible to attach the metadata of the source object to the table entries.
Fixed
- `pw.xpacks.llm.document_store.DocumentStore` no longer requires the `_metadata` column in the input table.
v0.16.0 (2024-11-29)
Added
- `pw.xpacks.llm.document_store.SlidesDocumentStore`, which is a subclass of `pw.xpacks.llm.document_store.DocumentStore` customized for retrieving slides from presentations.
- `pw.temporal.inactivity_detection` and `pw.temporal.utc_now` functions, allowing for alerting and other time-dependent use cases.
Changed
- `pw.Table.concat`, `pw.Table.with_id`, `pw.Table.with_id_from` no longer perform checks if ids are unique. This improves memory usage.
- Table operations that store values (like `pw.Table.join`, `pw.Table.update_cells`) no longer store columns that are not used downstream.
- The `append_only` column property is now propagated better (there are more places where we can infer it).
- BREAKING: Unused arguments from the constructor of `pw.xpacks.llm.question_answering.DeckRetriever` are no longer accepted.
Fixed
- `query_as_of_now` of `pw.stdlib.indexing.DataIndex` and `pw.stdlib.indexing.HybridIndex` now work in constant memory for an infinite query stream (no query-related data is kept after a query is answered).
v0.15.4
Added
- `pw.io.kafka.read` now supports reading entries starting from a specified timestamp.
- `pw.io.nats.read` and `pw.io.nats.write` methods for reading from and writing Pathway tables to NATS.
Changed
- `pw.Table.diff` now supports setting the `instance` parameter, which allows computing differences for multiple groups.
- `pw.io.postgres.write_snapshot` now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
Fixed
- `pw.PyObjectWrapper` is now picklable.
v0.15.3
Added
- `pw.io.mongodb.write` connector for writing Pathway tables to MongoDB.
- `pw.io.s3.read` now supports downloading objects from an S3 bucket in parallel.
Changed
- `pw.io.fs.read` performance has been improved for directories containing a large number of files.