Releases: pathwaycom/pathway
Releases · pathwaycom/pathway
v0.12.0
Added
pw.PyObjectWrapper
that enables passing python objects of any type to the engine.cache_strategy
option added forpw.io.http.rest_connector
. It enables cache configuration, which is useful for duplicated requests.allow_misses
argument toTable.ix
andTable.ix_ref
methods which allows for filling rows with missing keys with None values.pw.io.deltalake.write
output connector that streams the changes of a given table into a DeltaLake storage.pw.io.airbyte.read
now supports data extraction with Google Cloud Runs.
Removed
- BREAKING: Removed
Table.having
method. - BREAKING: Removed
pw.DATE_TIME_UTC
,pw.DATE_TIME_NAIVE
andpw.DURATION
as dtype markers. Instead,pw.DateTimeUtc
,pw.DateTimeNaive
andpw.Duration
should be used, which are wrappers for corresponding pandas types. - BREAKING: Removed class transformers from public API:
pw.ClassArg
,pw.attribute
,pw.input_attribute
,pw.input_method
,pw.method
,pw.output_attribute
andpw.transformer
. - BREAKING: Removed several methods from
pw.indexing
module:binsearch_oracle
,filter_cmp_helper
,filter_smallest_k
andprefix_sum_oracle
.
v0.11.2
Added
pathway.assert_table_has_schema
andpathway.table_transformer
now acceptallow_subtype
argument, which, if True, allows column types in the Table be subtypes of types in the Schema.next
method topw.io.python.ConnectorSubject
(python connector) that enables passing values of any type to the engine, not only values that are json-serializable. Thenext
method should be the preferred way of passing values from the python connector.
Changed
- The
format
argument ofpw.io.python.read
is deprecated. A data format is inferred from the method used (next_json
,next_str
,next_bytes
) and the provided schema.
Removed
- Removed
pw.numba_apply
andnumba
dependency.
Fixed
- Fixed
pw.this
desugaring bug, where__getitem__
in.ix
context was not working properly. pw.io.sqlite.read
now checks if the data matches the passed schema.
v0.11.1
Added
query
andquery_as_of_now
ofpathway.stdlib.indexing.data_index.DataIndex
now accept inmetadata_column
parameter a column with data of typestr | None
.pathway.xpacks.connectors.sharepoint
module under Pathway for Business License.
v0.11.0
Added
- Embedders in the LLM xpack now have method
get_embedding_dimension
that returns number of dimension used by the chosen embedder. pathway.stdlib.indexing.nearest_neighbors
, with implementations ofpathway.stdlib.indexing.data_index.InnerIndex
based on k-NN via LSH (implemented in Pathway), and k-NN provided by USearch library.pathway.stdlib.indexing.vector_document_index
, with a few predefined instances ofpathway.stdlib.indexing.data_index.DataIndex
.pathway.stdlib.indexing.bm25
, with implementations ofpathway.stdlib.indexing.data_index.InnerIndex
based on BM25 index provided by Tantivy.pathway.stdlib.indexing.full_text_document_index
, with a predefined instance ofpathway.stdlib.indexing.data_index.DataIndex
.- Introduced the
reranker
module underllm.xpacks
. Includes few re-ranking strategies and utility functions for RAG applications.
Changed
- BREAKING:
windowby
generates IDs of produced rows differently than in the previous version. - BREAKING:
pw.io.csv.write
prints printable non-ascii characters as regular text, not\u{xxxx}
. - BREAKING: Connector methods
pw.io.elasticsearch.read
,pw.io.debezium.read
,pw.io.fs.read
,pw.io.jsonlines.read
,pw.io.kafka.read
,pw.io.python.read
,pw.io.redpanda.read
,pw.io.s3.read
now check the type of the input data. Previously it was not checked if the provided format was"json"
/"jsonlines"
. If the data is inconsistent with the provided schema, the row is skipped and the error message is emitted. - BREAKING:
query
andquery_as_of_now
methods ofpathway.stdlib.indexing.data_index.DataIndex
now returnpathway.JoinResult
, to allow resolving column name conflicts (between columns in the table with queries and table with index data). - BREAKING: DataIndex methods
query
andquery_as_of_now
now return score in a column named_pw_index_reply_score
(defined as_SCORE
variable inpathway.stdlib.indexing.colnames.py
).
Removed
- BREAKING:
pathway.stdlib.indexing.data_index.VectorDocumentIndex
class, some predefined instances are now meant to be obtained via methods provided inpathway.stdlib.indexing.vector_document_index
. - BREAKING:
with_distances
parameter ofquery
andquery_as_of_now
methods inpathway.stdlib.indexing.data_index.DataIndex
. Instead of 'distance', we now operate with a more general term 'score' (higher = better). For distance based indices score is usually defined as negative distance. Score is now always included in the answer, as long as underlying index returns something that indicates quality of a match.
v0.10.1
Added
query
method to VectorStoreServer to enable compatible API withDataIndex
.AdaptiveRAGQuestionAnswerer
to xpacks.question_answering. End-to-end pipeline and accompanying code forPrivate RAG
showcase.
v0.10.0
Added
- Pathway now warns when unintentionally creating Table with empty universe.
pw.io.kafka.write
inraw
andplaintext
formats now supports output for tables with multiple columns. For such tables, it requires the specification of the column that must be used as a value of the produced Kafka messages and gives a possibility to provide column which must be used as a key.pw.io.kafka.write
can now output values from the table using Kafka message headers in 'raw' and 'plaintext' output format.
Changed
instance
arguments togroupby
,join
,with_id_from
now determine how entries are distributed between machines.flatten
results remain on the same machine as their source entries.join
sends each record between machines at most once.- BREAKING:
flatten
,join
,groupby
(if used withinstance
),with_id_from
(if used withinstance
) generate IDs of the produced rows differently than in the previous versions. pathway spawn
with multiple workers prints only output from the first worker.
v0.9.0
Added
pw.reducers.latest
andpw.reducers.earliest
that return the value with respectively maximal and minimal processing time assigned.pw.io.kafka.write
can now produce messages containing raw bytes in case the table consists of a single binary column andraw
mode is specified. Similarly, this method will provide plaintext messages ifplaintext
mode is chosen and the table consists of a single string-typed column.pw.io.pubsub.write
connector for publishing Pathway tables into Google PubSub.- Argument
strict_prompt
toanswer_with_geometric_rag_strategy
andanswer_with_geometric_rag_strategy_from_index
that allows optimizing prompts for smaller open-source LLM models. - Temporarily switch LiteLLMChat's generation method to sync version due to a bug while using
json
mode with Ollama.
Changed
- BREAKING:
pw.io.kafka.read
will not parse the messages from UTF-8 in caseraw
mode was specified. To preserve this behavior you can use theplaintext
mode. - BREAKING:
Table.flatten
now flattens one column and spreads every other column of the table, instead of taking other columns from the argument list.
v0.8.6
Added
pw.io.bigquery.write
connector for writing Pathway tables into Google BigQuery.- parameter
filepath_globpattern
toquery
method inVectorStoreClient
for specifying which files should be considered in the query. - Improved compatibility of
pw.Json
with standard methods such aslen()
,int()
,float()
,bool()
,iter()
,reversed()
when feasible.
Changed
pw.io.postgres.write
can now parallelize writes to several threads if several workers are configured.- Pathway now checks types of pointers rigorously. Indexing table with mismatched number/types of columns vs what was used to create index will now result in a TypeError.
pw.Json.as_float()
method now supports integer JSON values.
v0.8.5
Added
- New function
answer_with_geometric_rag_strategy_from_index
, which allows to useanswer_with_geometric_rag_strategy
without the need to first retrieve documents from index. - Added support for custom state serialization to
udf_reducer
. - Introduced
instance
parameter inAsyncTransformer
. All calls with a given(instance, processing_time)
pair are returned at the same processing time. Ordering is preserved within a single instance. - Added
successful
,failed
,finished
properties toAsyncTransformer
. They return tables with successful calls, failed calls and all finished calls, respectively.
Changed
- Property
result
ofAsyncTransformer
is deprecated. Propertysuccessful
should be used instead. pw.io.csv.read
,pw.io.jsonlines.read
,pw.io.fs.read
,pw.io.plaintext.read
now handlepath
as a glob pattern and read all matched files and directories recursively.
v0.8.4
Fixed
- Pathway will only require
LiteLLM
package, if you use one of the wrappers forLiteLLM
. - Retries are implemented in
pw.io.airbyte.read
. - State processing protocol is updated in
pw.io.airbyte.read
.