Releases: pathwaycom/pathway
Releases · pathwaycom/pathway
v0.16.1
Changed
pw.io.s3.read
now monitors object deletions and modifications in the S3 source, when ran in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.pw.io.s3.read
now supportswith_metadata
flag, which makes it possible to attach the metadata of the source object to the table entries.
Fixed
pw.xpacks.llm.document_store.DocumentStore
no longer requires_metadata
column in the input table.
v0.16.0
Changelog
All notable changes to this project will be documented in this file.
This project adheres to Semantic Versioning.
[Unreleased]
[0.16.0] - 2024-11-29
Added
pw.xpacks.llm.document_store.SlidesDocumentStore
, which is a subclass ofpw.xpacks.llm.document_store.DocumentStore
customized for retrieving slides from presentations.pw.temporal.inactivity_detection
andpw.temporal.utc_now
functions allowing for alerting and other time dependent usecases
Changed
pw.Table.concat
,pw.Table.with_id
,pw.Table.with_id_from
no longer perform checks if ids are unique. It improves memory usage.- table operations that store values (like
pw.Table.join
,pw.Table.update_cells
) no longer store columns that are not used downstream. append_only
column property is now propagated better (there are more places where we can infer it).- BREAKING: Unused arguments from the constructor
pw.xpacks.llm.question_answering.DeckRetriever
are no longer accepted.
Fixed
query_as_of_now
ofpw.stdlib.indexing.DataIndex
andpw.stdlib.indexing.HybridIndex
now work in constant memory for infinite query stream (no query-related data is kept after query is answered).
v0.15.4
Added
pw.io.kafka.read
now supports reading entries starting from a specified timestamp.pw.io.nats.read
andpw.io.nats.write
methods for reading from and writing Pathway tables to NATS.
Changed
pw.Table.diff
now supports settinginstance
parameter that allows computing differences for multiple groups.pw.io.postgres.write_snapshot
now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
Fixed
pw.PyObjectWrapper
is now picklable.
v0.15.3
Added
pw.io.mongodb.write
connector for writing Pathway tables in MongoDB.pw.io.s3.read
now supports downloading objects from an S3 bucket in parallel.
Changed
pw.io.fs.read
performance has been improved for directories containing a large number of files.
v0.15.2
Added
pw.io.deltalake.read
now supports custom S3 Delta Lakes with HTTP endpoints.pw.io.deltalake.read
now supports specifying both a custom endpoint and a custom region for Delta Lakes viapw.io.s3.AwsS3Settings
.
Changed
- Indices in
pathway.stdlib.indexing.nearest_neighbors
can now work also on numpy arrays. Previously they only acceptedlist[float]
. Working with numpy arrays improves memory efficiency. pw.io.s3.read
has been optimized to minimize new object requests whenever possible.- It is now possible to set the size limit of cache in
pw.udfs.DiskCache
. - State persistence now uses a single backend for both metadata and stream storage. The
pw.persistence.Config.simple_config
method is therefore deprecated. Now you can use thepw.persistence.Config
constructor with the same parameters that were previously used insimple_config
.
Fixed
pw.io.bigquery.write
connector now correctly handlespw.Json
columns.
v0.15.1
Fixed
pw.temporal.session
andpw.temporal.asof_join
now correctly works with multiple entries with the same time.- Fixed an issue in
pw.stdlib.indexing
where filters would cause runtime errors while usingHybridIndexFactory
.
v0.15.0
Added
- Experimental A
pw.xpacks.llm.document_store.DocumentStore
to process and index documents. pw.xpacks.llm.servers.DocumentStoreServer
used to expose REST server for retrieving documents frompw.xpacks.llm.document_store.DocumentStore
.pw.xpacks.stdlib.indexing.HybridIndex
used for querying multiple indices and combining their results.pw.io.airbyte.read
now also supports streams that only operate infull_refresh
mode.
Changed
- Running servers for answering queries is extracted from
pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
intopw.xpacks.llm.servers.QARestServer
andpw.xpacks.llm.servers.QASummaryRestServer
. - BREAKING:
query
andquery_as_of_now
ofpathway.stdlib.indexing.data_index.DataIndex
now produce an empty list instead ofNone
if no match is found
v0.14.3
Fixed
pw.io.deltalake.read
andpw.io.deltalake.write
now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.
Added
- The Pathway CLI command
spawn
can now execute code directly from a specified GitHub repository. - A new CLI command,
spawn-from-env
, has been added. This command runs the Pathway CLIspawn
command using arguments provided in thePATHWAY_SPAWN_ARGS
environment variable.
v0.14.2
Fixed
- Switched
pw.xpacks.llm.embedders.GeminiEmbedder
to be sync to resolve compatibility issues with the Google Colab runs. - Pinned
surya-ocr
module version for stability.
v0.14.1
Added
pw.xpacks.llm.embedders.GeminiEmbedder
which is a wrapper for Google Gemini Embedding services.