Python: Adding USearch memory connector (#2358)
### Motivation and Context

This change integrates [USearch](https://github.com/unum-cloud/usearch) as a memory connector for Semantic Kernel (SK).

### Description

The USearch `Index` cannot natively store separate collections, and it stores only embeddings, not the other attributes of a `MemoryRecord`. The `USearchMemoryStore` class adds these capabilities. It uses the USearch `Index` to store a collection of embeddings under unique IDs, with original collection names mapped to those IDs. The other `MemoryRecord` attributes are stored in a `pyarrow.Table`, with one table mapped to each collection.

Note the current behavior when a user removes a record or upserts a new one with an existing ID: the old row is not removed from the `pyarrow.Table`. This is done for performance reasons, but it can cause the table to grow over time.

By default, `USearchMemoryStore` operates as an in-memory store. To enable persistence, set the persist mode when calling `__init__` and supply a path to the directory for the persisted files. For each collection, two files are created: `{collection_name}.usearch` and `{collection_name}.parquet`. Changes are written to disk only when `close_async` is called. Because of the interface provided by the base class `MemoryStoreBase`, this happens implicitly when the store is used as a context manager; `close_async` can also be called explicitly.

Since collection names are used as file names on disk, all names are converted to lowercase.

To ensure efficient use of memory, you should call `close_async`.

---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
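
The row-retention behavior described above can be illustrated with a minimal, pure-Python sketch. It is not the code from this commit: `ToyCollection`, `normalize`, and their fields are hypothetical names, a plain list stands in for the `pyarrow.Table`, and no USearch calls are made. It only shows why an append-only table keeps growing when records are upserted under an existing ID or removed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Hypothetical stand-in for one collection; not the real USearchMemoryStore."""

    # Append-only list of rows; plays the role of the per-collection table.
    rows: list = field(default_factory=list)
    # Record key -> index of the row that currently represents it.
    live: dict = field(default_factory=dict)

    def upsert(self, key: str, text: str) -> None:
        # Append a new row and repoint the key; any older row stays in `rows`.
        self.rows.append({"key": key, "text": text})
        self.live[key] = len(self.rows) - 1

    def remove(self, key: str) -> None:
        # Drop only the mapping; the row itself is left behind.
        self.live.pop(key, None)

    def get(self, key: str):
        idx = self.live.get(key)
        return None if idx is None else self.rows[idx]["text"]


def normalize(collection_name: str) -> str:
    # Names are lowercased because they double as file names on disk.
    return collection_name.lower()


store = {}
col = store.setdefault(normalize("Docs"), ToyCollection())
col.upsert("a", "first")
col.upsert("a", "second")  # same ID: the earlier row becomes stale
col.upsert("b", "other")
col.remove("b")            # mapping removed, row kept
```

After these calls the collection holds three rows but only one live record, which is the growth trade-off the description mentions. A real implementation would need a separate compaction step to reclaim the stale rows; whether this commit provides one is not stated.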