Python: Adding USearch memory connector (#2358)
### Motivation and Context

This change integrates [USearch](https://github.com/unum-cloud/usearch) as a
memory connector for Semantic Kernel (SK).

### Description
     
The USearch `Index` does not natively have the ability to store
different collections, and it only stores embeddings without other
attributes like `MemoryRecord`.

The `USearchMemoryStore` class encapsulates these capabilities. It uses
the USearch `Index` to store a collection of embeddings under unique
IDs, with original collection names mapped to those IDs. Other
`MemoryRecord` attributes are stored in a `pyarrow.Table`, which is
mapped to each collection.
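
The layout described above can be pictured with a minimal sketch. It uses plain Python dicts as stand-ins for the USearch `Index` and the `pyarrow.Table`; the names `CollectionSketch`, `StoreSketch`, `label_of`, and `next_label` are hypothetical and are not taken from the connector's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CollectionSketch:
    """Hypothetical stand-in for the data kept per collection."""

    # Plays the role of the USearch Index: integer label -> embedding.
    vectors: Dict[int, List[float]] = field(default_factory=dict)
    # Plays the role of the pyarrow.Table: integer label -> other attributes.
    rows: Dict[int, dict] = field(default_factory=dict)
    # Maps a record's string ID to the integer label used by the index.
    label_of: Dict[str, int] = field(default_factory=dict)
    next_label: int = 0


class StoreSketch:
    """Hypothetical wrapper that adds named collections on top of the index."""

    def __init__(self) -> None:
        self._collections: Dict[str, CollectionSketch] = {}

    def create_collection(self, name: str) -> None:
        self._collections.setdefault(name, CollectionSketch())

    def add(self, name: str, record_id: str, embedding: List[float], text: str) -> None:
        col = self._collections[name]
        label = col.next_label
        col.next_label += 1
        col.label_of[record_id] = label
        col.vectors[label] = embedding                      # embedding goes to the "index"
        col.rows[label] = {"id": record_id, "text": text}   # attributes go to the "table"

    def get(self, name: str, record_id: str) -> Tuple[List[float], dict]:
        col = self._collections[name]
        label = col.label_of[record_id]
        return col.vectors[label], col.rows[label]
```

The point of the split is that the vector index only ever sees integer labels and embeddings, while everything else about a record lives in a separate per-collection structure.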

Note the current behavior when a user removes a record or upserts a new
one with an existing ID: the old row is not removed from the
`pyarrow.Table`. This is a deliberate performance trade-off, but it means
stale rows accumulate and the table can keep growing.
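
A small sketch of that trade-off, assuming an append-only list in place of the `pyarrow.Table`. The class `AppendOnlyRows` and its attributes are hypothetical and only illustrate the behavior described here, not the connector's actual code.

```python
from typing import Dict, List, Optional


class AppendOnlyRows:
    """Hypothetical append-only row store: old rows are never deleted."""

    def __init__(self) -> None:
        self.rows: List[dict] = []         # grows on every upsert
        self.current: Dict[str, int] = {}  # record ID -> index of the live row

    def upsert(self, record_id: str, text: str) -> None:
        # Append a new row and repoint the ID; the previous row stays behind.
        self.rows.append({"id": record_id, "text": text})
        self.current[record_id] = len(self.rows) - 1

    def remove(self, record_id: str) -> None:
        # Only the mapping is dropped; the row itself is left in place.
        self.current.pop(record_id, None)

    def get(self, record_id: str) -> Optional[dict]:
        index = self.current.get(record_id)
        return None if index is None else self.rows[index]
```

Upserting the same ID twice and then removing it leaves two rows in `rows` even though no record is reachable any more, which is the growth the paragraph above warns about.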

By default, `USearchMemoryStore` operates as an in-memory store. To
enable persistence, pass the path of a directory for the persisted files
to `__init__`. For each collection, two files are created:
`{collection_name}.usearch` and `{collection_name}.parquet`. Changes
are written to disk only when `close_async` is called. Because of the
interface provided by the base class `MemoryStoreBase`, this happens
implicitly when the store is used as a context manager; it can also be
called explicitly.

Since collection names are used to store files on disk, all names are
converted to lowercase.
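
The flush-on-close pattern and the lowercase file names can be sketched as follows. This is an illustration under stated assumptions, not the connector itself: `PersistSketch`, its `add` method, and the single `{name}.json` file per collection are hypothetical stand-ins for the real store and its `.usearch` and `.parquet` files.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class PersistSketch:
    """Hypothetical store that touches the disk only in close_async."""

    def __init__(self, persist_directory: Optional[Path] = None) -> None:
        self._dir = persist_directory  # None means in-memory only
        self._collections: Dict[str, List[dict]] = {}

    def add(self, collection_name: str, row: dict) -> None:
        # Names become file names, so they are normalized to lowercase.
        self._collections.setdefault(collection_name.lower(), []).append(row)

    async def close_async(self) -> None:
        if self._dir is None:
            return
        self._dir.mkdir(parents=True, exist_ok=True)
        for name, rows in self._collections.items():
            # Stand-in for the two per-collection files described above.
            (self._dir / f"{name}.json").write_text(json.dumps(rows))

    async def __aenter__(self) -> "PersistSketch":
        return self

    async def __aexit__(self, *exc_info) -> None:
        # Leaving the context manager flushes implicitly.
        await self.close_async()


async def demo(directory: Path) -> List[str]:
    async with PersistSketch(directory) as store:
        store.add("Notes", {"id": "a", "text": "hello"})
        # Nothing has been written while the store is still open.
        assert not list(directory.glob("*.json"))
    return sorted(p.name for p in directory.iterdir())


def run_demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        return asyncio.run(demo(Path(tmp)))
```

Calling `run_demo()` writes the collection only when the `async with` block exits, and the file is named after the lowercased collection name.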

To ensure efficient use of memory, you should call `close_async`.
---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
4 people authored Aug 23, 2023
1 parent 4bc5ff7 commit 3881a31
Showing 5 changed files with 1,153 additions and 1 deletion.
116 changes: 115 additions & 1 deletion python/poetry.lock


4 changes: 4 additions & 0 deletions python/pyproject.toml
@@ -61,6 +61,10 @@ azure-search-documents = {version = "11.4.0b8", allow-prereleases = true}
azure-core = "^1.28.0"
azure-identity = "^1.13.0"

[tool.poetry.group.usearch.dependencies]
usearch = "^1.1.1"
pyarrow = "^12.0.1"

[tool.isort]
profile = "black"

7 changes: 7 additions & 0 deletions python/semantic_kernel/connectors/memory/usearch/__init__.py
@@ -0,0 +1,7 @@
# Copyright (c) Microsoft. All rights reserved.

from semantic_kernel.connectors.memory.usearch.usearch_memory_store import (
    USearchMemoryStore,
)

__all__ = ["USearchMemoryStore"]