Enhancement to the weaviate datatypes support (text[], object, object[]) with WeaviateDocumentIndex #1849

Open
4 of 7 tasks
vincetrep opened this issue Jan 29, 2024 · 6 comments · May be fixed by #1852

Comments

@vincetrep

vincetrep commented Jan 29, 2024

Initial Checks

  • I have searched Google & GitHub for similar requests and couldn't find anything
  • I have read and followed the docs and still think this feature is missing

Description

The mapping from docarray data types to Weaviate data types is limited.

Currently, the supported data types are limited to those listed here:
docarray/index/backends/weaviate.py line 247

        default_column_config: Dict[Any, Dict[str, Any]] = field(
            default_factory=lambda: {
                np.ndarray: {},
                docarray.typing.ID: {},
                'string': {},
                'text': {},
                'int': {},
                'number': {},
                'boolean': {},
                'number[]': {},
                'blob': {},

line 710

        py_weaviate_type_map = {
            docarray.typing.ID: 'string',
            str: 'text',
            int: 'int',
            float: 'number',
            bool: 'boolean',
            np.ndarray: 'number[]',
            bytes: 'blob',
        }

line 197 create_schema

  • would need to accommodate the new data types in the schema creation.

The lists outlined above are more limited than the supported data types in weaviate:
https://weaviate.io/developers/weaviate/config-refs/datatypes

We would like support for the text[] (list of strings), object, and object[] data types so that we can take full advantage of Weaviate's storage.
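To make the request concrete, here is a minimal sketch of how the type map could be extended. It is not the actual docarray implementation: `to_weaviate_type` and `to_property` are hypothetical helpers, `docarray.typing.ID` and `np.ndarray` are left out to keep the example stdlib-only, and the sketch ignores the nested property definitions that, as far as I know, Weaviate expects for object types.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin

# Scalar mapping taken from the py_weaviate_type_map excerpt above
# (docarray.typing.ID and np.ndarray omitted to stay stdlib-only).
SCALAR_TYPE_MAP = {
    str: 'text',
    int: 'int',
    float: 'number',
    bool: 'boolean',
    bytes: 'blob',
}


def unwrap_optional(tp):
    """Return X for Optional[X]; any other annotation passes through unchanged."""
    if get_origin(tp) is Union:
        non_none = [a for a in get_args(tp) if a is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return tp


def is_dict_type(tp):
    """True for plain dict and for parametrized Dict[...] annotations."""
    return tp is dict or get_origin(tp) is dict


def to_weaviate_type(tp):
    """Hypothetical helper: map a Python annotation to a Weaviate data type name."""
    tp = unwrap_optional(tp)
    if tp in SCALAR_TYPE_MAP:
        return SCALAR_TYPE_MAP[tp]
    if is_dict_type(tp):
        return 'object'
    if get_origin(tp) is list:
        args = get_args(tp)
        item = unwrap_optional(args[0]) if args else None
        if item is str:
            return 'text[]'
        if is_dict_type(item):
            return 'object[]'
    raise TypeError(f'no Weaviate data type known for {tp!r}')


def to_property(name, tp):
    """Hypothetical helper: build one property entry for a Weaviate class schema."""
    return {'name': name, 'dataType': [to_weaviate_type(tp)]}


print(to_property('allergies', Optional[List[str]]))
```

Other list types (for example List[int]) deliberately raise here, since they are outside the scope of this request.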

One motivation is simpler data storage; another is being able to use Weaviate's newer filters, ContainsAny and ContainsAll: https://weaviate.io/developers/weaviate/api/graphql/filters#filter-structure
At the moment we have to serialize our array as a string and use the Like operator, which is not ideal and hurts query performance.

These filters match conditions within arrays of values. However, the data types they need (object[] and text[]) aren't supported when indexing data with the WeaviateDocumentIndex.
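A rough sketch of the difference between the current workaround and the filter we would like to use. The two builder functions are hypothetical names of mine; the dictionaries follow the where-filter structure from the Weaviate docs linked above. The `fnmatchcase` check is only a local stand-in for a `*` wildcard match, not Weaviate's Like implementation.

```python
import json
from fnmatch import fnmatchcase


def like_filter_for_serialized_list(path, term):
    """Workaround: the list is stored as one string, so match a substring."""
    return {'path': [path], 'operator': 'Like', 'valueText': f'*{term}*'}


def contains_filter(path, terms, match_all=False):
    """Desired: match against the elements of a text[] property."""
    operator = 'ContainsAll' if match_all else 'ContainsAny'
    return {'path': [path], 'operator': operator, 'valueText': list(terms)}


allergies = ['glutenfree', 'peanutfree']
stored_as_string = json.dumps(allergies)  # what the workaround stores today

# The wildcard pattern also hits partial words: 'nutfree' is not an element
# of the list, yet the pattern matches inside 'peanutfree'.
pattern = like_filter_for_serialized_list('allergies', 'nutfree')['valueText']
false_positive = fnmatchcase(stored_as_string, pattern) and 'nutfree' not in allergies

print(contains_filter('allergies', allergies, match_all=True))
print('wildcard false positive:', false_positive)
```

Besides the performance cost, the substring match cannot distinguish whole elements from fragments, which an element-wise filter on text[] avoids.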

Affected Components

@JoanFM
Member

JoanFM commented Jan 29, 2024

Can u give an example of the Doc structure u plan to use and how u would interact with the Index?

@vincetrep
Author

vincetrep commented Jan 29, 2024

Sure:

class SampleDoc(BaseDoc):
    allergies: Optional[List[str]] = []

This field should map to a text[] data type in the WeaviateDocumentIndex.

Let's assume a value of ['glutenfree','peanutfree'] in the array.

I would then be able to filter with ContainsAny or ContainsAll on that field in my search or filter request, to find all objects that contain all of the allergen-free terms:

            filter = {
                "path":["allergies"],
                "operator": "ContainsAll",
                "valueText": ['glutenfree','peanutfree']
            }
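To spell out the semantics I expect from that filter, here is a small local emulation on plain dictionaries. It is only an illustration of ContainsAll versus ContainsAny as set operations; `matches` and the sample documents are made up for this example and nothing here calls Weaviate or docarray.

```python
def matches(where, doc):
    """Local emulation: evaluate a ContainsAll/ContainsAny filter against one dict."""
    values = set(doc.get(where['path'][0]) or [])
    wanted = set(where['valueText'])
    if where['operator'] == 'ContainsAll':
        return wanted <= values        # every requested term is present
    if where['operator'] == 'ContainsAny':
        return bool(wanted & values)   # at least one requested term is present
    raise ValueError(f"unsupported operator: {where['operator']}")


# Made-up sample documents.
docs = [
    {'name': 'a', 'allergies': ['glutenfree', 'peanutfree', 'dairyfree']},
    {'name': 'b', 'allergies': ['glutenfree']},
    {'name': 'c', 'allergies': []},
]

contains_all = {
    'path': ['allergies'],
    'operator': 'ContainsAll',
    'valueText': ['glutenfree', 'peanutfree'],
}
contains_any = dict(contains_all, operator='ContainsAny')

all_hits = [d['name'] for d in docs if matches(contains_all, d)]
any_hits = [d['name'] for d in docs if matches(contains_any, d)]
print(all_hits, any_hits)
```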

@JoanFM
Member

JoanFM commented Jan 29, 2024

So the point is that this type is currently only handled as a blob and cannot be filtered on?

@hsm207
Collaborator

hsm207 commented Jan 29, 2024

FWIW, after the initial release of the docarray-weaviate integration, weaviate has added support for nested objects, so they don't have to be stored as blobs anymore. However, they can't be filtered upon yet.

However, in @vincetrep's example, the thing he wants to filter on is a list of texts, and this is already supported in weaviate's python client.

@JoanFM
Member

JoanFM commented Jan 29, 2024

Hey @hsm207,

Is this change something easy to achieve?

@hsm207
Collaborator

hsm207 commented Jan 29, 2024

hey @JoanFM

I don't think I'm in a position to give good estimates anymore. It's almost a year since I last worked on this and I'm not sure how the abstractions work anymore 🤣

maybe @JohannesMessner knows better?

JoanFM linked a pull request on Feb 8, 2024 that will close this issue