[Feature Request]: Issue with Data Structure in Chroma DB Collections #2170
Comments
Agreed. Or if not, a couple of good concrete examples of doing this?
I think this is a comment on a row-based vs. column-based return format. The main reason Chroma works this way is that, at ingest time, most users have a columnar data structure, since that's how the embeddings are generated. Rather than munge that into a row format, the thought was that it would be nice if it could be dumped directly into Chroma. We felt it was a bit odd to accept columnar inputs but return row-based outputs. I think this has been raised a couple of times. We are open to ideas here! We just think it's important that we are consistent.
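To make the row-based vs. column-based distinction concrete, here is a minimal sketch in plain Python, with no Chroma calls involved. The helper names `columns_to_rows` and `rows_to_columns` are hypothetical and are not part of Chroma's API; the keys `ids` and `documents` are assumed to resemble the columnar shape discussed above.

```python
from typing import Any


def columns_to_rows(columns: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: turn parallel lists into one dict per entity."""
    # Skip columns that were not requested (represented here as None).
    keys = [key for key, values in columns.items() if values is not None]
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Hypothetical helper: the inverse, one list per field."""
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Columnar shape: one list per field, aligned by position.
columnar = {"ids": ["a", "b"], "documents": ["first", "second"], "embeddings": None}

# Row shape: one dict per entity.
rows = columns_to_rows(columnar)
round_trip = rows_to_columns(rows)
```

Both shapes carry the same information, so the question in this thread is mostly about which one the API should hand back by default.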
Describe the problem
I've noticed an issue with the way collections are structured in Chroma DB that makes data retrieval less efficient and more complex than it needs to be. When I retrieve a collection, I expect a collection of entities, but instead, I get many collections of entity components.
Here's an example of how I currently have to retrieve ids and some metadata from a collection:
This approach is not ideal from a syntactic point of view, and possibly from a performance perspective as well, because to project some features of an item, I need to retrieve the whole collection, then grab some items according to the ordinal position.
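A minimal sketch of the positional pattern described above, using a plain dictionary as a stand-in for a columnar result. The keys `ids`, `metadatas` and `documents` are assumed to mirror the shape Chroma returns; the sample data and the `project` helper are hypothetical.

```python
from typing import Any

# Stand-in for a columnar result: parallel lists aligned by position.
result = {
    "ids": ["car-1", "car-2", "car-3"],
    "metadatas": [{"color": "red"}, {"color": "blue"}, {"color": "red"}],
    "documents": ["hatchback", "sedan", "coupe"],
}


def project(result: dict[str, Any], wanted_id: str, field: str) -> Any:
    """Hypothetical helper: look up one metadata field for one id."""
    # Find the ordinal position of the id first...
    position = result["ids"].index(wanted_id)
    # ...then use that position to index into a separate list.
    return result["metadatas"][position].get(field)


color = project(result, "car-2", "color")
```

The caller has to keep the lists aligned by hand, which is the "doors in one place, wheels in another" problem from the analogy.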
Conceptually, it feels like going to a car dealership to choose a car, but instead of seeing complete cars, you’re shown all the doors in one place and all the wheels in another. In the end, you can’t mix and match parts—you still have to choose items that belong to the same car.
I’m aware that it’s possible to decide whether to include embeddings or filter against features, but this doesn’t fully address the issue. I believe a more intuitive and efficient approach would be to structure collections as collections of entities, rather than collections of entity components.
Has anyone else experienced this issue, or can anyone provide insight into why the data structure is designed this way?
Describe the proposed solution
Seems like a proposal has been made:
https://github.com/amikos-tech/chroma-go/blob/main/types/record.go
The solution should be as simple as a standard dictionary retrieval pattern:
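A minimal sketch of what such a pattern might look like, written as a wrapper over a columnar result. The `as_records` helper and the record field names `metadata` and `document` are hypothetical, not an existing Chroma API; the input keys are assumed to mirror the columnar shape discussed in this issue.

```python
from typing import Any


def as_records(result: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: key each entity by its id."""
    count = len(result["ids"])
    # Columns that were not requested are filled with None placeholders.
    metadatas = result.get("metadatas") or [None] * count
    documents = result.get("documents") or [None] * count
    return {
        entity_id: {"metadata": metadata, "document": document}
        for entity_id, metadata, document in zip(result["ids"], metadatas, documents)
    }


result = {
    "ids": ["car-1", "car-2"],
    "metadatas": [{"color": "red"}, {"color": "blue"}],
    "documents": ["hatchback", "sedan"],
}

records = as_records(result)
# Each entity is retrieved whole, by id, like any other dictionary entry.
color = records["car-2"]["metadata"]["color"]
```

This keeps the columnar format as the underlying representation and only changes how a caller reads it, which may be one way to stay consistent with columnar ingest.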
Alternatives considered
No response
Importance
i cannot use Chroma without it
Additional Information
No response