Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Link DBRefs do not preserve extra attributes when calling document.model_dump() #1041

Open
MrEarle opened this issue Oct 7, 2024 · 5 comments

Comments

@MrEarle
Copy link
Contributor

MrEarle commented Oct 7, 2024

Describe the bug
When dumping a model with links, the corresponding DBRef dicts don't keep extra attributes.

To Reproduce

class User(Document):
    email: str


class ToDo(Document):
    user: Link[User]


async def main():
    client = AsyncIOMotorClient(db_uri)
    await init_beanie(
        database=client.get_database("test"),
        document_models=[User, ToDo],
    )

    user = await User(email="[email protected]").create()

    user_db_ref = DBRef("user", user.id, _extra={"email": user.email})

    todo = await ToDo(user=user_db_ref).create()

    print(todo.model_dump()["user"])  # Does not print the email field

Expected behavior
A clear and concise description of what you expected to happen.

I would expect that the serialization of the DBRef contains the extra fields it contains. This helps in having an extended reference pattern.

Additional context

Playing around with beanie's implementation, I got it working by modifying the to_dict method of Link:

    def to_dict(self):
        extra = {k: v for k,v in self.ref.as_doc().items() if k not in {"$ref", "$id", "$db"}}
        return {**extra, "id": str(self.ref.id), "collection": self.ref.collection}

Edit: This would probably change how to do serialization with Pydantic. For example, in build_validation for Pydantic V2, if isinstance(v, dict) and v.keys() == {"id", "collection"} is used to detect if the dict is a DBRef. Given that keys can now contain more items, the strict equality would no longer hold.

@staticxterm
Copy link

I would technically not characterize this as a bug as this was simply never implemented. My guess is that DBRef comes from PyMongo right?
https://github.com/mongodb/mongo-python-driver/blob/master/bson/dbref.py
The thing is, Link objects are essentially Pydantic models. And when you try to serialize attributes that are private in the model, they are simply omitted
Ref: https://docs.pydantic.dev/latest/concepts/models/#private-model-attributes
So to support your use case, some 'hacking' would be necessary, as you've demonstrated.

@MrEarle
Copy link
Contributor Author

MrEarle commented Oct 7, 2024

Well, Links are not pydantic models, they are classes that can be serialized by pydantic. The serialization very explicitly only encodes the inner self.ref's id and collection, while ignoring any other attribute it has.

Also, when creating a document with a link, you need to either call to_ref or build the DBRef yourself. Maybe this is not a bug, but I don't think it is desirable behavior to omit the fields that the DBRef contains.

@MrEarle
Copy link
Contributor Author

MrEarle commented Oct 8, 2024

To further drive de discussion, Beanie showcases Links as a way to filter a document based on the linked document field. In my example, it would be:

Todo.find(Todo.user.email == email, fetch_links=True)

But this performs a lookup before filtering, and cannot take advantages of indices, which requires expensive colscan operations. This get harder for more complex real-world examples.

A way to improve on this, is to have extra fields, like in the embedded document pattern, that can be indexed (IndexModel(user.email)). That way, we can perform the above query with an index and without the need to fetch links. We can then fetch the links after performing the query, which would not involve a colscan just to filter the Todo collection.

But the current implementation does not support this.

@staticxterm
Copy link

I see. That may be very useful but would probably complicate the design or overall logic.
If I'm understanding this correctly, we can have any additional fields under the $ref field, which then somehow need to stay in sync from wherever they are originally defined, is this the expectation? (I'm thinking about when this embedded "Link" data needs to be updated) If so, this seems like much work.

But I do agree that at least this possibility to add extra fields is currently missing, and your proposed code in Additional context seems like a way to go and probably wouldn't break anything if extra fields are added.

@MrEarle
Copy link
Contributor Author

MrEarle commented Oct 8, 2024

Well, it should be the responsibility of the developer to determine when to update the extra fields in the DBRef, as that is part of the business logic of their use cases. The advantages of embedded document pattern are to improve read performance, at the expense of write performance, which is usually rarer. This is a common (and recommended) pattern for MongoDB.

Maybe the solution is not to add this functionality to Links, but have a new type Link-ish feature that represents an embedded document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants