Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSHARP-5442: Atlas Search doesn't respect the correct serializer #1583

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

papafe
Copy link
Contributor

@papafe papafe commented Dec 31, 2024

@papafe papafe requested a review from BorisDog December 31, 2024 15:29
@papafe
Copy link
Contributor Author

papafe commented Dec 31, 2024

@BorisDog I've added you as a reviewer since it seems you were the one that worked on search, mostly. This is still a work in progress, I've broke some things in the process.
A couple of notes:

  • Probably we need to change the in and range search definitions as well to use the correct serializers.
  • This is effectively a behaviour breaking change, so we need to decide if we want include it in next version or wait for next major


var renderedField = _arrayField.Render(args);

var serializer =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredibly ugly, but was just testing if it worked

Copy link
Contributor Author

@papafe papafe Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rstam this is what I was referring to.
Do you think it would be worth adding a new property to RenderArgs so that we can get directly the serializer we need (similar to RenderForElemMatch for instance)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is on the right track but can use a little tweaking to how the right serializer is found and to reduce duplication between how scalar and array fields are handled.

Something like this:

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args)
{
    IBsonSerializer<TValue> valueSerializer;
    if (_field != null)
    {
        var renderedField = _field.Render(args);
        valueSerializer = renderedField.FieldSerializer;
    }
    else
    {
        var renderedArrayField = _arrayField.Render(args);
        valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);
    }                                                                                                                          
                                                                                                                               
    var document = new BsonDocument();                                                                                         
    using var bsonWriter = new BsonDocumentWriter(document);                                                                   
    var context = BsonSerializationContext.CreateRoot(bsonWriter);                                                             
    bsonWriter.WriteStartDocument();                                                                                           
    bsonWriter.WriteName("value");                                                                                             
    valueSerializer.Serialize(context, _value);                                                                                
    bsonWriter.WriteEndDocument();                                                                                             
                                                                                                                               
    return document;                                                                                                           
}                                                                                                                              

Copy link
Contributor Author

@papafe papafe Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, this solves my doubts. That makes sense, I was mostly focusing on how to get the correct serializer before trying to refactor.

Regarding IBsonArraySerializer, the majority of times when we implement that interface we also implement IChildSerializerConfigurable, and that makes sense. It also seems that for this use case the result we'd get would be the same, so what would you think about using IChildSerializerConfigurable instead?
To be honest this is more of a generic question, I'm curious about if there's any preference on using one over the other, of course considering that TryGetItemSerializationInfo can potentially return more info (not in this case though).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IBsonArraySerializer and IChildSerializerConfigurable are orthogonal. We can't just implement IChildSerializerConfigurable instead of IBsonArraySerializer.

In this case we must implement IBsonArraySerializer.

It is true that in general whenever implementing IBsonArraySerializer we would likely also want to implement IChildSerializerConfigurable. However, that applies mostly to public serializers that a user might want to configure. This is an internal serializer, so I think implementing IBsonArraySerializer is sufficient.

@papafe papafe requested a review from rstam January 2, 2025 16:36
src/MongoDB.Driver/Search/OperatorSearchDefinitions.cs Outdated Show resolved Hide resolved
public EqualsSearchDefinition(FieldDefinition<TDocument, TField> path, TField value, SearchScoreDefinition<TDocument> score)
: base(OperatorType.Equals, new SingleSearchPathDefinition<TDocument>(path), score)
{
ValidateType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I question whether we need to have ValidateType at all. Looks to me like this was just a client-side limitation related to what .NET types ToBsonValue knew how to convert to BSON, but now that we are using the proper serializer this limitation should no longer exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with Boris and he confirmed that it is a server limitation which types are supported.

But even so, we probably shouldn't be checking this client side. We should just send the value to the server and let it return an error if it doesn't support the data type.

That way our driver will work well with all supported versions of the server, which might have different limitations. And if future server versions support new value types we won't have to make any changes.

Copy link
Contributor Author

@papafe papafe Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like this idea. I tried to keep it consistent with what was there before, but I agree that moving this check server side would allow us to be more free in what we do and don't support over time.
I need to remember to do the same with the other operators too.


var renderedField = _arrayField.Render(args);

var serializer =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is on the right track but can use a little tweaking to how the right serializer is found and to reduce duplication between how scalar and array fields are handled.

Something like this:

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args)
{
    IBsonSerializer<TValue> valueSerializer;
    if (_field != null)
    {
        var renderedField = _field.Render(args);
        valueSerializer = renderedField.FieldSerializer;
    }
    else
    {
        var renderedArrayField = _arrayField.Render(args);
        valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);
    }                                                                                                                          
                                                                                                                               
    var document = new BsonDocument();                                                                                         
    using var bsonWriter = new BsonDocumentWriter(document);                                                                   
    var context = BsonSerializationContext.CreateRoot(bsonWriter);                                                             
    bsonWriter.WriteStartDocument();                                                                                           
    bsonWriter.WriteName("value");                                                                                             
    valueSerializer.Serialize(context, _value);                                                                                
    bsonWriter.WriteEndDocument();                                                                                             
                                                                                                                               
    return document;                                                                                                           
}                                                                                                                              

@@ -317,6 +317,8 @@ internal class IEnumerableSerializer<TItem> : SerializerBase<IEnumerable<TItem>>
{
private readonly IBsonSerializer<TItem> _itemSerializer;

public IBsonSerializer<TItem> ItemSerializer => _itemSerializer;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding this new public property have the class implement IBsonArraySerializer and implement the TryGetItemSerializationIinfo method:

public bool TryGetItemSerializationInfo(out BsonSerializationInfo serialization
{
    serializationInfo = new(elementName: null, _itemSerializer, typeof(TItem));
    return true;                                                               
}                                                                              

All serializers for array-like values should implement IBsonArraySerializer. I don't know why this class did not already implement it.

var renderedField = _arrayField.Render(args);

var serializer =
(renderedField.ValueSerializer as FieldValueSerializerHelper.IEnumerableSerializer<TField>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to cast to the very specific class FieldValueSerializerHelper.IEnumerableSerializer<TField> use the more general from (used everywhere else) to get the item serializer from an array-like serializer:

valueSerializer = (IBsonSerializer<TValue>)ArraySerializerHelper.GetItemSerializer(renderedArrayField.FieldSerializer);

This is why FieldValueSerializerHelper.IEnumerableSerializer<TItem> should implement IBsonArraySerializer (as all array-like serializers should). Not sure why it didn't already...

public EqualsSearchDefinition(FieldDefinition<TDocument, TField> path, TField value, SearchScoreDefinition<TDocument> score)
: base(OperatorType.Equals, new SingleSearchPathDefinition<TDocument>(path), score)
{
ValidateType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with Boris and he confirmed that it is a server limitation which types are supported.

But even so, we probably shouldn't be checking this client side. We should just send the value to the server and let it return an error if it doesn't support the data type.

That way our driver will work well with all supported versions of the server, which might have different limitations. And if future server versions support new value types we won't have to make any changes.

@papafe papafe requested review from rstam and sanych-sun and removed request for BorisDog January 7, 2025 11:42
@papafe papafe marked this pull request as ready for review January 7, 2025 15:32
@papafe papafe requested a review from a team as a code owner January 7, 2025 15:32
src/MongoDB.Driver/Search/OperatorSearchDefinitions.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/Search/OperatorSearchDefinitions.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/FieldValueSerializerHelper.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/Search/OperatorSearchDefinitions.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/Search/OperatorSearchDefinitions.cs Outdated Show resolved Hide resolved
@papafe papafe requested a review from rstam January 8, 2025 10:24
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the Search API but as far as I can tell this looks OK.

You should get one more LGTM before merging to main though.

@papafe
Copy link
Contributor Author

papafe commented Jan 14, 2025

I'm not too familiar with the Search API but as far as I can tell this looks OK.

You should get one more LGTM before merging to main though.

@sanych-sun can you take a look too?

@papafe
Copy link
Contributor Author

papafe commented Jan 28, 2025

@BorisDog another thing to consider is that this is a behaviour breaking change, so we need to consider if we want to merge this now or wait. I think it could also be seen as a bug fix to be honest, but we should take a decision.
The difference is that now the correct registered serializers are used to serialize the values for the equals, in and range operators, while before default conversions were used, ignoring the registered serializers.

@papafe papafe requested a review from BorisDog January 28, 2025 19:52
src/MongoDB.Driver/FieldValueSerializerHelper.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/Search/SearchDefinition.cs Outdated Show resolved Hide resolved
value switch
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args)
{
var searchPathDefinition = (SingleSearchPathDefinition<TDocument>)_path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit inefficient that we need to render the path twice.
Also it would be good to extract this to a single helper method.

What do you and @rstam think about having new internal Render method in SearchPathDefinition, something like

(BsonValue, RenderedFieldDefinition) Render(...)

And passing field serializer in OperatorSearchDefinition.RenderArguments() ?

Copy link
Contributor Author

@papafe papafe Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. I also think it's a little complicated to extract a method like that to SearchPathDefinition, since (BsonValue, RenderedFieldDefinition) Render(...) would make sense for a SingleSearchPathDefinition, but not for MultiSearchPathDefinition or WildcardSearchPathDefinition.
The issue here is that operators support various degrees of SearchPathDefinition. I'm trying to think of a more general solution that does not make breaking changes in the public API, but so far it all feels kinda clunky.

Copy link
Contributor

@BorisDog BorisDog Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I meant that the returned RenderedFieldDefinition would be null in some cases. I agree that it's not the greatest design, but I don't have any better suggestion as of now. I does have the upside of not rendering the path twice.
As long as all the API changes are internal, we can evolve them in the future if we come up with a better idea (the new suggested Render method should be internal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 60c1d57, I've added a fix to avoid rendering the path twice. It's not super nice, and I'm not sure it's worth it given that probably the network call of the atlas search would be much greater than the time needed to render the path twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we want to have special cases depending on the path we could also specialize OperatorSearchDefinition.RenderArguments and produce a different method for each path type, for example OperatorSearchDefinition.RenderArgumentsForSingleSearchPath where we can pass additional arguments. This would require quite some changes, but could also allow to control which kind of path can be used with certain kinds of operators.

@papafe
Copy link
Contributor Author

papafe commented Feb 6, 2025

@BorisDog I've done some corrections following your suggestions, and it looks much better than before, thank you 😄
I'm a little bit unsure about the names though, they feel a little redundant

@papafe papafe requested a review from BorisDog February 6, 2025 20:26

internal virtual (BsonValue, IBsonSerializer) RenderAndGetFieldSerializer(RenderArgs<TDocument> args) => (Render(args), null);

internal (string renderedPath, IBsonSerializer fieldSerializer) RenderFieldWithSerializer(FieldDefinition<TDocument> fieldDefinition, RenderArgs<TDocument> args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RenderFieldWithSerializer might be sound like "Render the field with provided serializer"
Would RenderFieldAndGetFieldSerializer be more accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it makes sense

AssertRendered(
subject.Equals("x", value),
$"{{ equals: {{ path: 'x', value: {valueRendered} }} }}");
//There is no property called "x" where to pick up a properly configured GuidSerializer for the tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand this exclusion. Is the default guid serializer different from the one used by serialization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to improve the comment, I agree it's not super clear.
The issue here is that when we use a string path, we're going to get the default serializer for the value, so GuidSerializer. In v3.0 we decided to have GuidRepresentation.Unspecified as the default Guid representation, and this will throw an exception when trying to serialize/deserialize. Normally a user would go over this by registering a default GuidSerializer with the proper representation (BsonSerializer.Register...(GuidSerializer)), but we can't do it because the registered serializers are global and this would be a problem for all the other tests.

For the typed path on line 305 it's not an issue since we get the default field serializer from the class, and the Guid property has the BsonGuidRepresentation attribute on it.

This is more of a testing issue and should be solved when we can pass down the serialization domain 😁

@@ -313,29 +319,55 @@ public void Equals_should_render_supported_type<T>(
new object[] { (float)1, "1", Exp(p => p.Float), nameof(Person.Float) },
new object[] { (double)1, "1", Exp(p => p.Double), nameof(Person.Double) },
new object[] { DateTime.MinValue, "ISODate(\"0001-01-01T00:00:00Z\")", Exp(p => p.Birthday), "dob" },
new object[] { DateTimeOffset.MaxValue, "ISODate(\"9999-12-31T23:59:59.999Z\")", Exp(p => p.DateTimeOffset), nameof(Person.DateTimeOffset) },
new object[] { DateTimeOffset.MaxValue, """{ "DateTime" : { "$date" : "9999-12-31T23:59:59.999Z" }, "Ticks" : 3155378975999999999, "Offset" : 0 }""", Exp(p => p.DateTimeOffset), nameof(Person.DateTimeOffset) },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that we are changing the defaults also for non typed queries (SearchDefinitionBuilder<BsonDocument>)?

If so, that might be considered a bug and a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will change also for non typed queries. They would need to convert the DateTime themselves if they want to keep the same behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a breaking change, but I'm not sure this can be considered a bug. If you're working directly with Bson documents then I assume that you would like to control how to the values are serialized in any case.

@papafe
Copy link
Contributor Author

papafe commented Feb 7, 2025

@BorisDog, as an additional note. I've tried making an untyped query (using BsonDocument) and passing a DateTime value for the Equal operator. No error returned but zero results as expected.

@rstam rstam self-requested a review February 10, 2025 22:19
@@ -42,7 +45,8 @@ public AutocompleteSearchDefinition(
_fuzzy = fuzzy;
}

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args) =>
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args,
IBsonSerializer fieldSerializer = null) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you can't put all arguments on a single line switch to a new line for each argument so that it's easy to see the arguments.

 private protected override BsonDocument RenderArguments(
     RenderArgs<TDocument> args,                         
     IBsonSerializer fieldSerializer)                    
 {                                                       

Same in all other similar methods in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also... I would recommend against making the new fieldSerializer argument optional. Better to force the callers to pass in null if necessary.

Otherwise it's just to easy to forget to pass in optional arguments, specially in existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation of new()

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args,
IBsonSerializer fieldSerializer = null)
{
var valueSerializer = fieldSerializer switch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there is some common code here that could be moved to the base class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a little different between the various operators, so I'd prefer to keep it here

@@ -305,7 +304,8 @@ public NearSearchDefinition(
_pivot = pivot;
}

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args) =>
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args,
IBsonSerializer fieldSerializer = null) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation of new()

@@ -483,7 +488,8 @@ public WildcardSearchDefinition(
_allowAnalyzedField = allowAnalyzedField;
}

private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args) =>
private protected override BsonDocument RenderArguments(RenderArgs<TDocument> args,
IBsonSerializer fieldSerializer = null) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation of new()

return new(_operatorType.ToCamelCase(), renderedArgs);
}

private protected virtual BsonDocument RenderArguments(RenderArgs<TDocument> args) => new();
private protected virtual BsonDocument RenderArguments(RenderArgs<TDocument> args,
IBsonSerializer fieldSerializer = null) => new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you can't put all arguments on a single line switch to a new line for each argument so that it's easy to see the arguments:

private protected virtual BsonDocument RenderArguments(
    RenderArgs<TDocument> args,                        
    IBsonSerializer fieldSerializer)                   

@@ -104,11 +105,17 @@ public static implicit operator SearchPathDefinition<TDocument>(List<string> fie
/// <param name="args">The render arguments.</param>
/// <returns>The rendered field.</returns>
protected string RenderField(FieldDefinition<TDocument> fieldDefinition, RenderArgs<TDocument> args)
=> RenderFieldAndGetFieldSerializer(fieldDefinition, args).renderedPath;

internal virtual (BsonValue, IBsonSerializer) RenderAndGetFieldSerializer(RenderArgs<TDocument> args) => (Render(args), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you can't put all arguments on a single line switch to a new line for each argument so that it's easy to see the arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend that instead of returning a Tuple we use an out parameter:

internal virtual BsonValue Render(RenderArgs<TDocument> args, out IBsonSerializer fieldSerializer)

I think it makes more sense, because the main point is to Render the args and getting the fieldSerializer is secondary.

@@ -144,6 +145,9 @@ public SingleSearchPathDefinition(FieldDefinition<TDocument> field)

public override BsonValue Render(RenderArgs<TDocument> args) =>
RenderField(_field, args);

internal override (BsonValue, IBsonSerializer) RenderAndGetFieldSerializer(RenderArgs<TDocument> args)
=> RenderFieldAndGetFieldSerializer(_field, args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you accept my other suggestions this would become:

internal override BsonValue Render(RenderArgs<TDocument> args, out IBsonSerializer fieldSerializer)
    => RenderField(_field, args, out fieldSerializer);                                             


internal virtual (BsonValue, IBsonSerializer) RenderAndGetFieldSerializer(RenderArgs<TDocument> args) => (Render(args), null);

internal (string renderedPath, IBsonSerializer fieldSerializer) RenderFieldAndGetFieldSerializer(FieldDefinition<TDocument> fieldDefinition, RenderArgs<TDocument> args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider declaring and implementing like this instead:

internal string RenderField(
    FieldDefinition<TDocument> fieldDefinition,
    RenderArgs<TDocument> args,
    out IBsonSerializer fieldSerializer)
{
    var renderedField = fieldDefinition.Render(args);
    var prefix = args.PathRenderArgs.PathPrefix;
                                                
    fieldSerializer = renderedField.FieldSerializer;
    return prefix == null ? renderedField.FieldName : $"{prefix}.{renderedField.FieldName}";
}                                                                                           

@papafe
Copy link
Contributor Author

papafe commented Feb 11, 2025

@rstam @BorisDog I've followed Robert's suggestions about the code style.
I've also come up with an extension method to enable the new behaviour, so this won't be a breaking change. It's all in the last commit, so it can be reverted if necessary

@papafe papafe requested review from BorisDog and rstam February 11, 2025 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants