Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for UTF8 string literals #2940

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/mutations.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,9 @@ Do you have a suggestion for a (new) mutator? Feel free to create an [issue](htt
| Original | Mutated |
| ------------- | ------------- |
| `"foo"` | `""` |
| `"foo"u8` | `""u8` |
| `""` | `"Stryker was here!"` |
| `""u8` | `"Stryker was here!"u8` |
| `$"foo {bar}"` | `$""` |
| `@"foo"` | `@""` |
| `string.Empty` | `"Stryker was here!"` |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

<PropertyGroup>
<TargetFramework>net6.0</TargetFramework>
<LangVersion>latest</LangVersion>
</PropertyGroup>

<PropertyGroup>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
using ExampleProject.String;
using Xunit;

namespace ExampleProject.XUnit.String
{
public class Utf8StringMagicTests
{
[Fact]
richardwerkman marked this conversation as resolved.
Show resolved Hide resolved
public void AddTwoStrings()
{
var sut = new Utf8StringMagic();
var actual = sut.HelloWorld();
Assert.Equal("Hello World!"u8, actual);
}

[Fact]
public void IsNullOrEmpty()
{
var sut = new Utf8StringMagic();
var actual = sut.IsNullOrEmpty("hello"u8);
Assert.False(actual);
}

[Fact]
public void IsNullOrEmpty_Empty()
{
var sut = new Utf8StringMagic();
var actual = sut.IsNullOrEmpty(""u8);
Assert.True(actual);
}

[Fact]
public void Referenced()
{
var sut = new Utf8StringMagic();
var input = "hello"u8;
sut.Referenced(out input);
Assert.Equal("world"u8, input);
}

[Fact]
public void ReferencedEmpty()
{
var sut = new Utf8StringMagic();
var input = "hello"u8;
sut.ReferencedEmpty(out input);
Assert.Equal(""u8, input);
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
using System;
using System.Linq;

namespace ExampleProject.String
{
public class Utf8StringMagic
{
public ReadOnlySpan<byte> HelloWorld()
{
return "Hello"u8 + " "u8 + "World!"u8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run the integration test locally and saw this line isn't mutated correctly. The mutation causes compile errors. I think this isn't related to the mutator but related to the mutation placing logic. I'd say we create a separate defect for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed in this PR, not as a separate defect.

Copy link
Author

@iamdmitrij iamdmitrij Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated why this test is failing. It fails during mutated code compilation. Stryker by default add ternary expression to each operand:

So this statement

return "Hello"u8 + " "u8 + "World!"u8;

is transformed into this:

return (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(99)?""u8 :"Hello"u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(100)?""u8 :" "u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(101)?""u8:"World!"u8);

It boxes each UTF-8 value to ReadOnlySpan<byte>() type. The problem lies with C# and how it allows to concatenate UTF-8 string literals. It allows to concatenate UTF-8 string directly: "Hello"u8 + " "u8 + "World!"u8, but not when they are represented as ReadOnlySpan<byte>: new ReadOnlySpan<byte>() + new ReadOnlySpan<byte>().

Hence, mutated code fails to compile with:

CS9047	Operator '+' cannot be applied to operands of type 'ReadOnlySpan<byte>' and 'ReadOnlySpan<byte>' that are not UTF-8 byte representations

Do you think it's worth fixing this bug in current PR? It also touches core MutantControl injection logic. If so, any new ideas are welcome here. So far, I've only thought of converting UTF-8 strings into arrays and concatenating them manually using LINQ when ternary conditions are used:

public ReadOnlySpan<byte> HelloWorld()
{
    return (true ? ""u8 : "Hello"u8).ToArray()
        .Concat((true ? ""u8 : " "u8).ToArray()
            .Concat((true ? ""u8 : "World!"u8).ToArray()))
        .ToArray();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. In that case, we could maybe include some extra code in the compilation that contains an operator overload for + on ReadOnlySpan<byte>. During the compilation we already pass some custom code to be compiled together with the source project, so that should be fairly easy to do.

Do you think it's worth fixing this bug in current PR?

Yes, as this seems like an easy fix we should add to this PR. Otherwise, we risk that this will never be fixed once this PR has been merged.

Copy link
Member

@dupdob dupdob Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure adding this operator is needed.

  1. "Hello"u8 + " "u8 + "World!"u8; is actually a compile time constant so the compiler concatenate the strings and does not compile any expression. See the actual result:
    internal static readonly __StaticArrayInitTypeSize=13 5A09E8FA9C77807B24E99C9CF999DEBFAD8441E269EB960E201F61FC3DE20D5A/* Not supported: data(48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00) */;
    An array of 13 bytes. This is seen as a single string here. As such, it does not really make sense to mutate sub parts.
  2. In any case, there is no interest in mutating this expression more than once. It is almost certain that if removing Hello is killed by a failed test, that very same test will kill removing the space or world!. So there is no benefit keeping each mutation. Note that this remarks is also valid for classical constant string concatenations.
  3. As such, it would make more sense to mutate the whole expression once, which would remove any problems with the missing operator.
  4. Adding this operator requires injecting a dedicated using statement at the start of any files needing it; meaning it would have to be added everywhere. Furthermore, it could result in compilation errors (ambiguous call) that would require a new rollback logic (to detect it and remove the unnecessary using statement). It does not appear easy to me.
  5. The only consequence of not adding this operand is simply that concatenated u8 strings will not be mutated; which looks alright to me.

TLDR;
I recommend: doing nothing for now and contemplate detecting concatenated constant strings (u8 or otherwise) to be mutated at once. But it requires specific logic within the ExpressionOrchestrator

}

public void Referenced(out ReadOnlySpan<byte> test)
{
test = "world"u8;
}

public void ReferencedEmpty(out ReadOnlySpan<byte> test)
{
test = ""u8;
}

public bool IsNullOrEmpty(ReadOnlySpan<byte> myString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with utf8 strings and can be removed

{
if (myString.IsEmpty)
{
return true;
}
return false;
}
}
}
12 changes: 6 additions & 6 deletions integrationtest/ValidationProject/ValidateStrykerResults.cs
Original file line number Diff line number Diff line change
Expand Up @@ -82,8 +82,8 @@ public void NetCore()

var report = JsonConvert.DeserializeObject<JsonReport>(strykerRunOutput);

CheckReportMutants(report, total: 114, ignored: 55, survived: 4, killed: 6, timeout: 2, nocoverage: 45);
CheckReportTestCounts(report, total: 14);
CheckReportMutants(report, total: 129, ignored: 59, survived: 5, killed: 10, timeout: 2, nocoverage: 45);
CheckReportTestCounts(report, total: 19);
}

[Fact]
Expand Down Expand Up @@ -121,8 +121,8 @@ public void NetCoreWithTwoTestProjects()

var report = JsonConvert.DeserializeObject<JsonReport>(strykerRunOutput);

CheckReportMutants(report, total: 114, ignored: 27, survived: 8, killed: 8, timeout: 2, nocoverage: 67);
CheckReportTestCounts(report, total: 30);
CheckReportMutants(report, total: 129, ignored: 31, survived: 9, killed: 12, timeout: 2, nocoverage: 67);
CheckReportTestCounts(report, total: 35);
}

[Fact]
Expand All @@ -140,8 +140,8 @@ public void SolutionRun()

var report = JsonConvert.DeserializeObject<JsonReport>(strykerRunOutput);

CheckReportMutants(report, total: 114, ignored: 55, survived: 4, killed: 6, timeout: 2, nocoverage: 45);
CheckReportTestCounts(report, total: 30);
CheckReportMutants(report, total: 129, ignored: 59, survived: 5, killed: 10, timeout: 2, nocoverage: 45);
CheckReportTestCounts(report, total: 35);
}

private void CheckMutationKindsValidity(JsonReport report)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,29 @@ public void ShouldBeMutationLevelStandard()
target.MutationLevel.ShouldBe(MutationLevel.Standard);
}

[Theory]
[InlineData("", """
"Stryker was here!"u8
""")]
[InlineData("foo", """
""u8
""")]
iamdmitrij marked this conversation as resolved.
Show resolved Hide resolved
public void ShouldMutateUtf8StringLiteral(string original, string expected)
{
var syntaxTree = CSharpSyntaxTree.ParseText($"""var test = "{original}"u8;""");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you add the u8 token here? I'd put it in the inline data to keep it transparent what is being mutated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good idea. Will add to inline data.

iamdmitrij marked this conversation as resolved.
Show resolved Hide resolved

var literalExpression = syntaxTree.GetRoot().DescendantNodes().OfType<LiteralExpressionSyntax>().First();
var mutator = new StringMutator();

var result = mutator.ApplyMutations(literalExpression, null).ToList();

var mutation = result.ShouldHaveSingleItem();

mutation.ReplacementNode.ShouldBeOfType<LiteralExpressionSyntax>()
.Token.Text.ShouldBe(expected);
mutation.DisplayName.ShouldBe("String mutation");
}

[Theory]
[InlineData("", "Stryker was here!")]
[InlineData("foo", "")]
Expand Down
66 changes: 55 additions & 11 deletions src/Stryker.Core/Stryker.Core/Mutators/StringMutator.cs
Original file line number Diff line number Diff line change
Expand Up @@ -17,24 +17,53 @@ public override IEnumerable<Mutation> ApplyMutations(LiteralExpressionSyntax nod
// Get objectCreationSyntax to check if it contains a regex type.
var root = node.Parent?.Parent?.Parent;

if (!IsSpecialType(root) && IsStringLiteral(node))
if (IsSpecialType(root))
{
var currentValue = (string)node.Token.Value;
var replacementValue = currentValue == "" ? "Stryker was here!" : "";
yield return new Mutation
{
OriginalNode = node,
ReplacementNode = SyntaxFactory.LiteralExpression(SyntaxKind.StringLiteralExpression, SyntaxFactory.Literal(replacementValue)),
DisplayName = "String mutation",
Type = Mutator.String
};
yield break;
}

SyntaxNode syntaxNode;
string currentValue;
string replacementValue;

if (IsStringLiteral(node))
{
currentValue = (string)node.Token.Value;
replacementValue = currentValue == "" ? "Stryker was here!" : "";
syntaxNode = SyntaxFactory.LiteralExpression(SyntaxKind.StringLiteralExpression,
SyntaxFactory.Literal(replacementValue));
}
else if(IsUtf8StringLiteral(node))
iamdmitrij marked this conversation as resolved.
Show resolved Hide resolved
{
currentValue = (string)node.Token.Value;
replacementValue = currentValue == "" ? "Stryker was here!" : "";
syntaxNode = CreateUtf88String(node.GetLeadingTrivia(), replacementValue, node.GetTrailingTrivia());
}
else
{
yield break;
}

yield return new Mutation
{
OriginalNode = node,
ReplacementNode = syntaxNode,
DisplayName = "String mutation",
Type = Mutator.String
};
}

private static bool IsUtf8StringLiteral(LiteralExpressionSyntax node)
{
var kind = node.Kind();
return kind is SyntaxKind.Utf8StringLiteralExpression
&& node.Parent is not ConstantPatternSyntax;
}

private static bool IsStringLiteral(LiteralExpressionSyntax node)
{
var kind = node.Kind();
return kind == SyntaxKind.StringLiteralExpression
return kind is SyntaxKind.StringLiteralExpression
&& node.Parent is not ConstantPatternSyntax;
}

Expand All @@ -49,4 +78,19 @@ private static bool IsCtorOfType(ObjectCreationExpressionSyntax ctor, Type type)
var ctorType = ctor.Type.ToString();
return ctorType == type.Name || ctorType == type.FullName;
}

private static LiteralExpressionSyntax CreateUtf88String(SyntaxTriviaList leadingTrivia, string stringValue, SyntaxTriviaList trailingTrivia)
{
var quoteCharacter = '"';
var suffix = "u8";

var literal = SyntaxFactory.Token(
leading: leadingTrivia,
kind: SyntaxKind.Utf8StringLiteralToken,
text: quoteCharacter + stringValue + quoteCharacter + suffix,
valueText: "",
trailing: trailingTrivia);

return SyntaxFactory.LiteralExpression(SyntaxKind.Utf8StringLiteralExpression, literal);
}
}
Loading