Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal lowering for list patterns that can be handled by .StartsWith/.EndsWith #77908

Open
neon-sunset opened this issue Mar 29, 2025 · 3 comments
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Milestone

Comments

@neon-sunset
Copy link

neon-sunset commented Mar 29, 2025

Given

static bool Test(ReadOnlySpan<byte> data)
{
    if (data.Length < 8)
        return false;

    return data
        is [0x01, 0x02, 0x03, 0x04, ..]
        or [0x05, 0x06, 0x07, 0x08, ..]
        or [0x09, 0x0A, 0x0B, 0x0C, ..]
        or [0x0D, 0x0E, 0x0F, 0x10, ..]
        or [0x11, 0x12, 0x13, 0x14, ..]
        or [0x15, 0x16, 0x17, 0x18, ..]
        or [0x19, 0x1A, 0x1B, 0x1C, ..];
}

it appears that Roslyn compiles such patterns to

static bool Test(ReadOnlySpan<byte> data)
{
    if (data.Length < 8)
    {
        return false;
    }
    if (data.Length >= 4)
    {
        byte b = data[0];
        if ((uint)b <= 9u)
        {
            if (b != 1)
            {
                if (b != 5)
                {
                    if (b == 9 && data[1] == 10 && data[2] == 11 && data[3] == 12)
                    {
                        goto IL_01ad;
                    }
                }
                else if (data[1] == 6 && data[2] == 7 && data[3] == 8)
                {
                    goto IL_01ad;
                }
            }
            else if (data[1] == 2 && data[2] == 3 && data[3] == 4)
            {
                goto IL_01ad;
            }
        }
        else if ((uint)b <= 17u)
        {
            if (b != 13)
            {
                if (b == 17 && data[1] == 18 && data[2] == 19 && data[3] == 20)
                {
                    goto IL_01ad;
                }
            }
            else if (data[1] == 14 && data[2] == 15 && data[3] == 16)
            {
                goto IL_01ad;
            }
        }
        else if (b != 21)
        {
            if (b == 25 && data[1] == 26 && data[2] == 27 && data[3] == 28)
            {
                goto IL_01ad;
            }
        }
        else if (data[1] == 22 && data[2] == 23 && data[3] == 24)
        {
            goto IL_01ad;
        }
    }
    return false;
    IL_01ad:
    return true;
}

This is not efficient as it results in a multitude of byte-by-byte comparisons which are not required here (and no, JIT will not optimize these away because it is a complex set of conditions).

It would be great if Roslyn could emit the following instead:

...
return
    data.StartsWith((ReadOnlySpan<byte>)[0x01, 0x02, 0x03, 0x04]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x05, 0x06, 0x07, 0x08]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x09, 0x0A, 0x0B, 0x0C]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x0D, 0x0E, 0x0F, 0x10]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x11, 0x12, 0x13, 0x14]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x15, 0x16, 0x17, 0x18]) ||
    data.StartsWith((ReadOnlySpan<byte>)[0x19, 0x1A, 0x1B, 0x1C]);

which results in a much more efficient codegen (and could be a subject to further and better JIT optimization).

They key defining factor is that primitive comparisons up to a machine word size would be more efficient if simply merged together (or even beyond that, to an extent), which is what happens when .StartsWith is called - it's just a single uint comparison (even if the codegen is not perfect currently). It is massively cheaper than doing byte-by-byte reads.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Mar 29, 2025
@CyrusNajmabadi
Copy link
Member

I don't know why the jit would not be able to optimize this.

@RikkiGibson
Copy link
Contributor

A microbenchmark would help us understand the impact here. It would be useful also to compare with cases where each of the list patterns had the same element patterns except one e.g. [1,2,3,4] or [1,2,3,5].

@neon-sunset
Copy link
Author

neon-sunset commented Mar 29, 2025

Given sample code:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class Example
{
    static bool Test1(ReadOnlySpan<byte> data)
    {
        if (data.Length < 8)
            return false;

        return data
            is [0x01, 0x02, 0x03, 0x04, ..]
            or [0x05, 0x06, 0x07, 0x08, ..]
            or [0x09, 0x0A, 0x0B, 0x0C, ..]
            or [0x0D, 0x0E, 0x0F, 0x10, ..]
            or [0x11, 0x12, 0x13, 0x14, ..]
            or [0x15, 0x16, 0x17, 0x18, ..]
            or [0x19, 0x1A, 0x1B, 0x1C, ..];
    }

    static bool Test2(ReadOnlySpan<byte> data)
    {
        if (data.Length < 8)
            return false;

        return
            data.StartsWith((ReadOnlySpan<byte>)[0x01, 0x02, 0x03, 0x04]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x05, 0x06, 0x07, 0x08]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x09, 0x0A, 0x0B, 0x0C]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x0D, 0x0E, 0x0F, 0x10]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x11, 0x12, 0x13, 0x14]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x15, 0x16, 0x17, 0x18]) ||
            data.StartsWith((ReadOnlySpan<byte>)[0x19, 0x1A, 0x1B, 0x1C]);
    }

    static bool Test3(ReadOnlySpan<byte> data)
    {
        if (data.Length < 8)
            return false;

        return
            Match(data, 0x01, 0x02, 0x03, 0x04) ||
            Match(data, 0x05, 0x06, 0x07, 0x08) ||
            Match(data, 0x09, 0x0A, 0x0B, 0x0C) ||
            Match(data, 0x0D, 0x0E, 0x0F, 0x10) ||
            Match(data, 0x11, 0x12, 0x13, 0x14) ||
            Match(data, 0x15, 0x16, 0x17, 0x18) ||
            Match(data, 0x19, 0x1A, 0x1B, 0x1C);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static bool Match(ReadOnlySpan<byte> bytes, byte a, byte b, byte c, byte d) =>
        MemoryMarshal.Read<uint>(bytes) == (a | (b << 8) | (c << 16) | (d << 24));
}

Godbolt's disasm: https://godbolt.org/z/3T9naoevs

The cost of evaluating early "not match" condition in the original form should be marginally cheaper than the "improved" variant Test2. This difference can be further addressed in the JIT to help it avoid emitting unnecessary sete; movzx; test sequences. Then, Test2 will be able to match Test3 in code, which I expect to be the optimal variant for the given pattern, save for doing bitmap tests(?). But I'd expect turning the output of Test1 into Test3 will be substantially more difficult.

The simulated costs on Skylake (with https://uica.uops.info) are:
Test1 - 43c
Test3 - 17c
(Test2 needs to be fixed by hand and I'm in the middle of another task)

I understand that this requires Roslyn to make certain assumptions about what the runtime accepting CIL can or cannot do, but it would be nice to just write idiomatic patterns and know that the lowering strategy will pick a close to optimal way to handle them.

In the disasm, there are comparison sequences which I think could be merged by the JIT and will submit a separate issue at dotnet/runtime.

Thanks.

@jaredpar jaredpar added the Code Gen Quality Room for improvement in the quality of the compiler's generated code label Apr 2, 2025
@jaredpar jaredpar added this to the Backlog milestone Apr 2, 2025
@dotnet-policy-service dotnet-policy-service bot removed the untriaged Issues and PRs which have not yet been triaged by a lead label Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Projects
None yet
Development

No branches or pull requests

4 participants