Reimplement LINQ ToList using SegmentedArrayBuilder to reduce allocations #104365

Open: wants to merge 4 commits into main

andrewjsaid (Contributor):

After #96570, ToArray() is in many places faster than ToList() and allocates less. As a List<T> is just a wrapper over an array, we can reuse most of the logic in SegmentedArrayBuilder to build lists, too.
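
The idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: it does not use the real internal SegmentedArrayBuilder<T> (whose API is not shown here) and omits any stack-allocated scratch space that builder may use. Instead, a hypothetical helper, ToListExactSize, buffers items in ArrayPool<T>.Shared segments without ever copying earlier segments, then, once the total count is known, creates a List<T> of exactly that size with CollectionsMarshal.SetCount and fills it through CollectionsMarshal.AsSpan.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper (not the PR's implementation): segmented buffering, then one exact-size List<T>.
static List<T> ToListExactSize<T>(IEnumerable<T> source)
{
    var segments = new List<T[]>();   // rented segments, in fill order
    T[] current = Array.Empty<T>();   // segment currently being filled
    int posInSegment = 0;             // next write index in 'current'
    int count = 0;                    // total items buffered

    try
    {
        foreach (T item in source)
        {
            if (posInSegment == current.Length)
            {
                // Current segment is full (or none exists yet): rent a bigger one.
                // Earlier segments are kept as-is, so nothing is copied while growing.
                current = ArrayPool<T>.Shared.Rent(Math.Max(4, current.Length * 2));
                segments.Add(current);
                posInSegment = 0;
            }
            current[posInSegment++] = item;
            count++;
        }

        // The final size is now known: allocate the list's backing array exactly once.
        var result = new List<T>(count);
        CollectionsMarshal.SetCount(result, count);
        Span<T> destination = CollectionsMarshal.AsSpan(result);

        // Every segment except the last is full; the last holds the remaining items.
        int remaining = count;
        foreach (T[] segment in segments)
        {
            int toCopy = Math.Min(segment.Length, remaining);
            segment.AsSpan(0, toCopy).CopyTo(destination);
            destination = destination.Slice(toCopy);
            remaining -= toCopy;
        }
        return result;
    }
    finally
    {
        // Clear segments holding references so pooled arrays do not keep objects alive.
        bool clear = RuntimeHelpers.IsReferenceOrContainsReferences<T>();
        foreach (T[] segment in segments)
            ArrayPool<T>.Shared.Return(segment, clearArray: clear);
    }
}

List<int> list = ToListExactSize(Enumerable.Range(1, 50));
Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");
```

Because the list is created with the final count as its capacity, its Capacity equals its Count, unlike a list grown by repeated Add.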

There were some more complex uses of SegmentedArrayBuilder, for example Concat, where I left the code as-is. I can also measure the performance of those if requested.

Benchmarks

Int32Tests

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3810/23H2/2023Update/SunValley3)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.302
  [Host]     : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2
  Job-YPZLVD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-OASBAG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

LaunchCount=3  

| Method | Toolchain | Count | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|
| Enumerable_ToList | main | 1 | 25.87 ns | 1.00 | 112 B | 1.00 |
| Enumerable_ToList | pr | 1 | 28.49 ns | 1.10 | 104 B | 0.93 |
| IEnumerableSelectIterator_ToList | main | 1 | 46.05 ns | 1.00 | 168 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 1 | 48.76 ns | 1.06 | 160 B | 0.95 |
| IteratorSelectIterator_ToList | main | 1 | 55.84 ns | 1.00 | 184 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 1 | 60.04 ns | 1.08 | 184 B | 1.00 |
| OfTypeIterator_ToList | main | 1 | 61.88 ns | 1.00 | 200 B | 1.00 |
| OfTypeIterator_ToList | pr | 1 | 65.08 ns | 1.05 | 176 B | 0.88 |
| SelectManySingleSelectorIterator_ToList | main | 1 | 62.16 ns | 1.00 | 240 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 1 | 67.54 ns | 1.08 | 232 B | 0.97 |
| IEnumerableSkipTakeIterator_ToLost | main | 1 | 41.68 ns | 1.00 | 128 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 1 | 43.83 ns | 1.05 | 128 B | 1.00 |
| IEnumerableWhereIterator_ToList | main | 1 | 42.11 ns | 1.00 | 168 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 1 | 42.63 ns | 1.01 | 160 B | 0.95 |
| ArrayWhereIterator_ToList | main | 1 | 38.62 ns | 1.00 | 152 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 1 | 45.68 ns | 1.18 | 144 B | 0.95 |
| IEnumerableWhereSelectIterator_ToList | main | 1 | 60.61 ns | 1.00 | 232 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 1 | 65.19 ns | 1.10 | 224 B | 0.97 |
| ArrayWhereSelectIterator_ToList | main | 1 | 55.79 ns | 1.00 | 208 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 1 | 60.01 ns | 1.07 | 200 B | 0.96 |
| Enumerable_ToList | main | 5 | 44.33 ns | 1.00 | 168 B | 1.00 |
| Enumerable_ToList | pr | 5 | 32.99 ns | 0.74 | 120 B | 0.71 |
| IEnumerableSelectIterator_ToList | main | 5 | 65.96 ns | 1.00 | 224 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 5 | 56.28 ns | 0.85 | 176 B | 0.79 |
| IteratorSelectIterator_ToList | main | 5 | 73.91 ns | 1.00 | 224 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 5 | 77.94 ns | 1.05 | 224 B | 1.00 |
| OfTypeIterator_ToList | main | 5 | 132.67 ns | 1.00 | 384 B | 1.00 |
| OfTypeIterator_ToList | pr | 5 | 115.63 ns | 0.87 | 304 B | 0.79 |
| SelectManySingleSelectorIterator_ToList | main | 5 | 122.24 ns | 1.00 | 296 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 5 | 106.91 ns | 0.88 | 248 B | 0.84 |
| IEnumerableSkipTakeIterator_ToLost | main | 5 | 56.19 ns | 1.00 | 168 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 5 | 58.61 ns | 1.04 | 168 B | 1.00 |
| IEnumerableWhereIterator_ToList | main | 5 | 62.12 ns | 1.00 | 224 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 5 | 50.66 ns | 0.81 | 176 B | 0.79 |
| ArrayWhereIterator_ToList | main | 5 | 56.66 ns | 1.00 | 224 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 5 | 49.61 ns | 0.88 | 176 B | 0.79 |
| IEnumerableWhereSelectIterator_ToList | main | 5 | 82.31 ns | 1.00 | 288 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 5 | 72.20 ns | 0.88 | 240 B | 0.83 |
| ArrayWhereSelectIterator_ToList | main | 5 | 74.32 ns | 1.00 | 280 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 5 | 62.94 ns | 0.85 | 232 B | 0.83 |
| Enumerable_ToList | main | 50 | 167.20 ns | 1.00 | 688 B | 1.00 |
| Enumerable_ToList | pr | 50 | 118.14 ns | 0.71 | 296 B | 0.43 |
| IEnumerableSelectIterator_ToList | main | 50 | 191.60 ns | 1.00 | 744 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 50 | 171.93 ns | 0.90 | 352 B | 0.47 |
| IteratorSelectIterator_ToList | main | 50 | 251.54 ns | 1.00 | 800 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 50 | 231.18 ns | 0.92 | 408 B | 0.51 |
| OfTypeIterator_ToList | main | 50 | 731.40 ns | 1.00 | 2432 B | 1.00 |
| OfTypeIterator_ToList | pr | 50 | 734.97 ns | 1.01 | 1744 B | 0.72 |
| SelectManySingleSelectorIterator_ToList | main | 50 | 692.19 ns | 1.00 | 816 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 50 | 656.03 ns | 0.95 | 424 B | 0.52 |
| IEnumerableSkipTakeIterator_ToLost | main | 50 | 186.72 ns | 1.00 | 744 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 50 | 161.33 ns | 0.86 | 352 B | 0.47 |
| IEnumerableWhereIterator_ToList | main | 50 | 187.08 ns | 1.00 | 744 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 50 | 164.58 ns | 0.88 | 352 B | 0.47 |
| ArrayWhereIterator_ToList | main | 50 | 154.86 ns | 1.00 | 920 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 50 | 141.83 ns | 0.92 | 528 B | 0.57 |
| IEnumerableWhereSelectIterator_ToList | main | 50 | 209.25 ns | 1.00 | 808 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 50 | 186.49 ns | 0.89 | 416 B | 0.51 |
| ArrayWhereSelectIterator_ToList | main | 50 | 171.83 ns | 1.00 | 976 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 50 | 145.71 ns | 0.85 | 584 B | 0.60 |

ObjectTests

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3810/23H2/2023Update/SunValley3)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.302
  [Host]     : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2
  Job-YPZLVD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-OASBAG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

LaunchCount=3  

| Method | Toolchain | Count | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|
| Enumerable_ToList | main | 1 | 41.15 ns | 1.00 | 136 B | 1.00 |
| Enumerable_ToList | pr | 1 | 45.14 ns | 1.10 | 112 B | 0.82 |
| IEnumerableSelectIterator_ToList | main | 1 | 68.84 ns | 1.00 | 192 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 1 | 68.88 ns | 1.01 | 168 B | 0.88 |
| IteratorSelectIterator_ToList | main | 1 | 75.90 ns | 1.00 | 192 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 1 | 84.56 ns | 1.11 | 192 B | 1.00 |
| OfTypeIterator_ToList | main | 1 | 57.58 ns | 1.00 | 184 B | 1.00 |
| OfTypeIterator_ToList | pr | 1 | 61.52 ns | 1.07 | 160 B | 0.87 |
| SelectManySingleSelectorIterator_ToList | main | 1 | 88.42 ns | 1.00 | 264 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 1 | 89.23 ns | 1.00 | 240 B | 0.91 |
| IEnumerableSkipTakeIterator_ToLost | main | 1 | 55.73 ns | 1.00 | 136 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 1 | 60.91 ns | 1.09 | 136 B | 1.00 |
| IEnumerableWhereIterator_ToList | main | 1 | 66.58 ns | 1.00 | 192 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 1 | 70.11 ns | 1.05 | 168 B | 0.88 |
| ArrayWhereIterator_ToList | main | 1 | 48.82 ns | 1.00 | 168 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 1 | 56.08 ns | 1.15 | 144 B | 0.86 |
| IEnumerableWhereSelectIterator_ToList | main | 1 | 87.87 ns | 1.00 | 256 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 1 | 90.83 ns | 1.03 | 232 B | 0.91 |
| ArrayWhereSelectIterator_ToList | main | 1 | 70.01 ns | 1.00 | 224 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 1 | 73.59 ns | 1.05 | 200 B | 0.89 |
| Enumerable_ToList | main | 5 | 84.91 ns | 1.00 | 224 B | 1.00 |
| Enumerable_ToList | pr | 5 | 71.13 ns | 0.83 | 144 B | 0.64 |
| IEnumerableSelectIterator_ToList | main | 5 | 168.50 ns | 1.00 | 280 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 5 | 153.26 ns | 0.91 | 200 B | 0.71 |
| IteratorSelectIterator_ToList | main | 5 | 121.77 ns | 1.00 | 248 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 5 | 187.19 ns | 1.54 | 248 B | 1.00 |
| OfTypeIterator_ToList | main | 5 | 101.74 ns | 1.00 | 272 B | 1.00 |
| OfTypeIterator_ToList | pr | 5 | 88.26 ns | 0.87 | 192 B | 0.71 |
| SelectManySingleSelectorIterator_ToList | main | 5 | 187.48 ns | 1.00 | 352 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 5 | 153.98 ns | 0.82 | 272 B | 0.77 |
| IEnumerableSkipTakeIterator_ToLost | main | 5 | 91.36 ns | 1.00 | 192 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 5 | 94.57 ns | 1.04 | 192 B | 1.00 |
| IEnumerableWhereIterator_ToList | main | 5 | 170.63 ns | 1.00 | 280 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 5 | 154.21 ns | 0.90 | 200 B | 0.71 |
| ArrayWhereIterator_ToList | main | 5 | 144.74 ns | 1.00 | 288 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 5 | 135.21 ns | 0.94 | 208 B | 0.72 |
| IEnumerableWhereSelectIterator_ToList | main | 5 | 191.12 ns | 1.00 | 344 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 5 | 176.70 ns | 0.93 | 264 B | 0.77 |
| ArrayWhereSelectIterator_ToList | main | 5 | 166.66 ns | 1.00 | 344 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 5 | 149.40 ns | 0.90 | 264 B | 0.77 |
| Enumerable_ToList | main | 50 | 452.81 ns | 1.00 | 1192 B | 1.00 |
| Enumerable_ToList | pr | 50 | 429.92 ns | 0.94 | 504 B | 0.42 |
| IEnumerableSelectIterator_ToList | main | 50 | 543.63 ns | 1.00 | 1248 B | 1.00 |
| IEnumerableSelectIterator_ToList | pr | 50 | 552.12 ns | 1.02 | 560 B | 0.45 |
| IteratorSelectIterator_ToList | main | 50 | 701.63 ns | 1.00 | 1304 B | 1.00 |
| IteratorSelectIterator_ToList | pr | 50 | 707.15 ns | 1.01 | 608 B | 0.47 |
| OfTypeIterator_ToList | main | 50 | 499.60 ns | 1.00 | 1240 B | 1.00 |
| OfTypeIterator_ToList | pr | 50 | 501.70 ns | 1.00 | 552 B | 0.45 |
| SelectManySingleSelectorIterator_ToList | main | 50 | 1,267.24 ns | 1.00 | 1320 B | 1.00 |
| SelectManySingleSelectorIterator_ToList | pr | 50 | 1,033.94 ns | 0.82 | 632 B | 0.48 |
| IEnumerableSkipTakeIterator_ToLost | main | 50 | 475.09 ns | 1.00 | 1248 B | 1.00 |
| IEnumerableSkipTakeIterator_ToLost | pr | 50 | 483.19 ns | 1.02 | 552 B | 0.44 |
| IEnumerableWhereIterator_ToList | main | 50 | 566.51 ns | 1.00 | 1248 B | 1.00 |
| IEnumerableWhereIterator_ToList | pr | 50 | 559.30 ns | 0.99 | 560 B | 0.45 |
| ArrayWhereIterator_ToList | main | 50 | 474.29 ns | 1.00 | 1616 B | 1.00 |
| ArrayWhereIterator_ToList | pr | 50 | 544.54 ns | 1.15 | 928 B | 0.57 |
| IEnumerableWhereSelectIterator_ToList | main | 50 | 581.70 ns | 1.00 | 1312 B | 1.00 |
| IEnumerableWhereSelectIterator_ToList | pr | 50 | 599.43 ns | 1.03 | 624 B | 0.48 |
| ArrayWhereSelectIterator_ToList | main | 50 | 494.47 ns | 1.00 | 1672 B | 1.00 |
| ArrayWhereSelectIterator_ToList | pr | 50 | 499.22 ns | 1.01 | 984 B | 0.59 |

Benchmark Code

Command Line
dotnet run -c Release `
   --corerun `
     D:\Code\_forks\dotnet-runtime-base\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\corerun.exe `
     D:\Code\_forks\dotnet-runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\corerun.exe `
   --launchCount 3
Program.cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests<>).Assembly).Run(args);

public partial class Tests<T> where T : new()
{
    private T _value = new();
    private T[] _valueArr = [new()];

    private IEnumerable<T> CreateEnumerable()
    {
        for (int i = 0; i < Count; i++) yield return _value;
    }

    private IEnumerable<T> CreateArray()
    {
        var result = new T[Count];
        for (int i = 0; i < Count; i++)
        {
            result[i] = _value;
        }
        return result;
    }

    [Params(1, 5, 50)]
    public int Count { get; set; }

    [Benchmark]
    public List<T> Enumerable_ToList() => 
        CreateEnumerable()
        .ToList();

    [Benchmark]
    public List<T> IEnumerableSelectIterator_ToList() =>
        AssertType(CreateEnumerable().Select(i => i), "IEnumerableSelectIterator`2")
        .ToList();

    [Benchmark]
    public List<T> IteratorSelectIterator_ToList() =>
        AssertType(CreateEnumerable().Skip(1).Select(i => i), "IteratorSelectIterator`2")
        .ToList();

    [Benchmark]
    public List<object> OfTypeIterator_ToList() =>
        AssertType(CreateEnumerable().OfType<object>(), "OfTypeIterator`1")
        .ToList();

    [Benchmark]
    public List<T> SelectManySingleSelectorIterator_ToList() =>
        AssertType(CreateEnumerable().SelectMany(i => _valueArr), "SelectManySingleSelectorIterator`2")
        .ToList();

    [Benchmark]
    public List<T> IEnumerableSkipTakeIterator_ToLost() =>
        AssertType(CreateEnumerable().Skip(1), "IEnumerableSkipTakeIterator`1")
        .ToList();

    [Benchmark]
    public List<T> IEnumerableWhereIterator_ToList() =>
        AssertType(CreateEnumerable().Where(_ => true), "IEnumerableWhereIterator`1")
        .ToList();

    [Benchmark]
    public List<T> ArrayWhereIterator_ToList() =>
        AssertType(CreateArray().Where(_ => true), "ArrayWhereIterator`1")
        .ToList();

    [Benchmark]
    public List<T> IEnumerableWhereSelectIterator_ToList() =>
        AssertType(CreateEnumerable().Where(_ => true).Select(i => i), "IEnumerableWhereSelectIterator`2")
        .ToList();

    [Benchmark]
    public List<T> ArrayWhereSelectIterator_ToList() =>
        AssertType(CreateArray().Where(_ => true).Select(i => i), "ArrayWhereSelectIterator`2")
        .ToList();


    private static TIterator AssertType<TIterator>(TIterator value, string type)
    {
        if(value!.GetType().Name is { } actual && actual != type)
        {
            throw new InvalidOperationException($"Expected {type} got {actual}");
        }

        return value;
    }
}

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class Int32Tests : Tests<int> { }

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class ObjectTests : Tests<object> { }

Performance Analysis

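Allocations are down in most cases, sometimes significantly so (Alloc Ratio between 0.42 and 0.72 at Count = 50), and unchanged in the rest (IteratorSelectIterator_ToList and IEnumerableSkipTakeIterator_ToLost at Count = 1 and 5). As the size of the collection increases, so do the savings, because fewer "unnecessary" intermediate arrays are allocated and copied as the list grows.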
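
The "unnecessary" arrays can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope model, not taken from the PR: it assumes List<T>'s growth policy when filled by Add from a source of unknown length (a 4-element array on the first Add, then doubling), and an assumed x64 array layout of 24 bytes of overhead plus element data, rounded up to 8 bytes. It counts only backing-array bytes, not the List<T> object or the enumerator.

```csharp
using System;

// Assumed x64 layout: 24 bytes of array overhead (object header, method table, length),
// followed by the elements, with the total rounded up to a multiple of 8 bytes.
static long ArrayBytes(int length, int elementSize) =>
    (24L + (long)length * elementSize + 7) / 8 * 8;

// Backing-array bytes when a List<T> grows by Add from a source of unknown length:
// a 4-slot array on the first Add, then a doubled array on every resize.
static long GrowthBytes(int count, int elementSize)
{
    long total = 0;
    int capacity = 0;
    while (capacity < count)
    {
        capacity = capacity == 0 ? 4 : capacity * 2;
        total += ArrayBytes(capacity, elementSize);
    }
    return total;
}

// Backing-array bytes when the final count is known and one exact-size array is allocated.
static long ExactBytes(int count, int elementSize) => ArrayBytes(count, elementSize);

foreach (int n in new[] { 1, 5, 50 })
{
    Console.WriteLine(
        $"Count={n}: int {GrowthBytes(n, 4)} B vs {ExactBytes(n, 4)} B, " +
        $"object {GrowthBytes(n, 8)} B vs {ExactBytes(n, 8)} B");
}
```

Under these assumptions, the differences between growing and exact sizing (8, 48 and 392 B for Int32; 24, 80 and 688 B for object, at Count = 1, 5 and 50) line up with the differences in the Allocated column of the Enumerable_ToList rows, which is consistent with that path allocating a single exact-size array.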

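Speed is more of a trade-off. At Count = 1 the PR is slower in every benchmark (Ratio 1.00 to 1.18). At Count = 5 it is faster in most benchmarks, and at Count = 50 it is faster for Int32 and roughly on par for objects; where it wins, this is again due to less time spent copying data around. The notable outliers are IteratorSelectIterator_ToList for objects at Count = 5 (1.54) and ArrayWhereIterator_ToList for objects at Count = 50 (1.15).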

However, this must be seen in context: we are trading off nanoseconds. Where the nanoseconds of individual LINQ operations matter, the usual recommendation is to write the loops by hand anyway. On that basis, I would say that creating less garbage to collect is probably better for overall system performance than avoiding the slowdown at counts below 5.

dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) on Jul 3, 2024

dotnet-policy-service (Contributor):

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

stephentoub (Member) left a comment:

Thanks.

Comment on lines +266 to +267

Span<T> span = CollectionsMarshal.AsSpan(result);
ToSpanInlined(span);

stephentoub (Member):

Nit:

Suggested change:
- Span<T> span = CollectionsMarshal.AsSpan(result);
- ToSpanInlined(span);
+ ToSpanInlined(CollectionsMarshal.AsSpan(result));

return EnumerableToList(source);

[MethodImpl(MethodImplOptions.NoInlining)] // avoid large stack allocation impacting other paths
static List<TSource> EnumerableToList(IEnumerable<TSource> source)
stephentoub (Member), Jul 3, 2024:

I'm hesitant to change this case. It's one thing when we have specialized knowledge of the source and we're able to actually influence that source's behavior. But this case is just the exact equivalent of List<T>.ctor, and I'm hard pressed to come up with an explanation for why new List<T>(arbitraryEnumerable) would perform differently from arbitraryEnumerable.ToList(). If we believed this optimization to be truly important, why wouldn't it be done in List's ctor instead?

andrewjsaid (Contributor, Author):

> If we believed this optimization to be truly important, why wouldn't it be done in List's ctor instead?

List<T> is such a ubiquitous type that complicating its implementation could cause serious code bloat, especially if it means we get ArrayPool<T> generic instantiations for every List<T> where T is a struct. This could tilt the cost/benefit analysis towards "not worth it" for List<T>.ctor and "worth it" for ToList(). I don't know how trimming and code size are accounted for, though, so please ignore this if it's wrong.

The other reason is that System.Linq serves a different use case from System.Private.CoreLib (SPC): on hot paths (where the common advice is to avoid LINQ) the slight penalty at low counts may be a trade-off not worth making, whereas where LINQ is used it's probably preferable to create less garbage than to maximize raw speed.

These arguments are the best I can make, and I'm not fully convinced by them myself. Either way, please let me know your decision.

Labels: area-System.Linq, community-contribution