-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement LINQ ToList using SegmentedArrayBuilder to reduce allocations #104365
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @dotnet/area-system-linq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Span<T> span = CollectionsMarshal.AsSpan(result); | ||
ToSpanInlined(span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
Span<T> span = CollectionsMarshal.AsSpan(result); | |
ToSpanInlined(span); | |
ToSpanInlined(CollectionsMarshal.AsSpan(result)); |
return EnumerableToList(source); | ||
|
||
[MethodImpl(MethodImplOptions.NoInlining)] // avoid large stack allocation impacting other paths | ||
static List<TSource> EnumerableToList(IEnumerable<TSource> source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to change this case. It's one thing when we have specialized knowledge of the source and we're able to actually influence that source's behavior. But this case is just the exact equivalent of List<T>.ctor
, and I'm hard pressed to come up with an explanation for why new List<T>(arbitraryEnumerable)
would perform differently from arbitraryEnumerable.ToList()
. If we believed this optimization to be truly important, why wouldn't it be done in List's ctor instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we believed this optimization to be truly important, why wouldn't it be done in List's ctor instead?
List<T>
is such a ubiquitous type that complicating the implementation could result in some serious bloat, especially if this means we get ArrayPool<T>
generic instantiations for any List<T>
where T
is a struct. This could tilt the cost benefit analysis towards "not worth it" for List<T>.ctor
and "worth it" for ToList()
. I don't know how trimming/code size is implemented though, so please ignore if wrong.
The other reason is that System.Linq
is a different use-case to SPC
; for hot paths (where common advice is to avoid LINQ) the slight penalty for low counts could be a trade-off not worth making, whereas in places LINQ is used it's probably preferable to create less garbage over raw speed.
These arguments is the best I can do, and I myself am not fully convinced. Either way please let me know your decision.
After #96570,
ToArray()
in many places is faster and has fewer allocations thatToList()
. As aList
is just a wrapper over an array, we can re-use most of the logic inSegmentedArrayBuilder
to buildList
s, too.There were some more complex cases using
SegmentedArrayBuilder
for exampleConcat
where I left the code as-is. I can also experiment with performance numbers for those if it is requested.Benchmarks
Int32Tests
ObjectTests
Benchmark Code
Command Line
Program.cs
Performance Analysis
In all cases allocations are down and sometimes significantly so. As the size of the collection increases so do the savings due to reduced number of "unnecessary" array copies as the list grows.
Speed is more of a trade-off. For lower counts the PR is actually slower whereas for higher counts it is faster, again due to less time copying data around.
However this must be seen in context; we are trading off nanoseconds. In situations where nanoseconds of individual LINQ operations matter, it would probably be recommended to write the loops by hand anyway. On this basis I would say that creating less garbage to collect would probably be better for the overall system performance than the slower code for the Count less than 5 case.