perf: maintain_order can lead to faster overall operations #20346
Labels
bug, needs triage, python
Checks
Reproducible example
This outputs
Log output
Issue description
I ran into an issue in my code where adding `maintain_order=True` made some aggregation workflows around 10% faster overall, even though the `group_by` step itself is slower.
I've tried to show the behaviour here minimally. Although in this case the overall query is slower, the profiling shows that the second `group_by` is twice as fast following `maintain_order=True` rather than `maintain_order=False`.
In general, I see this behaviour for other aggregation ops (not just `.first()`) and for most settings of `NUM_BIG_G` and `NUM_SMALL_G`, although it's definitely more pronounced when `NUM_BIG_G` >> `NUM_SMALL_G`. Laziness also doesn't matter.
I think some fast path or caching ends up helping out here? But I'm not familiar enough with Rust yet to dig into the issue all the way down on the Rust side. No sorted flags are set on the frame at any point, so I don't see anything obvious on the Python side.
I'm not 100% sure whether this is a bug (the same fast path exists but is missed with `maintain_order=False`) or just a misunderstanding on my part of how caching works and why it's able to take a faster path here.
Expected behavior
I would expect the time for the second `group_by(groups)` in profiling to match in both cases.
Installed versions