[ENH] improve performance for polars' pivot_longer (#1377)
* faster pivot_longer for non dot value

* fix docs and tests

* fix docs and tests

* fix doc

* fix doc pivot_longer_spec

* fix doc pivot_longer_spec

* updates

* updates

* updates

* fix docs

* fix tests

* change sort logic for `complete`

* updates to complete

* restore initial setup for complete

* remove dead code

* use left join

* update docs for pivot_longer

* WIP - expand

* Delete janitor/polars/expand.py

* remove expand

* remove expand

---------

Co-authored-by: samuel.oranyeli <[email protected]>
Co-authored-by: Eric Ma <[email protected]>
3 people authored Jul 4, 2024
1 parent 65ccf97 commit d0c2544
Showing 3 changed files with 279 additions and 302 deletions.
4 changes: 2 additions & 2 deletions janitor/polars/complete.py
@@ -385,14 +385,14 @@ def _complete(

     no_columns_to_fill = set(df.columns) == set(uniques.columns)
     if fill_value is None or no_columns_to_fill:
-        return uniques.join(df, on=uniques.columns, how="full", coalesce=True)
+        return uniques.join(df, on=uniques.columns, how="left", coalesce=True)
     idx = None
     columns_to_select = df.columns
     if not explicit:
         idx = "".join(df.columns)
         idx = f"{idx}_"
         df = df.with_row_index(name=idx)
-    df = uniques.join(df, on=uniques.columns, how="full", coalesce=True)
+    df = uniques.join(df, on=uniques.columns, how="left", coalesce=True)
     # exclude columns that were not used
     # to generate the combinations
     exclude_columns = uniques.columns
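
The commit does not state why `how="full"` could be replaced by `how="left"`. One plausible reading is that `uniques` already holds every key combination present in `df`, in which case both joins return the same rows. The sketch below models that case in plain Python rather than polars. The sample rows and the helpers `key_of`, `left_join`, and `full_join` are hypothetical illustrations, not part of pyjanitor or polars.

```python
from itertools import product

# Hypothetical observed rows, keyed by (group, year).
rows = [
    {"group": "a", "year": 2020, "value": 1},
    {"group": "a", "year": 2021, "value": 2},
    {"group": "b", "year": 2020, "value": 3},
]
keys = ("group", "year")

# Model of `uniques`: every combination of the observed key values.
# Assumption: this is a superset of the keys found in `rows`.
groups = sorted({r["group"] for r in rows})
years = sorted({r["year"] for r in rows})
uniques = [dict(zip(keys, combo)) for combo in product(groups, years)]


def key_of(row):
    """Return the join key of a row as a tuple."""
    return tuple(row[k] for k in keys)


def left_join(left, right):
    """Keep every left row; unmatched left rows get value=None."""
    out = []
    for l_row in left:
        matches = [r for r in right if key_of(r) == key_of(l_row)]
        if matches:
            out.extend({**l_row, **m} for m in matches)
        else:
            out.append({**l_row, "value": None})
    return out


def full_join(left, right):
    """Left join plus any right rows whose key never appears on the left."""
    left_keys = {key_of(l_row) for l_row in left}
    extra = [r for r in right if key_of(r) not in left_keys]
    return left_join(left, right) + extra


completed_left = left_join(uniques, rows)
completed_full = full_join(uniques, rows)
```

Under the stated assumption, `full_join` never finds extra right-hand rows, so the two results are identical. If `uniques` were built from values that do not cover every key in `df`, a left join would drop the uncovered rows, so the equivalence holds only under that assumption.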