DataFrame.unique() should raise if any subset column doesn't exist on empty frame. #20209

Open
2 tasks done
mrjsj opened this issue Dec 7, 2024 · 1 comment · May be fixed by #20411
Labels
bug Something isn't working good first issue Good for newcomers P-low Priority: low python Related to Python Polars

Comments

@mrjsj

mrjsj commented Dec 7, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(
    {
        "ID": [],
        "Name": []
    },
    schema={"ID": pl.Int64, "Name": pl.String}
)
df = df.unique(subset="id")
print(df)
# shape: (0, 2)
# ┌──────┬──────┐
# │ ID   ┆ Name │
# │ ---  ┆ ---  │
# │ null ┆ null │
# ╞══════╪══════╡
# └──────┴──────┘

df = pl.DataFrame(
    {
        "ID": [1, 2, 1, 2],
        "Name": ["foo", "bar", "baz", "baa"]
    },
    schema={"ID": pl.Int64, "Name": pl.String}
)
df = df.unique(subset="id")
# raises `polars.exceptions.ColumnNotFoundError: "id" not found`

Log output

No response

Issue description

Calling .unique() on an empty DataFrame with a subset column that doesn't exist in the DataFrame doesn't raise an error.

This is inconsistent with other methods, e.g. .select()

Expected behavior

.unique() should raise an error if any subset column is not in the dataframe.
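
Until that happens, a caller could validate the subset up front. The following is only a minimal sketch of such a check, not how Polars implements it: the helper names `missing_subset_columns` and `require_subset_columns` are hypothetical, it works on plain column-name lists (such as `df.columns`), and it handles string names only, not selectors or expressions. It raises a plain `KeyError` as a stand-in for `ColumnNotFoundError`.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def missing_subset_columns(
    columns: Sequence[str], subset: str | Iterable[str] | None
) -> list[str]:
    """Return the names in `subset` that are not in `columns` (case-sensitive)."""
    if subset is None:
        # No subset means "use all columns", so nothing can be missing.
        return []
    # Accept either a single name or an iterable of names.
    names = [subset] if isinstance(subset, str) else list(subset)
    existing = set(columns)
    return [name for name in names if name not in existing]


def require_subset_columns(
    columns: Sequence[str], subset: str | Iterable[str] | None
) -> None:
    """Raise if any subset column is missing, even when the frame has no rows."""
    missing = missing_subset_columns(columns, subset)
    if missing:
        # KeyError is a placeholder; Polars itself raises ColumnNotFoundError.
        raise KeyError(f"subset column(s) not found: {missing}")
```

With the frames from the example above, this would be called as `require_subset_columns(df.columns, "id")` before `df.unique(subset="id")`. Because it only looks at column names, it behaves the same for the empty and the non-empty frame.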

Installed versions

--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            macOS-15.0.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             True

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            0.21.0
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.1.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@mrjsj mrjsj added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Dec 7, 2024
@mrjsj mrjsj changed the title DataFrame.unique() should raise if any subset column doesn't exist on empty frame. DataFrame.unique() should raise if any subset column doesn't exist on empty frame. Dec 7, 2024
@coastalwhite coastalwhite added good first issue Good for newcomers P-low Priority: low and removed needs triage Awaiting prioritization by a maintainer labels Dec 8, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Dec 8, 2024
@nnlnr

nnlnr commented Dec 8, 2024

👋 Hi there - I'd like to pick this up if it's still available.

Nikhil-Doye added a commit to Nikhil-Doye/polars that referenced this issue Dec 20, 2024
Fixes pola-rs#20209

Update `DataFrame.unique()` to raise an error if any subset column is not in the dataframe.

* Modify `crates/polars-core/src/frame/mod.rs` to check for the existence of subset columns in the `unique` method and raise a `ColumnNotFoundError` if any subset column is not found.
* Add a test case in `crates/polars-core/src/tests.rs` to verify that `unique` raises an error for non-existent subset columns.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/pola-rs/polars/issues/20209?shareId=XXXX-XXXX-XXXX-XXXX).