load nested JSON data with polars without using explode #20382

Open
kolibril13 opened this issue Dec 20, 2024 · 1 comment
Labels
enhancement New feature or an improvement of an existing feature

Comments

@kolibril13

Description

I am currently using this snippet to load and explode JSON objects with polars.
My question: Is there a better way to load my desired data format without the explode call?

import polars as pl
from io import StringIO

json_file = StringIO(
    """
{
  "Star": [
    [58.2136, 91.8819, 0.0],
    [58.1961, 92.215, 0.0]
  ],
  "Is_Visible": [
    [true],
    [false]
  ],
  "Intensity": [
    [10],
    [20]
  ]
}
"""
)

df = pl.read_json(json_file)
columns_to_explode = [col for col in df.columns if df[col].dtype == pl.List(pl.List)]
df2 = df.explode(columns_to_explode)
print(df)
print(df2)
shape: (1, 3)
┌─────────────────────────────────┬───────────────────┬─────────────────┐
│ Star                            ┆ Is_Visible        ┆ Intensity       │
│ ---                             ┆ ---               ┆ ---             │
│ list[list[f64]]                 ┆ list[list[bool]]  ┆ list[list[i64]] │
╞═════════════════════════════════╪═══════════════════╪═════════════════╡
│ [[58.2136, 91.8819, 0.0], [58.… ┆ [[true], [false]] ┆ [[10], [20]]    │
└─────────────────────────────────┴───────────────────┴─────────────────┘
shape: (2, 3)
┌─────────────────────────┬────────────┬───────────┐
│ Star                    ┆ Is_Visible ┆ Intensity │
│ ---                     ┆ ---        ┆ ---       │
│ list[f64]               ┆ list[bool] ┆ list[i64] │
╞═════════════════════════╪════════════╪═══════════╡
│ [58.2136, 91.8819, 0.0] ┆ [true]     ┆ [10]      │
│ [58.1961, 92.215, 0.0]  ┆ [false]    ┆ [20]      │
└─────────────────────────┴────────────┴───────────┘
@kolibril13 kolibril13 added the enhancement New feature or an improvement of an existing feature label Dec 20, 2024
@ritchie46
Member

ritchie46 commented Dec 23, 2024

That's asking to ignore the JSON schema, or at least to interpret it differently, which is not something I am keen on. What's wrong with exploding the data? I don't think compute should be pushed down into the readers (or at least it should be kept to a minimum).
