First of all, thank you very much for polars! This is my first issue, and I wasn't sure whether it is a bug or a feature request.
I am using a list of dataclasses.dataclass instances to initialize a pl.DataFrame, which works fine. However, the initialization takes much longer than when I first convert the list of dataclass instances to a dict and initialize the DataFrame from that:
    import dataclasses
    import time

    import polars as pl


    @dataclasses.dataclass
    class Row:
        int_value: int
        str_value: str
        list_value: list[int]


    ROWS = [
        Row(index, str(index), [index] * 100) for index in range(10000)
    ]


    def df_from_rows():
        return pl.DataFrame(ROWS)


    def df_from_dict():
        data_dict = {}
        for field in dataclasses.fields(Row):
            data_dict[field.name] = [getattr(row, field.name) for row in ROWS]
        return pl.DataFrame(data_dict)


    start = time.perf_counter()
    df_from_rows()
    end = time.perf_counter()
    print(f"Time taken to create DataFrame from rows: {end - start}")

    start = time.perf_counter()
    df_from_dict()
    end = time.perf_counter()
    print(f"Time taken to create DataFrame from dict: {end - start}")
With polars 1.17.1 I get:
Time taken to create DataFrame from rows: 0.5881143999995402
Time taken to create DataFrame from dict: 0.06102950000058627
I don't know the details, but my guess is that each dataclass is first converted to a tuple, which creates deep copies.
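To illustrate the copying behaviour this guess refers to, here is a minimal standard-library sketch. It only shows what dataclasses.astuple does to a list field; whether polars actually calls astuple (or asdict) internally is an assumption that this snippet does not verify.

```python
import dataclasses


@dataclasses.dataclass
class Row:
    int_value: int
    str_value: str
    list_value: list[int]


row = Row(1, "1", [1] * 100)

# astuple() recurses into list fields and rebuilds them, so the tuple
# holds a new list object rather than the one stored on the instance.
as_tuple = dataclasses.astuple(row)
copied = as_tuple[2] is not row.list_value

# Plain attribute access returns the very same list; nothing is copied.
shared = getattr(row, "list_value") is row.list_value

print(copied, shared)
```

With 10000 rows of 100-element lists, this per-row rebuilding would add up, which would be consistent with the timings above if such a conversion is indeed on the code path.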
Would it be possible to enhance the initialization speed for this use case?
Your implementation makes sense! I can't say for sure (I'm not a polars developer), but I think a pull request would be appreciated. That said, wouldn't it be better to propose this directly to Python, e.g. as dataclasses.asdict(deepcopy=False)?
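For reference, dataclasses.asdict has no deepcopy parameter today, so the call above is a proposal rather than an existing API. A shallow conversion can be sketched with dataclasses.fields, as below; the helper name shallow_asdict is hypothetical and used only for illustration.

```python
import dataclasses


@dataclasses.dataclass
class Row:
    int_value: int
    str_value: str
    list_value: list[int]


def shallow_asdict(obj):
    """Hypothetical helper: look up each field once, without recursion or copying."""
    return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}


row = Row(1, "1", [1, 2, 3])

shallow = shallow_asdict(row)    # values are the original objects
deep = dataclasses.asdict(row)   # list field is rebuilt recursively

print(shallow["list_value"] is row.list_value)
print(deep["list_value"] is row.list_value)
```

The two dicts compare equal; they differ only in whether the list field is shared with the dataclass instance or rebuilt.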
Some people already seem to be working on the superfluous deep copies in dataclasses, but I guess it will take quite some time until we see that in a Python release.
I can have a look at opening a PR here if one of the polars devs considers it useful.