Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating Polars into ui.table and ui.aggrid #3969

Merged
merged 11 commits into from
Nov 13, 2024

Conversation

trivedihoney
Copy link
Contributor

@trivedihoney trivedihoney commented Nov 10, 2024

This PR allows users to use from_polars and update_from_polars functions in ui.table.
Users will be able to use from_polars in ui.aggrid as well.

Changes were made in pyproject.toml Python version since Poetry was not allowing Polars installation with Python version ^3.8.
I have not worked extensively with various datatypes in polars. So, test cases can be improved in that regard.
This is a WIP. I have integrated polars in ui.table so far.

I'm new to contributing to open-source projects. Any feedback is greatly appreciated.

@trivedihoney
Copy link
Contributor Author

from_polars method in ui.aggrid has been implemented

@trivedihoney trivedihoney marked this pull request as ready for review November 11, 2024 13:57
@falkoschindler falkoschindler added the enhancement New feature or request label Nov 11, 2024
@falkoschindler falkoschindler added this to the 2.6 milestone Nov 11, 2024
Copy link
Contributor

@falkoschindler falkoschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this pull request, @trivedihoney! Support for Polars dataframes is a great addition to the library!

I just reviewed your code and improved it a little:

  • I changed the minimum Python version back to 3.8. Even though its end-of-life date has passed, we might support it a little longer to avoid breaking changes in applications that haven't switched to newer Python versions yet. Depending on the Python version we can simply choose an older Polars version that still supports Python 3.8.
  • To avoid too much duplicate code, I parameterized the pytests. This way we can re-use quite some code for both dataframe types.
  • I re-ordered the new methods so that Pandas always comes before Polars, which feels more intuitive.

The one thing I'm still unsure about is handling the "problematic datatypes". See #1698, #1972, #1983, and #2706 for a little background on this subject. Some datatypes like datetime, timedelta and complex numbers caused exceptions when extracting rows and columns, so we convert them beforehand. But most of this doesn't seem to apply to Polars.

  • datetime seems to work without conversion.
  • timedelta causes an error: "polars.exceptions.InvalidOperationError: casting from Duration(Microseconds) to String not supported"
  • complex numbers cause an error: "polars.exceptions.ComputeError: cannot cast 'Object' type"
  • pd.Period doesn't exist in Polars, so I guess we can ignore it.

Maybe we find ways to avoid these errors. Otherwise we can still improve it later in another pull request.

Apart from that, I think casting every row to string and replacing null values is not necessary. Commenting it out doesn't seem to make a difference. Do you agree to remove these lines?

@trivedihoney
Copy link
Contributor Author

Hi @falkoschindler,
Thanks for the detailed reply!

timedelta causes an error: "polars.exceptions.InvalidOperationError: casting from Duration(Microseconds) to String not supported"

timedelta is internally represented as pl.Duration in polars. In the current version of polars, to_string operation is not supported for dtype duration[μs]. There is a pull request open that is supposed to handle this issue.
We can use a workaround like below to get duration representation in seconds for current version, but I think it will be best to wait for a newer version of polars for this particular issue.

df = df.with_columns(pl.col(pl.Duration).to_physical()) # timedelta 1 day --> 86400000000

complex numbers cause an error: "polars.exceptions.ComputeError: cannot cast 'Object' type"

When we store Complex numbers in polars dataframe column, the column datatype becomes Object (Used for arbitrary python objects). It is possible to convert an object column to string using map_elements as shown below, but it is slow and discouraged in documentation. I am not sure if there are efficient and vectorized ways to achieve this.

df = df.with_columns(pl.col(pl.Object).map_elements(lambda x: str(x), return_dtype=pl.String))

Apart from that, I think casting every row to string and replacing null values is not necessary. Commenting it out doesn't seem to make a difference. Do you agree to remove these lines?

I added cast every row as string code because I thought it will handle complicated data types like list[i64], pl.Duration, Nested Columns, etc. But now, upon testing I realized that it doesn't make a difference. So, we can remove those lines for now.

Just want to add one more thing that I noticed while testing,
If I add list as an element in column as shown below, then niceGUI is not throwing any errors, and the page keeps on loading. This behavior is present in from_pandas and from_polars both.
Below is the minimal code to reproduce the issue.

from nicegui import ui
import polars as pl

@ui.page("/")
def index_page():
    df = pl.DataFrame( 
        {"nested" : [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
    )
    ui.table.from_polars(df)

ui.run()

Copy link
Contributor

@falkoschindler falkoschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback, @trivedihoney!

  1. Let's leave the problems with pl.Duration and complex numbers unresolved. There is no obvious solution, so we better leave it to the user to decide how to preprocess the data.
  2. I've removed the casting and null replacement.
  3. The problem with table rows containing lists is a known issue (see Browser can't load page when table rows contain lists #2744) and occurs independently of Polars:
    ui.table(rows=[{'x': [42]}])
    Unfortunately I still don't have a clue. But that's out of scope of this PR.

Let's finish this pull request by merging into main 🚀

@falkoschindler falkoschindler merged commit a3538aa into zauberzeug:main Nov 13, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants