Possibility to completely replace the Altair data server? #309

Open
joelostblom opened this issue May 10, 2023 · 3 comments

@joelostblom

Someone just opened an issue (vega/altair#3048) about the data server not being available for Altair 5.0. While we probably want to bump that up for compatibility reasons, it also made me curious whether VegaFusion will be able to completely replace the data server at some point in the future, or whether some of its functionality is considered out of scope for VegaFusion.

I recall from vega/altair#2738 (comment) that VegaFusion relies on a Jupyter Widget, whereas the data server works anywhere there is a Python kernel. Beyond that, the biggest outstanding feature I can identify is that the data server allows an unlimited number of rows by serving the data dynamically from an active Python kernel, which is quite useful for unaggregated datasets. Would similar functionality for large unaggregated datasets be in scope for VegaFusion, and could it be replicated via an active JavaScript kernel, DuckDB, or something similar?
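As a rough sketch of the idea behind "serving the data dynamically via an active Python kernel" (this is not altair_data_server's actual implementation), the kernel can keep the rows in memory and expose them on a local HTTP endpoint, so that the chart spec only carries a URL. The `DATASETS` store and the helpers `start_server`, `register`, and `url_spec` are hypothetical names, and only the standard library is used:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory store owned by the kernel: URL path -> list of row dicts.
DATASETS = {}


class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Look up the requested dataset by its path, e.g. "/cars.json" -> "cars.json".
        rows = DATASETS.get(self.path.lstrip("/"))
        if rows is None:
            self.send_error(404, "unknown dataset")
            return
        body = json.dumps(rows).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        # The notebook page is served from another origin, so allow cross-origin reads.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so it does not clutter notebook output.
        pass


def start_server():
    # Port 0 lets the OS pick a free port; the daemon thread dies with the kernel.
    server = ThreadingHTTPServer(("127.0.0.1", 0), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def register(server, name, rows):
    # Keep the rows in the kernel and return the URL Vega should fetch them from.
    DATASETS[name] = rows
    host, port = server.server_address[:2]
    return f"http://{host}:{port}/{name}"


def url_spec(url):
    # The spec only carries a URL, so its size does not grow with the number of rows.
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": url, "format": {"type": "json"}},
        "mark": "point",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
        },
    }
```

The trade-off this sketch makes visible: the spec stays tiny, but the browser must be able to reach the kernel's port, which is exactly what breaks in remote setups.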

@jonmmease
Collaborator

While VegaFusion's widget renderer works differently than the data server, it accomplishes the same goal of separating large inline datasets from the JSON spec that is parsed by the Vega JavaScript library. In the VegaFusion widget renderer, the datasets are sent to the client separately from the spec, in Arrow format.
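To illustrate what "separating large inline datasets from the JSON spec" buys, here is a minimal sketch comparing a spec with every row inlined against a spec that only references a named dataset whose rows travel in a separate payload. It is not VegaFusion's actual protocol: `inline_spec`, `split_spec`, and the dataset name `"source"` are hypothetical, and JSON bytes stand in for the Arrow buffer to keep the example dependency-free. On the browser side, the named dataset would then be filled in with something like Vega's `view.data(name, values)`.

```python
import json

ENCODING = {
    "x": {"field": "x", "type": "quantitative"},
    "y": {"field": "y", "type": "quantitative"},
}


def inline_spec(rows):
    # Everything in one JSON document: Vega must parse every row along with the spec.
    return json.dumps({"data": {"values": rows}, "mark": "point", "encoding": ENCODING})


def split_spec(rows, name="source"):
    # The spec only names a dataset; the rows travel in a separate message.
    spec = json.dumps({"data": {"name": name}, "mark": "point", "encoding": ENCODING})
    # Stand-in for an Arrow IPC buffer; JSON bytes keep this sketch dependency-free.
    payloads = {name: json.dumps(rows).encode("utf-8")}
    return spec, payloads
```

With the split approach the spec that Vega parses stays the same size no matter how many rows there are; only the side payload grows.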

It's true that the VegaFusion widget renderer requires a custom Jupyter Widget extension, but it's pretty widely supported at this point. It works locally in the classic notebook, JupyterLab, and VSCode, and remotely in Colab and Binder.

As I understand it, it's not trivial to get the Altair data server working in remote environments due to the need for proxying (https://github.com/altair-viz/altair_data_server/#remote-systems), so I wouldn't say it works "anywhere there is a Python kernel".
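The proxying problem can be sketched in a few lines. When the kernel runs on a remote machine, a `localhost:<port>` URL in the spec is resolved by the browser against the user's own machine, not the server. The usual workaround is to route the request through the Jupyter server itself. The helpers below are hypothetical, and the `<base_url>proxy/<port>/` path layout is an assumption based on jupyter-server-proxy's convention for arbitrary ports; check the altair_data_server README linked above for the setup it actually expects.

```python
from urllib.parse import urljoin


def direct_url(port, path):
    # Only reachable when the browser runs on the same machine as the kernel.
    return f"http://localhost:{port}/{path.lstrip('/')}"


def proxied_url(jupyter_base_url, port, path):
    # Assumes jupyter-server-proxy is installed and exposes kernel-local ports
    # under <base_url>proxy/<port>/, so the browser only talks to the Jupyter server.
    base = jupyter_base_url if jupyter_base_url.endswith("/") else jupyter_base_url + "/"
    return urljoin(base, f"proxy/{port}/{path.lstrip('/')}")
```

For example, on a JupyterHub deployment the base URL includes the user prefix, so the data URL becomes something like `https://hub.example.org/user/alice/proxy/8123/data.json` (hypothetical host and port), which requires both the proxy extension and knowledge of the base URL.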

So I guess the question is: which environments that are currently compatible with the Altair data server aren't supported by the VegaFusion widget renderer?

@joelostblom
Author

> It's true that the VegaFusion widget renderer requires a custom Jupyter Widget extension, but it's pretty widely supported at this point. It works locally in the classic notebook, JupyterLab, and VSCode. And remotely in Colab and Binder... As I understand it, it's not trivial to get the Altair data server working in remote environments due to the need for proxying (https://github.com/altair-viz/altair_data_server/#remote-systems), so I wouldn't say it works "anywhere there is a Python kernel".

Ah OK, I didn't realize that this approach was so widely supported outside JupyterLab, or that the Altair data server has issues on remote servers. Based on that, it sounds like most environments where the Altair data server is currently used are covered by VegaFusion. I agree that identifying the ones that are not would be the next step, but nothing comes to mind off the top of my head. I would like to try comparing the Altair data server and VegaFusion in a dashboard served remotely, e.g. via Panel on Heroku or similar, because I think that is useful functionality, but it sounds like maybe neither of them will work there.

> While VegaFusion's widget renderer works differently than the data server, it accomplishes the same goal of separating large inline datasets from the JSON spec that is parsed by the Vega JavaScript library. In the VegaFusion widget renderer the datasets are sent to the client separately from the spec in arrow format.

I am a bit confused about this, because in the large dataset docs we say that "VegaFusion is a third-party package that re-implements most Vega-Lite transforms for evaluation in the Python kernel. This makes it possible to scale many Altair charts to millions of rows as long as they include some form of aggregation.". Should this be updated to reflect that VegaFusion also works for unaggregated charts? I tried testing this myself just now but ran into a few issues (opened separate tickets).

@jonmmease
Collaborator

jonmmease commented May 11, 2023

> I am a bit confused about this because in the large dataset docs we say that "VegaFusion is a third-party package that re-implements most Vega-Lite transforms for evaluation in the Python kernel. This makes it possible to scale many Altair charts to millions of rows as long as they include some form of aggregation.". Should this be updated to reflect that vegafusion works also for unaggregated charts?

Yeah, this description could be improved. The "scale many Altair charts to millions of rows as long as they include some form of aggregation" statement is true of the VegaFusion mime renderer, which inlines the post-transformed data into the Vega specification that is rendered by the regular Vega renderers. The widget renderer performs the same aggregations, but it also supports larger unaggregated datasets by transporting them to the browser separately instead of inlining them into the Vega spec.
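Why aggregation matters for the mime renderer can be shown with a small sketch: if a count-by-category transform is evaluated in Python first, only the handful of aggregated rows needs to be inlined, however large the input is. This is a stand-in written with the standard library, not VegaFusion's implementation; `aggregate_count` and `mime_style_spec` are hypothetical names.

```python
import json
from collections import Counter


def aggregate_count(rows, field):
    # Kernel-side equivalent of the Vega-Lite transform
    # {"aggregate": [{"op": "count", "as": "count"}], "groupby": [field]}.
    counts = Counter(row[field] for row in rows)
    return [{field: key, "count": n} for key, n in sorted(counts.items())]


def mime_style_spec(rows, field):
    # Inline only the post-transformed rows; the transform itself is dropped
    # from the spec because it has already been evaluated in Python.
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": aggregate_count(rows, field)},
        "mark": "bar",
        "encoding": {
            "x": {"field": field, "type": "nominal"},
            "y": {"field": "count", "type": "quantitative"},
        },
    }
```

A scatter plot of every row has no such transform to evaluate, so an inline spec would still contain all the rows. That is the case where transporting the data separately, as the widget renderer does, is what helps.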

It would be good to do a performance comparison of VegaFusion vs the Altair data server for unaggregated charts.
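A starting point for such a comparison could be a small timing harness that runs each strategy over several dataset sizes and records the median time. The two strategies below are hypothetical placeholders that only measure serialization cost; a real comparison of the data server and VegaFusion would need to time end-to-end rendering in the browser, which this skeleton does not attempt.

```python
import json
import statistics
import time


def median_seconds(fn, repeats=5):
    # Median wall-clock time of fn() over several runs, to damp outliers.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(strategies, sizes, make_rows, repeats=5):
    # Time every strategy on every dataset size; returns {(strategy, n_rows): seconds}.
    results = {}
    for n in sizes:
        rows = make_rows(n)
        for name, strategy in strategies.items():
            # Called immediately, so the lambda sees this iteration's strategy and rows.
            results[(name, n)] = median_seconds(lambda: strategy(rows), repeats)
    return results


# Hypothetical stand-ins: serialization cost only, not browser rendering.
STRATEGIES = {
    "inline": lambda rows: json.dumps({"data": {"values": rows}}),
    "separate": lambda rows: (json.dumps({"data": {"name": "source"}}), json.dumps(rows)),
}
```

Swapping the placeholders for callables that drive each renderer would keep the harness unchanged while measuring what actually matters.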
