Possibility to completely replace the Altair data server? #309

joelostblom · 2023-05-10T17:48:35Z

Someone just opened an issue about the data server not being available for Altair 5.0 (vega/altair#3048). While we probably want to fix that for compatibility reasons, the issue also made me curious whether VegaFusion will be able to completely replace the data server at some point, or whether some of its functionality is considered out of scope for VegaFusion.

I recall from vega/altair#2738 (comment) that VegaFusion relies on a Jupyter Widget, whereas the data server works anywhere there is a Python kernel. Beyond that, the biggest outstanding feature I can identify is that the data server allows for unlimited rows by serving the data dynamically from an active Python kernel, which is quite useful for unaggregated datasets. Would similar functionality for large unaggregated datasets be in scope for VegaFusion, and could it be replicated via an active JavaScript kernel, DuckDB, or something similar?
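As a rough illustration of the data-server idea described above, here is a minimal sketch using only the Python standard library: the rows stay in the kernel process and the chart spec carries only a URL pointing at them. This is not altair_data_server's implementation; the `/data.json` path, the `ROWS` dataset, and `start_data_server` are hypothetical, and JSON stands in for whatever format a real server would use.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-kernel dataset; in a notebook this would usually be a DataFrame.
ROWS = [{"x": i, "y": i * i} for i in range(10_000)]


class DataHandler(BaseHTTPRequestHandler):
    """Serve ROWS as JSON at the (illustrative) path /data.json."""

    def do_GET(self):
        if self.path != "/data.json":
            self.send_error(404)
            return
        # Serialize on request, so the data is read from the live kernel state.
        body = json.dumps(ROWS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The notebook page and this server have different origins, so allow CORS.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def start_data_server():
    """Start the server on an ephemeral localhost port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_data_server()
url = f"http://127.0.0.1:{server.server_address[1]}/data.json"

# The chart spec only carries a URL; the rows never get inlined into it.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": url},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}
print("spec bytes:", len(json.dumps(spec)))
```

The spec stays the same size no matter how many rows `ROWS` holds. The catch is that the browser has to be able to reach `127.0.0.1` on the kernel's machine, which is the remote-environment limitation raised in the reply below.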

jonmmease · 2023-05-11T14:17:51Z

While VegaFusion's widget renderer works differently than the data server, it accomplishes the same goal of separating large inline datasets from the JSON spec that is parsed by the Vega JavaScript library. In the VegaFusion widget renderer the datasets are sent to the client separately from the spec in arrow format.
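To make the "separate the data from the spec" idea concrete, the following is a minimal sketch (not VegaFusion's actual code) that pulls Altair-style top-level `datasets` out of a Vega-Lite spec so that the spec and the rows can be shipped over different channels. JSON byte counts are used as a stand-in for the Arrow payloads the widget renderer actually sends; the function name `split_inline_datasets` and the dataset name `data-1234` are hypothetical.

```python
import json


def split_inline_datasets(spec):
    """Split a Vega-Lite spec into (slim_spec, payloads).

    Altair places inline rows under a top-level "datasets" mapping and refers
    to them with {"name": ...}. Removing that mapping leaves a spec that only
    references named data sources; the rows are returned separately.
    """
    # A shallow copy is enough because only a top-level key is removed.
    slim = dict(spec)
    payloads = slim.pop("datasets", {})
    return slim, payloads


# Hypothetical unaggregated dataset with 50,000 rows.
rows = [{"category": "abc"[i % 3], "value": i} for i in range(50_000)]
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "data-1234"},
    "datasets": {"data-1234": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "value", "type": "quantitative"},
        "y": {"field": "category", "type": "nominal"},
    },
}

slim, payloads = split_inline_datasets(spec)
print("inline spec bytes:", len(json.dumps(spec)))
print("slim spec bytes:", len(json.dumps(slim)))
print("rows sent separately:", {name: len(v) for name, v in payloads.items()})
```

On the browser side, a client could then populate each named source through Vega's View API (for example `view.data(name, values)` followed by a re-run) after the slim spec has been parsed, so the JSON parser never sees the bulk data.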

It's true that the VegaFusion widget renderer requires a custom Jupyter Widget extension, but it's pretty widely supported at this point. It works locally in the classic notebook, JupyterLab, and VSCode. And remotely in Colab and Binder.

As I understand it, it's not trivial to get the Altair data server working in remote environments due to the need for proxying (https://github.com/altair-viz/altair_data_server/#remote-systems), so I wouldn't say it works "anywhere there is a Python kernel".
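The proxying issue comes from the URL itself: a data server bound to `localhost` on the kernel's machine is not reachable from a browser on another machine. One common workaround is to route the request through the notebook server, which the browser can already reach. The sketch below assumes a jupyter-server-proxy style route of the form `{base_url}proxy/{port}/{path}`; `proxied_url` is a hypothetical helper, not part of altair_data_server.

```python
from urllib.parse import urlsplit


def proxied_url(local_url, base_url="/"):
    """Rewrite a kernel-local URL into a path routed through a notebook proxy.

    Assumes a jupyter-server-proxy style route {base_url}proxy/{port}/{path}.
    The result is relative, so the browser resolves it against the notebook
    page's own origin instead of its own localhost.
    """
    parts = urlsplit(local_url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return local_url  # already reachable from the browser; leave unchanged
    if parts.port is None:
        raise ValueError("local URL must include an explicit port")
    base = base_url if base_url.endswith("/") else base_url + "/"
    path = parts.path.lstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{base}proxy/{parts.port}/{path}{query}"


print(proxied_url("http://localhost:18888/data.json", base_url="/user/alice/"))
```

Getting `base_url` right differs between JupyterHub, Binder, and other hosts, and the proxy extension must be installed on the server, which is part of why this setup is not trivial.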

So I guess a question is, what are the environments that are currently compatible with the Altair data server that aren't supported by the VegaFusion widget renderer?

joelostblom · 2023-05-11T17:20:29Z

> It's true that the VegaFusion widget renderer requires a custom Jupyter Widget extension, but it's pretty widely supported at this point. It works locally in the classic notebook, JupyterLab, and VSCode. And remotely in Colab and Binder... As I understand it, it's not trivial to get the Altair data server working in remote environments due to the need for proxying (https://github.com/altair-viz/altair_data_server/#remote-systems), so I wouldn't say it works "anywhere there is a Python kernel".

Ah OK, I didn't realize that this approach was so widely supported outside JupyterLab, or that the Altair data server has issues on remote servers. Based on that, it sounds like most environments where the Altair data server is currently used are covered by VegaFusion. I agree that identifying the ones that aren't would be the next step, but nothing comes to mind off the top of my head. I would like to try comparing the Altair data server and VegaFusion in a dashboard served remotely, e.g. via Panel on Heroku or similar, because I think that is useful functionality, but it sounds like maybe neither of them will work there.

> While VegaFusion's widget renderer works differently than the data server, it accomplishes the same goal of separating large inline datasets from the JSON spec that is parsed by the Vega JavaScript library. In the VegaFusion widget renderer the datasets are sent to the client separately from the spec in arrow format.

I am a bit confused about this because in the large dataset docs we say that "VegaFusion is a third-party package that re-implements most Vega-Lite transforms for evaluation in the Python kernel. This makes it possible to scale many Altair charts to millions of rows as long as they include some form of aggregation." Should this be updated to reflect that vegafusion works also for unaggregated charts? I tried testing this myself just now but ran into a few issues (I opened separate tickets for them).

jonmmease · 2023-05-11T18:17:43Z

> I am a bit confused about this because in the large dataset docs we say that "VegaFusion is a third-party package that re-implements most Vega-Lite transforms for evaluation in the Python kernel. This makes it possible to scale many Altair charts to millions of rows as long as they include some form of aggregation." Should this be updated to reflect that vegafusion works also for unaggregated charts?

Yeah, this description could be improved. The "scale many Altair charts to millions of rows as long as they include some form of aggregation" statement is true of the VegaFusion mime renderer, which inlines the post-transformed data into the Vega specification that's rendered by regular Vega renderers. The widget renderer performs the same aggregations, but it also helps support larger unaggregated datasets by transporting them to the browser separately instead of inlining them into the Vega spec.
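A minimal sketch of why aggregation matters for the mime renderer: if the aggregate is evaluated in Python before the spec is built, only the aggregated rows get inlined, so the spec size tracks the number of groups rather than the number of input rows. An unaggregated chart gets no such reduction, which is where transporting the data separately helps. This is a hand-written stand-in for a Vega-Lite `aggregate` transform, not VegaFusion's planner; `pre_aggregate`, `inline_spec`, and the `mean_value` field name are hypothetical.

```python
import json
from collections import defaultdict


def pre_aggregate(rows, groupby, field):
    """Compute mean(field) per groupby value, like a Vega-Lite aggregate transform."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[groupby]
        sums[key] += row[field]
        counts[key] += 1
    return [{groupby: k, f"mean_{field}": sums[k] / counts[k]} for k in sorted(sums)]


def inline_spec(values, y_field):
    """Build a Vega-Lite bar chart spec with the given rows inlined as values."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "category", "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Hypothetical input: 200,000 rows spread over three categories.
rows = [{"category": "abc"[i % 3], "value": float(i)} for i in range(200_000)]

aggregated = pre_aggregate(rows, "category", "value")
aggregated_spec = inline_spec(aggregated, "mean_value")  # 3 rows inlined
raw_spec = inline_spec(rows, "value")  # unaggregated: every row inlined

print("aggregated spec bytes:", len(json.dumps(aggregated_spec)))
print("raw spec bytes:", len(json.dumps(raw_spec)))
```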

It would be good to do a performance comparison of VegaFusion vs the Altair data server for unaggregated charts.

This was referenced May 11, 2023:

Be clearer about how vegafusion works vega/altair#3052 (Merged)

Local data server breaks with transition to altair 5.0.0 vega/altair#3048 (Open)
