Possibility to completely replace the Altair data server? #309
Comments
While VegaFusion's widget renderer works differently from the data server, it accomplishes the same goal: separating large inline datasets from the JSON spec that is parsed by the Vega JavaScript library. In the VegaFusion widget renderer, the datasets are sent to the client separately from the spec, in Arrow format. It's true that the VegaFusion widget renderer requires a custom Jupyter Widget extension, but it's pretty widely supported at this point: it works locally in the classic notebook, JupyterLab, and VSCode, and remotely in Colab and Binder. As I understand it, it's not trivial to get the Altair data server working in remote environments due to the need for proxying (https://github.com/altair-viz/altair_data_server/#remote-systems), so I wouldn't say it works "anywhere there is a Python kernel". So I guess the question is: which environments are currently compatible with the Altair data server but not supported by the VegaFusion widget renderer?
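To make "separating large inline datasets from the JSON spec" concrete, here is a minimal sketch using plain dictionaries. It is not how VegaFusion or the data server is implemented (VegaFusion ships the data as Arrow through the widget, not via a URL); the helper `split_inline_data` and the `placeholder://dataset-0` URL are made up for illustration. The only thing it relies on is that a Vega-Lite spec can carry its data either inline under `data.values` or by reference under `data.url`.

```python
import json


def split_inline_data(spec, url):
    """Return (slim_spec, rows): the spec with inline rows swapped for a URL reference.

    `url` is a hypothetical placeholder; a real renderer would point it at
    wherever the rows are actually transported or served.
    """
    rows = spec["data"]["values"]
    slim = dict(spec)            # shallow copy so the original spec is untouched
    slim["data"] = {"url": url}  # reference the data instead of embedding it
    return slim, rows


rows = [{"x": i, "y": i % 7} for i in range(10_000)]
inline_spec = {
    "mark": "point",
    "data": {"values": rows},
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

slim_spec, payload = split_inline_data(inline_spec, "placeholder://dataset-0")

# The JSON that the Vega JavaScript library has to parse shrinks dramatically.
inline_size = len(json.dumps(inline_spec))
slim_size = len(json.dumps(slim_spec))
print(f"inline spec: {inline_size} bytes, slim spec: {slim_size} bytes")
```

Both approaches discussed here end up with something like `slim_spec` on the JavaScript side; they differ in how `payload` reaches the browser.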
Ah ok, I didn't realize that this approach was widely supported outside JupyterLab, or that there are issues with the Altair data server on remote servers. Based on that, it sounds like most environments where the Altair data server is currently used are covered by VegaFusion. I agree that identifying which ones are not would be the next step, but nothing comes to mind off the top of my head. I would like to try comparing the Altair data server and VegaFusion in a dashboard context served remotely, e.g. via Panel on Heroku or similar, because I think that is useful functionality, but it sounds like maybe neither of them will work.
I am a bit confused about this because in the large dataset docs we say that "VegaFusion is a third-party package that re-implements most Vega-Lite transforms for evaluation in the Python kernel. This makes it possible to scale many Altair charts to millions of rows as long as they include some form of aggregation.". Should this be updated to reflect that VegaFusion also works for unaggregated charts? I tried testing this myself just now, but I ran into a few issues (I opened separate tickets for those).
Yeah, this description could be improved. The "scale many Altair charts to millions of rows as long as they include some form of aggregation" statement is true of the VegaFusion mime renderer, which inlines the post-transformed data into the Vega specification that's rendered by regular Vega renderers. The widget renderer does the same aggregations, but it also helps support larger unaggregated datasets by not inlining them into the Vega spec and instead transporting them separately to the browser. It would be good to do a performance comparison of VegaFusion vs. the Altair data server for unaggregated charts.
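As a rough illustration of why aggregation makes inlining feasible, the sketch below bins a large list of values in the Python process and inlines only the per-bin counts into the spec. This is a hand-rolled stand-in, not VegaFusion's implementation; `bin_counts`, the bin width, and the field names are assumptions chosen for the example.

```python
import random
from collections import Counter


def bin_counts(values, step):
    """Aggregate raw values into fixed-width bins, returning one row per bin."""
    counts = Counter((v // step) * step for v in values)
    return [{"bin_start": start, "count": n} for start, n in sorted(counts.items())]


random.seed(0)
raw = [random.uniform(0, 100) for _ in range(200_000)]  # stand-in for a large column

binned = bin_counts(raw, step=10)

# Only the aggregated rows are inlined; the 200,000 raw values never enter the spec.
spec = {
    "data": {"values": binned},
    "mark": "bar",
    "encoding": {
        "x": {"field": "bin_start", "type": "quantitative"},
        "y": {"field": "count", "type": "quantitative"},
    },
}
print(f"{len(raw)} raw rows reduced to {len(binned)} inlined rows")
```

For an unaggregated chart such as a scatter plot there is no equivalent reduction, which is why the rows have to travel outside the spec instead.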
Someone just opened an issue about the data server not being available for Altair 5.0 (vega/altair#3048). While we probably want to bump that up for compatibility reasons, it also made me curious whether VegaFusion will be able to completely replace the data server at some point in the future, or whether some of its functionality is considered out of scope for VegaFusion.
I recall from vega/altair#2738 (comment) that VegaFusion relies on a Jupyter Widget, whereas the data server works anywhere there is a Python kernel. In addition to that, the biggest outstanding feature I can identify is that the data server allows for unlimited rows by serving the data dynamically via an active Python kernel, which is quite useful for unaggregated datasets. Would similar functionality for large unaggregated datasets be in scope and possible to replicate in VegaFusion via an active JavaScript kernel, DuckDB, or something similar?
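For readers unfamiliar with the idea of serving data dynamically from the Python process, here is a minimal standard-library sketch. It is not the `altair_data_server` implementation; `DataHandler`, `ROWS`, and the `/data.json` path are hypothetical names used only to show the mechanism: a small HTTP server runs in a background thread of the kernel process, and the spec refers to the data by URL.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory dataset standing in for a large, unaggregated DataFrame.
ROWS = [{"x": i, "y": i * i} for i in range(1000)]


class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(ROWS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # The notebook page is served from a different origin, so allow cross-origin reads.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DataHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/data.json"
spec = {"data": {"url": url}, "mark": "point"}  # the spec carries a reference only

# Fetch the data the way a browser on the same machine would (ignoring any proxy settings).
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as resp:
    fetched = json.loads(resp.read())

server.shutdown()
server.server_close()
print(f"fetched {len(fetched)} rows from {url}")
```

Note that the URL points at `127.0.0.1`, so it only resolves when the browser and the Python process are on the same machine. That is the reason remote setups need a proxy.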