Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to transformed_data for reconstructed (from_dict) charts #354

Open
binste opened this issue Jul 7, 2023 · 4 comments
Open
Labels
bug Something isn't working

Comments

@binste
Copy link

binste commented Jul 7, 2023

Relates to #313. For layered and multi-view charts, Altair takes a bit of a shortcut and does not use the exact classes which map to the Vega-Lite schema but instead uses the API convenience classes such as alt.Chart, alt.LayerChart, etc. For example for the layered chart below, the elements of layer are alt.Chart instances:

import altair as alt
import vegafusion as vf
from vega_datasets import data

source = data.wheat()

bar = alt.Chart(source).mark_bar().encode(
    x='year:O',
    y='wheat:Q'
)

rule = alt.Chart(source).mark_rule(color='red').encode(
    y='mean(wheat):Q'
)

layer_chart = (bar + rule).properties(width=600)
type(layer_chart.layer[0])  # altair.vegalite.v5.api.Chart

However, when reconstructing this chart from a dictionary, the layers are of type UnitSpec which is the correct class following the VL schema:

reconstructed_layer_chart = alt.Chart.from_dict(layer_chart.to_dict())
type(reconstructed_layer_chart.layer[0])  # altair.vegalite.v5.schema.core.UnitSpec

When calling transformed_data on the reconstructed chart, it fails as the isinstance checks do not accept UnitSpec as an equivalent to alt.Chart although it should be:

vf.transformed_data(reconstructed_layer_chart)
ValueError: transformed_data accepts an instance of Chart, FacetChart, LayerChart, HConcatChart, VConcatChart, or ConcatChart
Received value of type: <class 'altair.vegalite.v5.schema.core.UnitSpec'>

All classes which I think can be treated as equivalent to the first one mentioned in every bullet:

  • Chart: TopLevelUnitSpec (parent class of Chart), FacetedUnitSpec, UnitSpec, UnitSpecWithFrame, NonNormalizedSpec (used in concat charts for chart objects in e.g. hconcat attribute)
  • LayerChart: TopLevelLayerSpec (parent class of LayerChart), LayerSpec
  • RepeatChart: TopLevelRepeatSpec (parent class of RepeatChart), RepeatSpec
  • ConcatChart: TopLevelConcatSpec (parent class of ConcatChart), ConcatSpecGenericSpec
  • HConcatChart: TopLevelHConcatSpec (parent class of HConcatChart), HConcatSpecGenericSpec
  • VConcatChart: TopLevelVConcatSpec (parent class of VConcatChart), VConcatSpecGenericSpec
  • FacetChart: TopLevelFacetSpec (parent class of FacetChart), FacetSpec

Spec classes which I'm not sure about but we can probably ignore as repeat charts are not supported yet by transformed_data anyway:

  • LayerRepeatSpec
  • NonLayerRepeatSpec
  • GenericUnitSpecEncodingAnyMark

Just a reminder that the same changes would need to be applied to https://github.com/altair-viz/altair/blob/master/altair/utils/_transformed_data.py.

Btw, my use case for the above is that a frontend sends the chart specifications as a JSON to a backend API which executes vf.transformed_data(alt.Chart.from_dict(spec)) and then returns the extracted data as an Excel file to the user of the web application.

@jonmmease
Copy link
Collaborator

Thanks for the report @binste, I think this all makes sense. And now that chart.transformed_data is in Altair master, we need to fix it there as well!

@jonmmease jonmmease added the bug Something isn't working label Jul 7, 2023
@binste
Copy link
Author

binste commented Jul 7, 2023

I don't have a working development setup for VegaFusion right now. Is it easy to set it up to work purely on the Python part without Rust or do I need to compile the Rust code anyway? I assume the later. Let me know if I should land a hand on implementing the changes in Altair.

@jonmmease
Copy link
Collaborator

Let me know if I should land a hand on implementing the changes in Altair.

You can develop the Python part of VegaFusion on its own. But if you want to work on fixing it in the Altair implementation of chart.transformed_data (which doesn't use the vf.transformed_data implementation), I could pretty easily copy the fixes into the VegaFusion implementation afterward.

@binste
Copy link
Author

binste commented Jul 7, 2023

Great, I'll work on the Altair implementation probably early next week.

dangotbanned added a commit to dangotbanned/altair that referenced this issue Jun 10, 2024
…am` test

Struggling to find the source of the failure, the `mark` is my best guess but currently can't reproduce the following error:

```
___________________________________________ test_compound_chart_examples[False-scatter_with_layered_histogram.py-all_rows18-all_cols18] ___________________________________________
[gw2] win32 -- Python 3.8.19 C:\Users\*\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\Scripts\python.exe

filename = 'scatter_with_layered_histogram.py', all_rows = [2, 17], all_cols = [['gender'], ['__count']], to_reconstruct = False

    @pytest.mark.skipif(vf is None, reason="vegafusion not installed")
    # fmt: off
    @pytest.mark.parametrize("filename,all_rows,all_cols", [
        ("errorbars_with_std.py", [10, 10], [["upper_yield"], ["extent_yield"]]),
        ("candlestick_chart.py", [44, 44], [["low"], ["close"]]),
        ("co2_concentration.py", [713, 7, 7], [["first_date"], ["scaled_date"], ["end"]]),
        ("falkensee.py", [2, 38, 38], [["event"], ["population"], ["population"]]),
        ("heat_lane.py", [10, 10], [["bin_count_start"], ["y2"]]),
        ("histogram_responsive.py", [20, 20], [["__count"], ["__count"]]),
        ("histogram_with_a_global_mean_overlay.py", [9, 1], [["__count"], ["mean_IMDB_Rating"]]),
        ("horizon_graph.py", [20, 20], [["x"], ["ny"]]),
        ("interactive_cross_highlight.py", [64, 64, 13], [["__count"], ["__count"], ["Major_Genre"]]),
        ("interval_selection.py", [123, 123], [["price_start"], ["date"]]),
        ("layered_chart_with_dual_axis.py", [12, 12], [["month_date"], ["average_precipitation"]]),
        ("layered_heatmap_text.py", [9, 9], [["Cylinders"], ["mean_horsepower"]]),
        ("multiline_highlight.py", [560, 560], [["price"], ["date"]]),
        ("multiline_tooltip.py", [300, 300, 300, 0, 300], [["x"], ["y"], ["y"], ["x"], ["x"]]),
        ("pie_chart_with_labels.py", [6, 6], [["category"], ["value"]]),
        ("radial_chart.py", [6, 6], [["values"], ["values_start"]]),
        ("scatter_linked_table.py", [392, 14, 14, 14], [["Year"], ["Year"], ["Year"], ["Year"]]),
        ("scatter_marginal_hist.py", [34, 150, 27], [["__count"], ["species"], ["__count"]]),
        ("scatter_with_layered_histogram.py", [2, 17], [["gender"], ["__count"]]),
        ("scatter_with_minimap.py", [1461, 1461], [["date"], ["date"]]),
        ("scatter_with_rolling_mean.py", [1461, 1461], [["date"], ["rolling_mean"]]),
        ("seattle_weather_interactive.py", [1461, 5], [["date"], ["__count"]]),
        ("select_detail.py", [20, 1000], [["id"], ["x"]]),
        ("simple_scatter_with_errorbars.py", [5, 5], [["x"], ["upper_ymin"]]),
        ("stacked_bar_chart_with_text.py", [60, 60], [["site"], ["site"]]),
        ("us_employment.py", [120, 1, 2], [["month"], ["president"], ["president"]]),
        ("us_population_pyramid_over_time.py", [19, 38, 19], [["gender"], ["year"], ["gender"]]),
    ])
    # fmt: on
    @pytest.mark.parametrize("to_reconstruct", [True, False])
    def test_compound_chart_examples(filename, all_rows, all_cols, to_reconstruct):
        source = pkgutil.get_data(examples_methods_syntax.__name__, filename)
        chart = eval_block(source)
        if to_reconstruct:
            # When reconstructing a Chart, Altair uses different classes
            # then what might have been originally used. See
            # vega/vegafusion#354 for more info.
            chart = alt.Chart.from_dict(chart.to_dict())
        dfs = chart.transformed_data()

        if not to_reconstruct:
            # Only run assert statements if the chart is not reconstructed. Reason
            # is that for some charts, the original chart contained duplicated datasets
            # which disappear when reconstructing the chart.
            assert len(dfs) == len(all_rows)
            for df, rows, cols in zip(dfs, all_rows, all_cols):
>               assert len(df) == rows
E               assert 19 == 17
E                +  where 19 = len(    bin_step_5_age  bin_step_5_age_end gender  __count\n0             45.0                50.0      M      247\n1       ...       11\n17
   30.0                35.0      F        5\n18            70.0                75.0      F        1)

tests\test_transformed_data.py:132: AssertionError
```
binste added a commit to vega/altair that referenced this issue Jun 27, 2024
…ormance (#3431)

* ci: Add additional `ruff` rules to `pyproject.toml`

Only highly autofixable groups from [#3430](#3430).
Aiming to refine further, depending on how much manual fixing this generates.

* fix: add item to `pyproject.toml` to silence import errors for `pylance|pyright`

* ci: add subset of `SIM` to `extend-safe-fixes`

* ci: add unfixable `RUF` rules to `ignore`

[RUF002](https://docs.astral.sh/ruff/rules/ambiguous-unicode-character-docstring/) may be avoidable during schema generation, so I'm not doing any manual fixes right now.

* refactor: apply new `ruff` rules, fix and reformat

* test: Skip tests on Win that require a tz database

See apache/arrow#36996

* ci: enable `tool.ruff.lint.preview`

Testing the effect this has with existing rules, prior to adding `pylint`, `refurb`

* fix: replace [F841](https://docs.astral.sh/ruff/rules/unused-variable/) violations with dummy variable

* ci: add additional `preview` fixes to `extend-safe-fixes`

* refactor: apply `preview` fixes for existing `ruff` rules, reformat

* ci: add `preview` category `FURB` rules

Almost all have autofixes, splitting out from [pylint](https://docs.astral.sh/ruff/rules/#pylint-pl) which is much larger in scope + fewer fixes

* refactor: apply `FURB` rule fixes, manually fix `FURB101/3`

* fix: Revert newer fstring syntax, not available to `sphinx`

* ci: add fixable `pylint` rules

Across the 4 `PL.` categories, there are over 50 rules, with a mix of fixable, preview.
Tried to be picky with what is added, because adding even 1 of the groups would generate a lot of manual fixes.

* ci: add `pylint` fixes to `extend-safe-fixes`

* refactor: apply `pylint` rule fixes, add an inline optimization for `_selection`

`param_kwds` in `_selection` triggered `PLR6201`. Instead rewrote as a dictcomp.

* fix: Recover comments lost during linting

#3431 (comment)

* fix: Replace sources of `RUF002` violations

* fix: manual fix `RUF002` in `api`

* ci: add `PTH` rules

`flake8-use-pathlib` rules all require manual fixes.
> Found 70 errors.

<details>
<summary>Details</summary>

```md
20      PTH118  `os.path.join()` should be replaced by `Path` with `/` operator
12      PTH120  `os.path.dirname()` should be replaced by `Path.parent`
11      PTH100  `os.path.abspath()` should be replaced by `Path.resolve()`
11      PTH123  `open()` should be replaced by `Path.open()`
 7      PTH110  `os.path.exists()` should be replaced by `Path.exists()`
 6      PTH107  `os.remove()` should be replaced by `Path.unlink()`
 3      PTH103  `os.makedirs()` should be replaced by `Path.mkdir(parents=True)`
```

* refactor: Manually fix `PTH` rule violations

* fix: Use safer `inspect.getattr_static` in `update_init_file`

Avoids [triggering](https://docs.python.org/3/library/inspect.html#inspect.getattr_static) on objects like `datum`, `expr`

* fix: Use correct f-string `!s` modifier

* fix: Resolve `doc:build-html` error

```log
Extension error (sphinxext.altairgallery): Handler <function main at 0x7f0873471ab0> for event 'builder-inited' threw an exception (exception: unsupported operand type(s) for +: 'PosixPath' and 'str')
```

* fix: Resolve utf-8 error for [emoji example](https://altair-viz.github.io/gallery/isotype_emoji.html)

* Update sphinxext/altairgallery.py

Co-authored-by: Stefan Binder <[email protected]>

* build: bump `docbuild.yml` python `3.10` -> `3.12`

See @binste [comment](#3431 (comment))

* style: Simplify `PluginRegistry` repr

#3431 (comment)

* revert: ignore `suppressible-exception` rule in `pyproject.toml`

#3431 (comment)

* refactor: remove need for exception handling in `utils._vegafusion_data.get_inline_tables`

I'm happy to revert back to the original `try-catch`, but think it was less both less efficient and understandable.

If I've understood correctly:
- Every time this was called, all tables would be iterated over
- On the first call, tables processed by the `vegafusion_data_transformer` would popped and returned
- On all calls, other tables would be ignored but always trigger exceptions
- On any call other than the first, every table triggers an exception

This behaviour is easily reduced using [set methods](https://docs.python.org/3/library/stdtypes.html#set) directly, as an empty `set` would produce an empty `dict` during the `dictcomp`.

#3431 (comment)

* refactor: remove `python<3.3` compat code in `utils.core.use_signature`

Was replaced with `contextlib.suppress` in a3fd77a but targets long-unsupported python versions.

For future reference, there are also related `stdlib` functions that may have not been available when this was written:
- [inspect.getdoc](https://docs.python.org/3/library/inspect.html#inspect.getdoc)
- [functools.wraps](https://docs.python.org/3/library/functools.html#functools.wraps)

Wouldn't cover everything there, but may replace the need for some parts.

#3431 (comment)

* revert: replace `PLW1514` fixes with "utf-8"

The rule remains in `pyproject.toml`, but is no longer autofixed with undesirable `locale.getprefferedencoding(False)`.

#3431 (comment)

* refactor: minor simplify and sort `__all__` from `generate_schema_wrapper`

* docs: `FA100` on most of `tools/`*`

The start of a cautious introduction of `from __future__ import annotations`.
Excludes `schemaapi.schemaapi.py` for now, but I think that would be safe as well.

* docs: `FA100` on `sphinxext/*`

* docs: `FA100` on `alt.jupyter`

* docs: `FA100` on `alt.expr`

* docs: `FA100` on `alt.vegalite`, excluding `v5.api`, `v5.schema/*`

- Covers non-autogenerated code.
- `api` is being special-cased for now.
  - Will be a great candidate for these fixes, but there are many simpler modules in `alt.utils` that I'm priotizing in case they highlight issues

* docs: `FA100` on private `alt.utils` modules

* docs(typing): Add annotations for `sphinxext`

- Making use of `FA100` enabled syntax
- Eventually would like to improve the performance of this, since `sphinxext` is by far the longest build step
- Towards that end, the annotations give a contextual starting point

* docs: `FA100` on public `alt.utils` modules, excluding `schemapi`

- Also did made some improvements to annotations more broadly, now that the newer syntax is allowed

* ci: update `pyproject.toml`, add new granular `tool.ruff.lint.per-file-ignores` setting

- So far been making the changes one file at a time, followed by tests
- Not encountered any runtime issues
- `mypy` has complained occasionally, but has been resolvable

* revert: Change `write_file_or_filename` default `encoding` to `None`

* docs: `FA100` on `tools.schemapi.schemapi`

Also adds some annotations that are now possible without circular import concerns

* feat(typing): add `from __future__ import annotations` for each file in `generate_schema_wrapper`

Also:
- ensure `annotations` excluded from `__all__`
- minor standardizing import order
- adjust `ruff` rules to correct schemagen

* ci(typing): adds `generate-schema-wrapper` script to `hatch`

Rewrites and reformats all annotations in `altair/vegalite/v?/schema`
- Meaning that the original code stays the same:

```
>>> ruff check altair/vegalite/v?/schema --fix --exit-zero
Found 13006 errors (13006 fixed, 0 remaining).
>>> ruff format altair/vegalite/v?/schema
4 files reformatted
>>> ruff check altair/vegalite/v?/schema --statistics
```

* refactor(typing): Improve annotations and narrowing in `schemapi`

- Changes are mainly on `_FromDict.from_dict`
  - Avoids type errors due to multiple reassignments
  - Adds a range of overloads that preserve the `SchemaBase` subclass
    - Not possible in all cases
  - replacing `cls` with `tp` - when used in an instance method
    - The former is quite strange to see outside of a `@classmethod`
    - Especially in a class that has a `@classmethod`
- updated `_default_wrapper_classes` to match `_subclasses` return type in `generate_schema_wrapper`

* build: run `generate-schema-wrapper`

The cumulative effect of `FA100`, `UP`, `TCH` rules applied to `v5.schema`.

All tests pass!

* revert(typing): Roll back runtime evaluated types not possible on older python

`from __future__ import annotations` applies in annotation scope, but not in aliases. Until `altair` requires `python>=3.9-10`, some syntax will need to differ - like many of the changes seen in this commit.

# Note
The current `ruff` config will only upgrade syntax in annotation scope, until `tool.ruff.target-version` "unlocks" new features. These were changes I made manually, not having tested against all versions - which are being reverted.

* fix(typing): Adds overloads and reduce ignore comments relating to `spec_to_mimebundle`

The repetition in `utils.save` is unfortunate, but is required to satisfy mypy. The issues are from overloading `spec_to_mimebundle` with `tuple/dict` as each has different indexing constraints.

A longer term solution would be to adding an intermediate function to return the nested value. Due to `spec_to_mimebundle`'s use elsewhere operating directly on the full output.

* revert(typing): Roll back additional runtime evaluated types for `python=3.8`

* fix(typing): Use `...` for `DataTransformerType`

Original `pass` keyword was equivalent to a `None` return

* docs(typing): `FA100` on `alt.vegalite.v5.api` and manual annotation rewrites

### `api`
A unique, but significant, difference to the other commits is the large number of module prefixed annotations that can be shortened.
`TopLevelMixin.project` shows the greatest reduction in combined characters for annotations, but there are small improvements throughout.

I've made all of these `TYPE_CHECKING`-only imports explicit, to highlight a new path forward this opens up. There are many repeat/similar `Union` types, which can now be declared in another module - without concern for circular import issues.

### `v5.__init__`
These changes have also removed the need for both the import of `expr` in `api`, and the `# type: ignore[no-redef]` in `v5.__init__`.

* test: add pytest rules `PT` and fix `PT001` violations

https://docs.astral.sh/ruff/rules/pytest-fixture-incorrect-parentheses-style/

* test: add ignores for `PT011` on existing tests

Probably useful to enforce on new code, https://docs.astral.sh/ruff/rules/pytest-raises-too-broad/

* test: fix `PT018` violation

https://docs.astral.sh/ruff/rules/pytest-composite-assertion/

* test: fix `PT006` violations

https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/

* test: add ignore for `PT012` violation

Not sure how this could be rewritten. https://docs.astral.sh/ruff/rules/pytest-raises-with-multiple-statements/

* test: add `pytest.mark.xfail` for flaky `scatter_with_layered_histogram` test

Struggling to find the source of the failure, the `mark` is my best guess but currently can't reproduce the following error:

```
___________________________________________ test_compound_chart_examples[False-scatter_with_layered_histogram.py-all_rows18-all_cols18] ___________________________________________
[gw2] win32 -- Python 3.8.19 C:\Users\*\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\Scripts\python.exe

filename = 'scatter_with_layered_histogram.py', all_rows = [2, 17], all_cols = [['gender'], ['__count']], to_reconstruct = False

    @pytest.mark.skipif(vf is None, reason="vegafusion not installed")
    # fmt: off
    @pytest.mark.parametrize("filename,all_rows,all_cols", [
        ("errorbars_with_std.py", [10, 10], [["upper_yield"], ["extent_yield"]]),
        ("candlestick_chart.py", [44, 44], [["low"], ["close"]]),
        ("co2_concentration.py", [713, 7, 7], [["first_date"], ["scaled_date"], ["end"]]),
        ("falkensee.py", [2, 38, 38], [["event"], ["population"], ["population"]]),
        ("heat_lane.py", [10, 10], [["bin_count_start"], ["y2"]]),
        ("histogram_responsive.py", [20, 20], [["__count"], ["__count"]]),
        ("histogram_with_a_global_mean_overlay.py", [9, 1], [["__count"], ["mean_IMDB_Rating"]]),
        ("horizon_graph.py", [20, 20], [["x"], ["ny"]]),
        ("interactive_cross_highlight.py", [64, 64, 13], [["__count"], ["__count"], ["Major_Genre"]]),
        ("interval_selection.py", [123, 123], [["price_start"], ["date"]]),
        ("layered_chart_with_dual_axis.py", [12, 12], [["month_date"], ["average_precipitation"]]),
        ("layered_heatmap_text.py", [9, 9], [["Cylinders"], ["mean_horsepower"]]),
        ("multiline_highlight.py", [560, 560], [["price"], ["date"]]),
        ("multiline_tooltip.py", [300, 300, 300, 0, 300], [["x"], ["y"], ["y"], ["x"], ["x"]]),
        ("pie_chart_with_labels.py", [6, 6], [["category"], ["value"]]),
        ("radial_chart.py", [6, 6], [["values"], ["values_start"]]),
        ("scatter_linked_table.py", [392, 14, 14, 14], [["Year"], ["Year"], ["Year"], ["Year"]]),
        ("scatter_marginal_hist.py", [34, 150, 27], [["__count"], ["species"], ["__count"]]),
        ("scatter_with_layered_histogram.py", [2, 17], [["gender"], ["__count"]]),
        ("scatter_with_minimap.py", [1461, 1461], [["date"], ["date"]]),
        ("scatter_with_rolling_mean.py", [1461, 1461], [["date"], ["rolling_mean"]]),
        ("seattle_weather_interactive.py", [1461, 5], [["date"], ["__count"]]),
        ("select_detail.py", [20, 1000], [["id"], ["x"]]),
        ("simple_scatter_with_errorbars.py", [5, 5], [["x"], ["upper_ymin"]]),
        ("stacked_bar_chart_with_text.py", [60, 60], [["site"], ["site"]]),
        ("us_employment.py", [120, 1, 2], [["month"], ["president"], ["president"]]),
        ("us_population_pyramid_over_time.py", [19, 38, 19], [["gender"], ["year"], ["gender"]]),
    ])
    # fmt: on
    @pytest.mark.parametrize("to_reconstruct", [True, False])
    def test_compound_chart_examples(filename, all_rows, all_cols, to_reconstruct):
        source = pkgutil.get_data(examples_methods_syntax.__name__, filename)
        chart = eval_block(source)
        if to_reconstruct:
            # When reconstructing a Chart, Altair uses different classes
            # then what might have been originally used. See
            # vega/vegafusion#354 for more info.
            chart = alt.Chart.from_dict(chart.to_dict())
        dfs = chart.transformed_data()

        if not to_reconstruct:
            # Only run assert statements if the chart is not reconstructed. Reason
            # is that for some charts, the original chart contained duplicated datasets
            # which disappear when reconstructing the chart.
            assert len(dfs) == len(all_rows)
            for df, rows, cols in zip(dfs, all_rows, all_cols):
>               assert len(df) == rows
E               assert 19 == 17
E                +  where 19 = len(    bin_step_5_age  bin_step_5_age_end gender  __count\n0             45.0                50.0      M      247\n1       ...       11\n17
   30.0                35.0      F        5\n18            70.0                75.0      F        1)

tests\test_transformed_data.py:132: AssertionError
```

* ci: tidy up `ruff` section of `pyproject.toml`

- removed `SIM` rules from `extend-safe-fixes`, only 1 required it and manual fixes should be fine for new code
- Provided links for new config groups/settings

#3431 (comment)

* perf: adds config `pyproject.toml` for faster build, test runs

- `pip` -> `uv`
- `pytest` single-core -> logical max

https://hatch.pypa.io/latest/how-to/environment/select-installer/#enabling-uv
https://pytest-xdist.readthedocs.io/en/latest/distribution.html#running-tests-across-multiple-cpus

Should speed up CI, although GitHub Actions only runs on 4 cores iirc

* ci: adds `update-init-file` hatch script

* ci: adds newer-style `hatch` test config

I've attempted to recreate the existing `test` scripts, but haven't quite worked out the `coverage` parts.

This provides: matrix python version testing, parallel tests (within a python version), and slightly shorter commands:
```bash
>>> hatch run test
>>> hatch test

>>> hatch run test-coverage
>>> hatch test --cover
```

* feat(typing): Use `TYPE_CHECKING` block for `schema` modules

- Removes the need for `_Parameter` protocol
- Reduce lines of code by more than 5000, by removing prefix for `SchemaBase`, `Parameter`

* test: use a more reliable `xfail` condition for flaky test

The difficulty in reproducing came from the python debugger disabling `xdist`.

* build: Embed extra `ruff` calls in `generate-schema-wrapper` into the python script

Hopefully solves: https://github.com/vega/altair/actions/runs/9456437889/job/26048327764?pr=3431

* build: Manually rewrite certain exceptions with flaky autofix

These were previously fixed by `EM` rules, but seem to not always occur - leading to CI fail

https://github.com/vega/altair/actions/runs/9466550650/job/26078482982#step:8:60

* refactor: remove now-unneeded `PARAMETER_PROTOCOL`

* refactor: replace existing references to `typing.Optional`

* refactor(typing): rename `T` TypeVar to `TSchemaBase`

* feat(typing): Adds dedicated `Optional` alias for `Union[..., UndefinedType]`

`typing.Optional` is deprecated and is no longer needed in `altair`. The new alias improves the readability of generated signatures, by reducing noise (repeating `UndefinedType`) and clearly grouping together the options for an argument.

* refactor(typing): Remove `UndefinedType` dependency in `api`

* refactor(typing): Remove `UndefinedType` dependency in `api`

* refactor(typing): Remove annotation scope `UndefinedType` dependency in `api`

Aligns `api` with `schema` changes in fbc02e2

* ci: add `W291` to ensure trailing whitespace is autofixed

* refactor: define non-relevant attributes closer to imports in `update_init_file`

Will make it easier to keep the two in sync, and now they can be copy/pasted - rather than adding an `or attr is _` expression

* feat(typing): Adds `_TypeAliasTracer` and reorders `generate_schema_wrapper` to collect unique `Literal` types

The next commit will show the changes from build

* build: run `generate-schema-wrapper` using `_TypeAliasTracer`

This commit reduces the length of each module in `schema` by ~50-70% and provides the each alias in the new `_typing` module.

## Existing art
[`pandas`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L199) and [`polars`](https://github.com/pola-rs/polars/blob/faf0f061eead985a8b29e4584b619fb5d8647e31/py-polars/polars/dataframe/frame.py#L105) similarly define a large number of `TypeAlias`, `Union`, `Literal` types in a single file. These definitions can be imported anywhere in the project - without concern for circular dependencies. It helps keep signatures consistent, and in many cases, simplifies those signatures such that the user can quickly understand what they represent.

## Future config
The generated names are currently:
```py
f"{SchemaInfo.title}_T"
```
so as to not conflict with the names defined in `core`/`channels`.
I have left this configurable via:
```py
_TypeAliasTracer(fmt=...)
```

Personally don't mind what the aliases are called, so if anyone has any ideas please feel free to change.

* fix: Add missing `LiteralString` import

* test: add `pytest.mark.filterwarnings` for tests that cannot avoid them

- `pandas` warnings cannot trivially be avoided, as they need to be compatible with `pandas~=0.25` and are used in examples.
- I have been unable to track down what specific calls are triggering the `traitlets` warnings. There seems to be multiple levels of indirection, but the stacklevel reported is external to `altair`.

* refactor(typing): Replace `Literal[None]` -> `None`

Came across [Literal[None] conversion](https://typing.readthedocs.io/en/latest/spec/literal.html#legal-parameters-for-literal-at-type-check-time) during #3426 (comment)

* test: Change `skipif` -> `xfail` for `test_sanitize_pyarrow_table_columns` on `win32`

Seems to be a better fit, considering this would fail for `win32` devs not using `pyarrow` but have it installed.
This could be removed in the future if installing/confirming install `tzdata` to the correct location was handled during environment activation.

Possibly resolves: #3431 (comment)

* ci: bump `actions/setup-python`, `python-version`, use `uv` in `lint.yml`

I've made quite a few improvements to performance locally like fd20f00, but only `docbuild.yml` uses `hatch` so they haven't reduced CI run time.

This is more of a test run on the simplest workflow to identify any issues.

* ci: create venv before uv pip install

* ci: ensure venv is activated after install

* ci: use environment provided by `hatch`

* ci: testing `pytest-xdist` in `build` workflow

Would be faster if using `uv`/`hatch` instead of `pip` - but I'm not sure how to untangle `pip` from this yet.

* test(perf): adds significant parallelism for slow `test_examples`

#3431 (comment)

* refactor, perf: reduce `sys.path` usage, use dictcomp in `update_init_file`

# `sys.path`
Mainly modules in `tools`, but also `tests`, `sphinxext` repeat the same pattern of manipulating `sys.path` to find imports.
In some cases, I think this wasn't need at all - but others were resolves by adding `tools.__init__.py`.

- Aligns with many of the other sub/packages
- Removes need for ignoring `E402`
- Makes many of the imports in these modules explicit, and therefore easier to follow.

#  `update_init_file`
While rewriting the imports here, I saw an opportunity to simplify gathering relevant attributes.

- `attr = getattr(alt, attr_name)` using the keys of `__dict__`, was a slow equivalent of a dictionary comprehension.
- `str(Path(alt.__file__).parent)` is a verbose `str(Path.cwd())`
  - An earlier commit of mine used this instead of `os.path`
  - This version is much shorter and obvious in what it does

* build: adding debug message to help with build failure

The previous run worked locally due to `hatch` managing `sys.path` - but not in CI, which doesn't use hatch yet

 https://github.com/vega/altair/actions/runs/9527747043/job/26264700374

* build: add cwd to debug message

* fix: possibly fix cwd not on path

* ci: testing invoking script with `-m` for cwd

* fix: remove debug code

* fix: resolve lint, format, type conflicts following rebase

Fixes https://github.com/vega/altair/actions/runs/9581196541/job/26417448238?pr=3431

* DO NOT MERGE - TESTING DOC PERF

* revert: undo last commit

Intended to discard the changes, but accidentally pushed

* refactor: Remove commented out dead code

Fixes #3431 (comment)

* ci: Remove extra whitespace in `build.yml`

Fixes #3431 (comment)

---------

Co-authored-by: Stefan Binder <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants