Skip to content

Commit

Permalink
Clarify that virtual datasets are not normal xarray datasets (#173)
Browse files Browse the repository at this point in the history
* add admonition

* release notes
  • Loading branch information
TomNicholas committed Jul 2, 2024
1 parent 08ca0cd commit a0451b9
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 2 deletions.
3 changes: 3 additions & 0 deletions docs/releases.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Clarify that virtual datasets cannot be treated like normal xarray datasets. (:issue:`173`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.


Internal Changes
~~~~~~~~~~~~~~~~
Expand Down
14 changes: 12 additions & 2 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ In future we would like for it to be possible to just use `xr.open_dataset`, e.g
but this requires some [upstream changes](https://github.com/TomNicholas/VirtualiZarr/issues/35) in xarray.
```

Printing this "virtual dataset" shows that although it is an instance of `xarray.Dataset`, unlike a typical xarray dataset, it does not contain numpy or dask arrays, but instead it wraps {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects. (We will use the term "virtual dataset" to refer to any `xarray.Dataset` which contains one or more {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects.)
Printing this "virtual dataset" shows that although it is an instance of `xarray.Dataset`, unlike a typical xarray dataset, it does not contain numpy or dask arrays, but instead it wraps {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects.

```python
vds
Expand All @@ -60,7 +60,17 @@ Attributes:
title: 4x daily NMC reanalysis (1948)
```

These {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects are each a virtual reference to some data in the `air.nc` netCDF file, with the references stored in the form of "Chunk Manifests".

Generally a "virtual dataset" is any `xarray.Dataset` which wraps one or more {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects.

These particular {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects are each a virtual reference to some data in the `air.nc` netCDF file, with the references stored in the form of "Chunk Manifests".

```{important} Virtual datasets are not normal xarray datasets!
Although the top-level type is still `xarray.Dataset`, they are intended only as an abstract representation of a set of data files, not as something you can do analysis with. If you try to load, view, or plot any data you will get a `NotImplementedError`. Virtual datasets only support a very limited subset of normal xarray operations, particularly functions and methods for concatenating, merging and extracting variables, as well as operations for renaming dimensions and variables.
_The only use case for a virtual dataset is [combining references](#combining-virtual-datasets) to files before [writing out those references to disk](#writing-virtual-stores-to-disk)._
```

### Opening remote files

Expand Down

0 comments on commit a0451b9

Please sign in to comment.