Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement]: Slightly more flexible API for add_missing_bounds #666

Open
pochedls opened this issue Jun 19, 2024 · 6 comments
Open

[Enhancement]: Slightly more flexible API for add_missing_bounds #666

pochedls opened this issue Jun 19, 2024 · 6 comments
Labels
good-first-issue Good first issue for new contributors type: enhancement New enhancement request

Comments

@pochedls
Copy link
Collaborator

Is your feature request related to a problem?

This is a convenience request. I almost always need to add missing bounds for one axis, but the (updated?) API wants a list.

Describe the solution you'd like

I'd like the API to accept a list or string (e.g., ["X", "Y"] or simply T).

Describe alternatives you've considered

No response

Additional context

No response

@pochedls pochedls added type: enhancement New enhancement request good-first-issue Good first issue for new contributors labels Jun 19, 2024
@lee1043
Copy link
Collaborator

lee1043 commented Jun 19, 2024

I like this idea!

@tomvothecoder
Copy link
Collaborator

Sounds good to me. These APIs also need to be updated:

  • xcdat/xcdat/dataset.py

    Lines 44 to 47 in c653bf2

    def open_dataset(
    path: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
    data_var: Optional[str] = None,
    add_bounds: List[CFAxisKey] | None = ["X", "Y"],
  • xcdat/xcdat/dataset.py

    Lines 127 to 130 in c653bf2

    def open_mfdataset(
    paths: str | NestedSequence[str | os.PathLike],
    data_var: Optional[str] = None,
    add_bounds: List[CFAxisKey] | None = ["X", "Y"],
  • xcdat/xcdat/dataset.py

    Lines 463 to 467 in c653bf2

    def _postprocess_dataset(
    dataset: xr.Dataset,
    data_var: Optional[str] = None,
    center_times: bool = False,
    add_bounds: List[CFAxisKey] | None | bool = ["X", "Y"],

@pochedls
Copy link
Collaborator Author

Hi @tomvothecoder – I'm not opposed to updating the API in these places, but does it need to be updated there?

When opening a dataset it kind of makes sense to pass in a list for most use cases (make sure this has time/lat/lon bounds no matter what...). A lot of times when you add_missing_bounds it is because you got an error and now need to fix it by adding bounds for a specific axis.

@tomvothecoder
Copy link
Collaborator

When opening a dataset it kind of makes sense to pass in a list for most use cases (make sure this has time/lat/lon bounds no matter what...). A lot of times when you add_missing_bounds it is because you got an error and now need to fix it by adding bounds for a specific axis.

That's true. I was thinking of consistency across APIs but the use cases aren't exactly the same. We can keep those APIs as is.

@tomvothecoder tomvothecoder added this to the FY24 Items for Dev Day milestone Jun 20, 2024
@pochedls
Copy link
Collaborator Author

pochedls commented Jul 9, 2024

This is currently working on 0.7.1 (even though the function signature documents an argument of form axes : List[str]). I'm pretty sure this did not work in the past, but it is working for me now (I should have put in a minimal example...). I think this is because if I iterate over ['X', 'Y'] or X both return a string ("X" and "Y" versus "X", respectively). We can re-open this if stops working.

An example (that works):
wget https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v6/access/gridded/NOAAGlobalTemp_v6.0.0_gridded_s185001_e202405_c20240608T152402.nc (or a recent dataset here)

import xcdat as xc
fn = '/p/user_pub/climate_work/pochedley1/surface/NOAAGlobalTemp_v6.0.0_gridded_s185001_e202405_c20240608T152402.nc'
ds = xc.open_dataset(fn)
ds.bounds.get_bounds('T')

KeyError: "No bounds data variables were found for the 'T' axis. Make sure the dataset has bound data vars and their names match the 'bounds' attributes found on their related time coordinate variables. Alternatively, you can add bounds with ds.bounds.add_missing_bounds() or ds.bounds.add_bounds()."

ds = ds.bounds.add_missing_bounds('T')
ds.bounds.get_bounds('T')

<xarray.DataArray 'time_bnds' (time: 2093, bnds: 2)> Size: 33kB
array([[cftime.DatetimeGregorian(1850, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1850, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
...
[cftime.DatetimeGregorian(2024, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2024, 6, 1, 0, 0, 0, 0, has_year_zero=False)]],
dtype=object)
Coordinates:

  • time (time) object 17kB 1850-01-01 00:00:00 ... 2024-05-01 00:00:00
    Dimensions without coordinates: bnds
    Attributes:
    xcdat_bounds: True

@pochedls pochedls closed this as completed Jul 9, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in xCDAT Development Jul 9, 2024
@tomvothecoder
Copy link
Collaborator

This is currently working on 0.7.1 (even though the function signature documents an argument of form axes : List[str]). I'm pretty sure this did not work in the past, but it is working for me now (I should have put in a minimal example...). I think this is because if I iterate over ['X', 'Y'] or X both return a string ("X" and "Y" versus "X", respectively). We can re-open this if stops working.

The for loop below is supposed to loop over a list of strings. However, if we pass a single string, it'll loop over each character of the string. In your case you pass a single character string (e.g., "T") which means the for-loop behaves correctly (unintentionally).

xcdat/xcdat/bounds.py

Lines 166 to 170 in 7751452

for axis in axes:
try:
coords = get_dim_coords(ds, axis)
except KeyError:
continue

We should do the following:

  1. Update the docstring with axes: List[CFAxisKey] | CFAxisKEy
  2. Add a check to make sure the single character is a CFAxisKey and raise a clear ValueError if it isn't

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good-first-issue Good first issue for new contributors type: enhancement New enhancement request
Projects
Status: Todo
Development

No branches or pull requests

3 participants