NEXRAD issue - IndexError index out of bounds #207

Open
rabernat opened this issue Sep 9, 2024 · 4 comments
Labels
bug (Something isn't working), data-standards (Data standard related)

Comments

rabernat commented Sep 9, 2024

  • xradar version: 0.6.4
  • xarray version: 2024.7.0
  • Python version: 3.12.4
  • Operating System: Linux

Description

I have found a puzzling bug that only comes up in certain situations with Dask.

What I Did

import xradar
import xarray as xr
import pooch

# download and open a NEXRAD2 file from S3
url = "https://noaa-nexrad-level2.s3.amazonaws.com/2024/09/01/FOP1/FOP120240901_000347_V06"
local_file = pooch.retrieve(url, known_hash=None)
ds = xr.open_dataset(local_file, group="sweep_0", engine="nexradlevel2")

# create a chunked version 
dsc = ds.chunk()
# load one variable - IMPORTANT - skipping this step makes the next line work
dsc.DBZH.load()
# load the entire dataset
dsc.load()
# -> IndexError: index 140 is out of bounds for axis 0 with size 38
# try all the variables
for v in dsc:
    print(v)
    try:
        dsc[v].load()  # also fails with dsc!
        print("ok")
    except Exception as e:
        print(e)
# DBZH
# ok
# ZDR
# index 212 is out of bounds for axis 0 with size 212
# PHIDP
# index 140 is out of bounds for axis 0 with size 130
# RHOHV
# index 140 is out of bounds for axis 0 with size 22
# CCORH
# index 140 is out of bounds for axis 0 with size 73
# sweep_mode
# ok
# sweep_number
# ok
# prt_mode
# ok
# follow_mode
# ok
# sweep_fixed_angle
# ok

Possibly related to #180.

Experience tells me this has something to do with Dask task tokenization.
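
To illustrate what a tokenization problem could look like, here is a minimal sketch. The two wrapper classes and the file name are hypothetical and are not xradar code; only `dask.base.tokenize` and the `__dask_tokenize__` hook are real Dask API. The idea is that if a lazy array's token leaves out something that distinguishes it (here, the variable name), two different arrays get the same token, and Dask may then treat them as the same task. Whether this is what actually happens in xradar is unconfirmed.

```python
from dask.base import tokenize


class NamedLazyArray:
    """Hypothetical stand-in for a backend array wrapper (not xradar code)."""

    def __init__(self, filename, variable):
        self.filename = filename
        self.variable = variable

    def __dask_tokenize__(self):
        # The token covers everything that distinguishes this array.
        return (type(self).__name__, self.filename, self.variable)


class FileOnlyLazyArray(NamedLazyArray):
    """Hypothetical wrapper whose token is deliberately incomplete."""

    def __dask_tokenize__(self):
        # The variable name is left out, so all variables of a file collide.
        return (type(self).__name__, self.filename)


# "sweep.bin" is a placeholder name; no file is opened.
distinct = tokenize(NamedLazyArray("sweep.bin", "DBZH")) != tokenize(
    NamedLazyArray("sweep.bin", "ZDR")
)
colliding = tokenize(FileOnlyLazyArray("sweep.bin", "DBZH")) == tokenize(
    FileOnlyLazyArray("sweep.bin", "ZDR")
)
print("full token keeps variables apart:", distinct)
print("file-only token collides:", colliding)
```

A collision of this kind would be consistent with one variable's load influencing another's, but it is only one possible explanation.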

syedhamidali (Contributor) commented

@rabernat Thanks for sharing this issue.

The IndexError seems related to variables with inconsistent dimensions. Some variables (e.g., sweep_mode, sweep_number) are scalars, while others (e.g., DBZH, ZDR) are multi-dimensional, which could be causing the issue with Dask chunking.

To focus on the multi-dimensional variables, you can try:

import xradar
import xarray as xr
import pooch

# download and open a NEXRAD2 file from S3
url = "https://noaa-nexrad-level2.s3.amazonaws.com/2024/09/01/FOP1/FOP120240901_000347_V06"
local_file = pooch.retrieve(url, known_hash=None)
ds = xr.open_dataset(local_file, group="sweep_0", engine="nexradlevel2")

# create a chunked version 
dsc = ds.chunk()
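# load only the variables with more than one dimension
for var in dsc.data_vars:
    if len(dsc[var].dims) > 1:
        print(var)
        # print() works outside notebooks, unlike IPython's display()
        print(dsc[var].load())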

syedhamidali added the bug, data-standards, and good first issue labels Sep 10, 2024
rabernat (Author) commented Sep 11, 2024

@syedhamidali - I'm not sure I understand your response.

Loading this dataset works fine without Dask. When Dask comes into the picture, we get an error. This seems like a bug in xradar. The workaround you proposed does not address the root cause.
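
One way to make the eager-versus-Dask difference explicit is a small helper that loads every variable both ways and reports where they diverge. This is only a sketch: `compare_eager_and_chunked` is a hypothetical helper, and the dataset below is synthetic stand-in data so that the snippet runs without downloading anything. With the NEXRAD file from the report, `ds` would instead come from `xr.open_dataset(local_file, group="sweep_0", engine="nexradlevel2")`.

```python
import numpy as np
import xarray as xr


def compare_eager_and_chunked(ds):
    """Load each data variable eagerly and through Dask; report differences."""
    dsc = ds.chunk()  # one shared chunked copy, as in the report
    results = {}
    for name in ds.data_vars:
        try:
            loaded = dsc[name].load()
        except Exception as exc:  # e.g. the IndexError seen in the report
            results[name] = f"error: {exc}"
            continue
        # DataArray.equals compares dims, coords and values
        results[name] = "ok" if ds[name].equals(loaded) else "mismatch"
    return results


# Synthetic stand-in data (hypothetical shapes and values)
demo = xr.Dataset(
    {
        "DBZH": (("azimuth", "range"), np.arange(20.0).reshape(4, 5)),
        "sweep_number": ((), 0),
    }
)
report = compare_eager_and_chunked(demo)
print(report)
```

On in-memory data like this, every variable should come back "ok"; an "error" or "mismatch" entry on a file-backed dataset would point at the backend rather than at the data.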

kmuehlbauer (Collaborator) commented

Thanks for the detailed report @rabernat. I've reopened #180 as it wasn't fully resolved.

A deeper look will take some time. We will definitely look into this after ERAD 2024, where most of the xradar devs currently are.

Side note: @rabernat You might be interested in the short course we gave last Sunday, where we acknowledged the great work of Pangeo and Project Pythia.

Thanks also to @syedhamidali for taking care here.

syedhamidali (Contributor) commented

@kmuehlbauer I wanted to mention that I ran the same code with other file types (CfRadial, Iris, ...), and they all showed the same issue with Dask chunking.

kmuehlbauer removed the good first issue label Sep 19, 2024
Development

No branches or pull requests

3 participants