404 error with read_zarr_array #13

Closed
jshannon75 opened this issue Dec 5, 2024 · 2 comments

@jshannon75

jshannon75 commented Dec 5, 2024

I'm trying to read in ERA5 data from Google's Zarr archive: https://cloud.google.com/storage/docs/public-datasets/era5

Here's the code I've set up:
`library(Rarr)

s3_address<-"https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr"

zarr_overview(s3_address)

read_zarr_array(s3_address,index=list(1:10,NULL,1))`

zarr_overview() works, but read_zarr_array() gives me this error: Error: NoSuchKey (HTTP 404). The specified key does not exist.

This is on a Windows machine if that matters.

@grimbough
Owner

Hi, thanks for the interest in using Rarr.

You're getting this error because the Zarr store at that location is a group containing a large number of arrays, and Rarr currently only reads single arrays. zarr_overview() shows you all the available arrays, but read_zarr_array() doesn't have a mechanism for accessing them from the group URL. You have to provide the URL of the individual array you want to access.

For example, to access the cape array you would use this address:

cape_data_s3_address <- "https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/cape"

zarr_overview(cape_data_s3_address)
#> Type: Array
#> Path: https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/cape/
#> Shape: 374016 x 542080
#> Chunk Shape: 1 x 542080
#> No. of Chunks: 374016 (374016 x 1)
#> Data Type: float32
#> Endianness: little
#> Compressor: blosc
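
The address above is just the group URL with the array name appended. As a minimal sketch, you could build such addresses with a small helper. Note that zarr_array_url() is a hypothetical name used for illustration, not a function provided by Rarr:

```r
# Hypothetical helper (not part of Rarr): build the URL of one array
# inside a Zarr group by appending the array name to the group URL.
zarr_array_url <- function(group_url, array_name) {
  # Drop any trailing slashes so the result never contains "zarr//cape"
  paste0(sub("/+$", "", group_url), "/", array_name)
}

group_url <- "https://storage.googleapis.com/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr"
cape_url <- zarr_array_url(group_url, "cape")
print(cape_url)
```

The resulting string can be passed to zarr_overview() or read_zarr_array() in place of a hand-typed address.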

Then we can read it with read_zarr_array(). Note that your index here should be of length 2, since the array has 2 dimensions.

read_zarr_array(cape_data_s3_address,index=list(1:10,1))
#>        [,1]
#>  [1,] 0.500
#>  [2,] 0.125
#>  [3,] 0.125
#>  [4,] 0.125
#>  [5,] 0.250
#>  [6,] 0.500
#>  [7,] 0.625
#>  [8,] 0.750
#>  [9,] 0.625
#> [10,] 0.750
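
If it helps, the dimension check can be done up front before any data is requested. This is only a sketch, assuming you copy the shape by hand from the zarr_overview() output; check_index() is a hypothetical helper, not a Rarr function:

```r
# Hypothetical helper (not part of Rarr): check that an index list has one
# entry per array dimension before requesting any chunks.
check_index <- function(index, shape) {
  if (length(index) != length(shape)) {
    stop("index has ", length(index), " element(s) but the array has ",
         length(shape), " dimension(s)")
  }
  invisible(TRUE)
}

cape_shape <- c(374016, 542080)         # shape reported by zarr_overview()
check_index(list(1:10, 1), cape_shape)  # passes silently
# check_index(list(1:10, NULL, 1), cape_shape) would stop with an error,
# because a list keeps its NULL element and so has length 3
```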

I will look at implementing a mechanism for read_zarr_array() to be aware of this type of structure, and maybe add a path argument or similar to access a specific array within the larger group. I think that's essentially what's being discussed in #12.

@jshannon75
Author

Thanks--this is very much out of my wheelhouse datawise, so part of it is getting familiar with how these data get served up. This appears to be working, so I can close out the issue.
