Change units caravan forcing, added support for data_sources, changed version #433

RolfHut · 2024-06-18T17:53:56Z

changed the units to be consistent with lumpedmakking
added support for the new 'data_source' option. Reverts to ERA5_land (Standard for caravan) if not specified.
changed hardcoded version in URL to global and bumped to v2

RolfHut · 2024-06-20T07:02:54Z

I'm running into some naming problems: ERA5 variables are re-named to standards, but those variables already exist in the other datasets.

BSchilperoort · 2024-06-20T07:23:58Z

I'm running into some naming problems: ERA5 variables are re-named to standards, but those variables already exist in the other datasets.

Would you not first apply the selection of which dataset to use, therefore avoid any variable duplicates?

RolfHut · 2024-06-20T12:49:01Z

Tried that: doesn't work. Empty parameter remains part of the xarray object. I'm working on it.

Daafip · 2024-07-02T13:05:10Z

Not really related to units, but would be nice to squeeze in here.
If you pass a 'bad' basin_id, e.g. '01439500' instead of 'camels_01439500' you get a long error, unrelated to your issue:

OSError: [Errno -90] NetCDF: file not found: 'https://opendap.4tu.nl/thredds/dodsC/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/1/01439500.nc'

maybe catch this first, then actually try retrieve the dataset

BSchilperoort · 2024-07-08T12:43:04Z

I fixed some of the issues you might have been encountering. Seemed to work for era5_land now, not yet daymet.

I did encounter a problem with the dataset:

There seem to be a lot of duplicate entries for basins. The dimension size for Camels USA's basin_id is 1153, but it only has 671 unique values.

Daafip · 2024-07-09T15:33:54Z

I thought I checked that, but in that case I made some mistakes with merging. Looks like it was appended rather than merged...

The code I used is here, could you take a look if you spot an error Bart? I still have the (unmerged) files locally so if it's a quick fix I can run it again.

Daafip · 2024-07-10T07:41:05Z

Oh, I think I see the issue.
When combining the three data sources I used ds = xr.concat(file_lst,dim=('data_source')),which did work:

Dimensions:               (basin_id: 671, time: 12784, data_source: 3)
Coordinates:
  * basin_id              (basin_id) |S15 b'camels_01013500' ... b'camels_144...
  * time                  (time) datetime64[ns] 1980-01-01 ... 2014-12-31
  * data_source           (data_source) |S64 b'nldas' b'maurer' b'daymet'

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?). I should've checked this before uploading, sorry.

Dimensions:                             (time: 14975, basin_id: 1153,
                                         data_source: 4)

I probably need to use merge in this case?

Can give that a try sometime next week at the earliest.

BSchilperoort · 2024-07-11T13:52:02Z

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?).

Yeah if the dimension size is not the same it's more challenging to merge the data. Perhaps the first step would be to make the basin count + labels the same for the new data, before merging the new variables into the original data.

RolfHut · 2024-07-12T07:07:55Z

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?).

Yeah if the dimension size is not the same it's more challenging to merge the data. Perhaps the first step would be to make the basin count + labels the same for the new data, before merging the new variables into the original data.

Hmm, I'm hesitant to change the original data: we are trying to give access to the original datasets. (not helping, just giving the broader context)

Daafip · 2024-07-15T09:22:05Z

Yeah if the dimension size is not the same it's more challenging to merge the data.

I found that the first entry gives era5 and the last the other three (as it appended). We can use it this to ductape it together (option 1).

Neater would be to update the dataset.
We can then have two files: camels_caravan.nc and camels_original.nc.
If you specify nothing or era5 you just use the caravan dataset as is (camels_caravan.nc). If you want daymet, Maurer or NLDAS, then you access the camels_original.nc file which I created from camels txt data.

@RolfHut @BSchilperoort what are your views on this? Im happy to do either whilst I still have a TUD account to update the dataset easily (not sure what happens after).

RolfHut · 2024-08-02T10:36:04Z

I prefer the option with the different files based on which dataset you request. This does mean a new upload indeed.

I have just fixed the naming issue: generating now works, but you do still have the problem with the double datasets. This also makes 'to_xarray()' fail. If you can give a headsup when the source data is fixed, I can see if 'to_xarray' works again and move on with the climate change analyses.

BSchilperoort · 2024-09-16T13:11:56Z

@RolfHut I was looking into this again. Modifying the original datasets was too much work/data for me at the moment (encountered some issues with Caravan too). Writing code to work around it is a lot easier now.

482/1153 camels (USA) basins are missing the daymet data (not sure why: it's just missing 🤷‍♂️ ). Otherwise massaging the data into a useful format goes OK now, opening a PR soon.

Daafip · 2024-11-14T18:46:08Z

482/1153 camels (USA) basins are missing the daymet data

So ive managed to find some time to dig into the issue. There is 671 in the original camels USA dataset. This is used for my thesis work.

In version 1 of the opendap, I just used the camels as present in the caravan datset, which is a subset of the orignal data set at 482 basins. This difference was unkown to me at the time, the forcing of this is all era5.

I then attempted to add the original camels dataset which has 671 basin and 3 forms of forcing. 671 + 482 = 1153: I appended rather than merged the datasets ... I'll look into fixing this.

As a continuation of my thesis I aim to publish a paper on it, with some time provided by HKV and some in my own time (considering I still have a bit to do).

RolfHut added 4 commits May 29, 2024 07:25

change units of caravan forcing to match NETCDF CF and CMIP

214f9d9

unit change caravan forcing fixed

24587e4

oops, unit conversion is hard

d20359a

added support for data_source and version

83c6b95

RolfHut requested a review from BSchilperoort June 18, 2024 17:53

fixed basic errors, run into naming of variable error

7a009e4

still doesn't work

ce82b87

eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024

eWaterCycle deleted a comment from RolfHut Jul 8, 2024

BSchilperoort added 2 commits July 8, 2024 14:09

Merge branch 'main' into change-units-caravan-forcing

d8bf478

Fix (some) issues

a2c969f

fixed issue with different data sources having different variable names

bf1134a

BSchilperoort mentioned this pull request Sep 17, 2024

Support the newest OPeNDAP version of Caravan, patch data issues. #458

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change units caravan forcing, added support for data_sources, changed version #433

Change units caravan forcing, added support for data_sources, changed version #433

RolfHut commented Jun 18, 2024

RolfHut commented Jun 20, 2024

BSchilperoort commented Jun 20, 2024

RolfHut commented Jun 20, 2024 via email •

edited by BSchilperoort

Loading

Daafip commented Jul 2, 2024

BSchilperoort commented Jul 8, 2024

Daafip commented Jul 9, 2024

Daafip commented Jul 10, 2024

BSchilperoort commented Jul 11, 2024

RolfHut commented Jul 12, 2024

Daafip commented Jul 15, 2024

RolfHut commented Aug 2, 2024

BSchilperoort commented Sep 16, 2024

Daafip commented Nov 14, 2024

Change units caravan forcing, added support for data_sources, changed version #433

Are you sure you want to change the base?

Change units caravan forcing, added support for data_sources, changed version #433

Conversation

RolfHut commented Jun 18, 2024

RolfHut commented Jun 20, 2024

BSchilperoort commented Jun 20, 2024

RolfHut commented Jun 20, 2024 via email • edited by BSchilperoort Loading

Daafip commented Jul 2, 2024

BSchilperoort commented Jul 8, 2024

Daafip commented Jul 9, 2024

Daafip commented Jul 10, 2024

BSchilperoort commented Jul 11, 2024

RolfHut commented Jul 12, 2024

Daafip commented Jul 15, 2024

RolfHut commented Aug 2, 2024

BSchilperoort commented Sep 16, 2024

Daafip commented Nov 14, 2024

RolfHut commented Jun 20, 2024 via email •

edited by BSchilperoort

Loading