Create csv file and user-mod directories for PLUMBER2 sites #2137

Closed
ekluzek opened this issue Sep 7, 2023 · 28 comments
Assignees
Labels
enhancement new capability or improved behavior of existing capability science Enhancement to or bug impacting science

Comments

@ekluzek
Collaborator

ekluzek commented Sep 7, 2023

As part of our tower-site hackathon, a task is to create a csv file and user-mod directories for the PLUMBER2 sites that can be added to CLM. This is part of the work in #1487. @olyson has a script that can create the user-mod directories, so likely that will be used for the initial version of this work.

The NEON csv file has a header as follows...

,Site,Domain,Lat,Lon,pft,start_year,end_year

We propose the csv file for PLUMBER2 should look like this:

,Site,Lat,Lon,pft1,pft1-%,pft2,pft2-%,DATM_YR_START,DATM_YR_END,RUNSTART_DATE,ATM_NCPL

The start date (RUNSTART_DATE above) is in YYYY-MM-DD form. We'll assume runs start on Jan 1 at 0 UT and cover full years, ending on Dec 31 at the last time step of the day.

@ekluzek ekluzek added enhancement new capability or improved behavior of existing capability tag: enh - new science labels Sep 7, 2023
@ekluzek
Collaborator Author

ekluzek commented Sep 7, 2023

Start and end dates would be encoded strings. They might as well use the same format as RUN_STARTDATE (YYYY-MM-DD), together with START_TOD (time of day). So I'll change the above to add extra columns for TOD.

@olyson
Contributor

olyson commented Sep 7, 2023

I have an example csv file here:
/glade/work/oleson/PLUMBER2/NEON_PLUMBER/PLUMBER2_sites.csv

The first few lines are:

,Site,Lat,Lon,pft1,pft1-%,pft2,pft2-%,start_year,end_year,RUN_STARTDATE,START_TOD,ATM_NCPL
1,AR-SLu,-33.464802,-66.459808,5,50.00,7,50.00,2010,2010,2010-01-01,10800,48
2,AT-Neu,47.116669,11.317500,13,100.00,-999,-999.00,2002,2012,2001-12-31,82800,48
3,AU-ASM,-22.283001,133.248993,1,100.00,-999,-999.00,2011,2017,2010-12-31,54000,48
4,AU-Cow,-16.238190,145.427155,4,100.00,-999,-999.00,2010,2015,2009-12-31,50400,48

start_year and end_year are the starting and ending years for the datm (in mct this was DATM_CLMNCEP_YR_ALIGN and DATM_CLMNCEP_YR_END). As suggested, I've also included RUN_STARTDATE and START_TOD. ATM_NCPL is required to set the time step of the model to match the time step of the atmospheric forcing (either 1/2 or 1 hour).
Some of these sites are (generic) crop and hence here the pft1/pft2 are set to either 15 or 16.
I've encoded -999 for pft2 and -999.00 for pft2-% if there isn't a second pft.
Comments welcome.
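
For readers who want to consume this layout, here is a minimal Python sketch of one way to read it. The sample rows are copied from the lines quoted in this comment; the function name `read_sites`, the choice to map the -999 fill value to `None`, and the assumption that ATM_NCPL counts coupling steps per day are illustrative and not part of any CTSM tool.

```python
import csv
import io

# Sample copied from the first lines of PLUMBER2_sites.csv quoted above.
SAMPLE = """\
,Site,Lat,Lon,pft1,pft1-%,pft2,pft2-%,start_year,end_year,RUN_STARTDATE,START_TOD,ATM_NCPL
1,AR-SLu,-33.464802,-66.459808,5,50.00,7,50.00,2010,2010,2010-01-01,10800,48
2,AT-Neu,47.116669,11.317500,13,100.00,-999,-999.00,2002,2012,2001-12-31,82800,48
"""

FILL = -999  # fill value used when a site has no second PFT


def read_sites(text):
    """Hypothetical helper: parse the csv text into a list of dicts."""
    sites = []
    for row in csv.DictReader(io.StringIO(text)):
        pft2 = int(row["pft2"])
        ncpl = int(row["ATM_NCPL"])
        sites.append({
            "site": row["Site"],
            "lat": float(row["Lat"]),
            "lon": float(row["Lon"]),
            "pft1": int(row["pft1"]),
            "pft1_pct": float(row["pft1-%"]),
            # Map the fill value to None so a missing second PFT is explicit.
            "pft2": None if pft2 == FILL else pft2,
            "pft2_pct": None if pft2 == FILL else float(row["pft2-%"]),
            "start_year": int(row["start_year"]),
            "end_year": int(row["end_year"]),
            "run_startdate": row["RUN_STARTDATE"],
            "start_tod": int(row["START_TOD"]),
            "atm_ncpl": ncpl,
            # Assumption: ATM_NCPL is steps per day, so this is the step in seconds.
            "dt_seconds": 86400 // ncpl,
        })
    return sites


sites = read_sites(SAMPLE)
for s in sites:
    print(s["site"], s["pft1"], s["pft2"], s["dt_seconds"])
```

With ATM_NCPL=48 the derived step is 1800 seconds, which matches the half-hourly forcing mentioned above.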

@ekluzek
Collaborator Author

ekluzek commented Sep 7, 2023

Thanks for putting this together @olyson, this is great.

From the discussion this morning, I suggested adding a column for the start and end year, but we thought it's easy enough to pull out of the start and end date. So based on that, I'd remove those two columns. Also I assume the end date would be some arbitrary date in the end year, so it would be good to have it added in. And maybe the end date should be called STOP_DATE and STOP_TOD? I think I like naming the columns for the variables that have an XML variable name (RUN_STARTDATE, etc.) for them, so I like that change.

For generic crop sites, it seems it would be better to put a supported CFT code for it, so that you could run it both as generic crop and also with the prognostic crop model. That just gives you more options.

Using -999 as fillvalue makes sense to me and will be easy to parse for both human and computer.

@ekluzek
Collaborator Author

ekluzek commented Sep 7, 2023

The other thing I wonder is whether the file should start with some comments that explain the file format.
The parser could ignore initial lines that start with a #, for example. Then you could take several lines to explain all of these things.
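
A minimal sketch of that idea in Python, assuming comment lines begin with `#`. The comment text in the sample and the helper name `skip_comments` are invented for illustration; only the data row comes from the csv quoted earlier in this thread.

```python
import csv
import io

# Hypothetical file contents: the comment lines are invented for illustration.
TEXT = """\
# PLUMBER2 site list (illustrative comment lines)
# -999 marks a missing second PFT
,Site,Lat,Lon,pft1,pft1-%,pft2,pft2-%
4,AU-Cow,-16.238190,145.427155,4,100.00,-999,-999.00
"""


def skip_comments(lines):
    """Yield only the lines that do not start with '#'."""
    for line in lines:
        if not line.lstrip().startswith("#"):
            yield line


# csv.DictReader accepts any iterable of strings, so the filter can sit in front.
rows = list(csv.DictReader(skip_comments(io.StringIO(TEXT))))
print(rows[0]["Site"])
```

This filter drops every line starting with `#`, not only the leading ones. If the file is read with pandas instead, `pandas.read_csv` accepts `comment="#"` for the same purpose.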

@ekluzek
Collaborator Author

ekluzek commented Sep 7, 2023

One question I have about this is how this will get put into CDEPS. CDEPS could respond to this csv file, and/or process it to update its XML files. Or the dates could just show up in the user-mod directories in the shell_commands file. Since MCT is being deprecated, we should just set this up for NUOPC, so the XML variables could be handled the same way they are for NEON, with this in the shell_commands file:

./xmlchange DATM_YR_ALIGN=2018,DATM_YR_END=2021,DATM_YR_START=2018

@olyson
Contributor

olyson commented Sep 8, 2023

The start_year (used in my script to set DATM_YR_START, as you've done with NEON) can differ from the year encoded in RUN_STARTDATE, because we start at the GMT time that corresponds to local midnight. So I think both start_year and RUN_STARTDATE are needed.
The end_year is used to specify DATM_YR_END. Because we have complete years of forcing for every tower site, we use STOP_OPTION="nyears" and STOP_N="X", where X is calculated as DATM_YR_END-DATM_YR_START+1. These settings are implemented via xmlchange commands. So I'm inclined not to include STOP_DATE and STOP_TOD, unless there is something I'm missing about the logic.
I can include some comments in the csv file to clarify these settings.
The information included in the PLUMBER2 data doesn't specify the type of crop for those crop sites. So even in BGC mode, these are set to generic crops. However, perhaps you are saying that there should instead be columns for every site called "cft1", "cft2", "cft1-%", "cft2-%", that would be -999 for non-crop sites and real values otherwise? Is this consistent with how crop sites are handled in NEON, or aren't there any crop sites?
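
The date logic described in this comment can be sketched in Python as follows. This is an illustration, not the script referred to above: the helper name `run_settings` and the `utc_offset_hours` input are assumptions (the csv stores RUN_STARTDATE and START_TOD directly), and setting DATM_YR_ALIGN equal to the start year simply mirrors the NEON xmlchange line quoted earlier.

```python
from datetime import datetime, timedelta


def run_settings(start_year, end_year, utc_offset_hours, atm_ncpl):
    """Hypothetical helper: build xmlchange lines for one site.

    utc_offset_hours is an assumed input (local time minus UTC).
    """
    # Local midnight on Jan 1 of the first forcing year, expressed in UTC.
    start_utc = datetime(start_year, 1, 1) - timedelta(hours=utc_offset_hours)
    start_tod = start_utc.hour * 3600 + start_utc.minute * 60 + start_utc.second
    # Whole years of forcing, so the run length is a count of years.
    stop_n = end_year - start_year + 1
    return [
        # Assumption: align year equals start year, as in the NEON example.
        f"./xmlchange DATM_YR_ALIGN={start_year},DATM_YR_END={end_year},"
        f"DATM_YR_START={start_year}",
        f"./xmlchange RUN_STARTDATE={start_utc:%Y-%m-%d},START_TOD={start_tod}",
        f"./xmlchange STOP_OPTION=nyears,STOP_N={stop_n}",
        f"./xmlchange ATM_NCPL={atm_ncpl}",
    ]


# AT-Neu from the sample rows, assuming a UTC+1 offset.
for line in run_settings(2002, 2012, 1, 48):
    print(line)
```

For AT-Neu with an assumed UTC+1 offset, local midnight on 2002-01-01 falls at 23:00 UTC on 2001-12-31, which gives RUN_STARTDATE=2001-12-31 and START_TOD=82800 as in the csv row, even though DATM_YR_START is 2002.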

@ekluzek
Collaborator Author

ekluzek commented Sep 9, 2023

@olyson we should perhaps finalize some of these decisions at our hackathon. But, let me just set some of this up for more discussion.

With NEON we wanted the stop date/time, so that we could run over the entire period for transient cases. For spinup cases you can only run over whole years, but it's good to have both so that you can run over the entire period for transient.

Also you can either calculate the DATM_YR_START and DATM_YR_END in the script to create the csv file -- or you can calculate this in the script that reads the csv file. Either one is valid, and we can talk more about if there are any pros and cons on Wednesday.

On crops: for NEON the pft index uses the full range of 78 PFTs. There are only a couple of crop sites, but they are labeled as the specific crop (so 19, which is spring wheat). Since it's labeled as spring wheat, you can do simulations with it either as generic crop (with use_crop turned off) or as spring wheat (when use_crop is on). That just gives users the flexibility to run it either way. I would hope that we could investigate to figure out what specific crops are at the sites, so we could run either way.

Two other things. One is that for NEON we made all the surface datasets 78pft, to make them easier to handle and to avoid having some files one way and others 16pft. The other is that we'll want to set up surface datasets for FATES. For NEON we did that by having mixed PFTs that used the PFT mix from the 1-degree grid cell. So we probably want to do the same thing here. For FATES we'd just do the non-crop sites, and the files would be 16pft.

@ekluzek
Collaborator Author

ekluzek commented Sep 13, 2023

We decided this morning in the group hackathon that starting with 16pft surface datasets and running over only full years makes sense. This will allow FATES to use the same fsurdat files. At a later date we could expand to 78pft files, but this gets us a good start that works for most cases in the simplest way. So FATES-SP mode would use the mix of PFTs set for each site. Full FATES will use the PFT mix in the FATES parameter file. For FATES we would eventually want either a FATES parameter file for each site, or some files that let the FATES tools modify the parameter file for each site. But that can be a future development.

@ekluzek
Collaborator Author

ekluzek commented Sep 13, 2023

@olyson also had some infrastructure to handle spinups, taking into account that the data is Gregorian while the spinup is no_leap. We think this is handled now in NUOPC, so we will try it out for a case to see if we still need that infrastructure. Hopefully we don't, but we can add it in if needed.

@olyson
Contributor

olyson commented Sep 13, 2023

We need to add canopy top and bottom heights to the csv file as these are site-specific and used in SP mode - I'll do that.
Just a reminder that the user-mods will need to include information on the lai streams files for SP mode as well.

@olyson
Contributor

olyson commented Sep 14, 2023

@wwieder , I've added canopy top and bottom heights to the csv file:

/glade/work/oleson/ctsm_PLUMBERcsv/tools/site_and_regional/PLUMBER2_sites.csv

So you could try it in your script.
I also noticed that there don't seem to be surface datasets created for the crop sites here:

/glade/u/home/wwieder/CTSM/tools/site_and_regional/subset_data_single_point

For example, the BE-Lon site. So maybe I have something wrong set in the csv file for those sites. Did you get any error reporting on those?

@wwieder
Contributor

wwieder commented Sep 29, 2023

suggested usermod_dirs for PLUMBER2 are here
/glade/u/home/wwieder/CTSM/cime_config/usermods_dirs/PLUMBER2

Right now I'm making this in a notebook (in kind of a hacky way), but we can convert this to a python script if we want to keep it
/glade/u/home/wwieder/CTSM/tools/site_and_regional/plumber2_usermods.ipynb


@olyson when I try manually creating the surface dataset for a crop site

./subset_data point --lat 38.1087 --lon -121.653107 --site 1x1_PLUMBER2_US-Twt --dompft 16 --pctpft 100.0 --create-surface --uniform-snowpack --cap-saturation --verbose --overwrite

I get the following error from .subset_data.py:

argparse.ArgumentTypeError: Please use --crop flag when --dompft is above 15.
I'll adjust this for now and we can have a quick chat on how to handle this error check to allow for generic crop cases.

@ekluzek
Collaborator Author

ekluzek commented Sep 29, 2023

@wwieder I think that's a bug in subset_data. 15 and 16 are the generic rainfed and irrigated crops, so it should allow for 16. We should fix this in subset_data.

@wwieder
Contributor

wwieder commented Sep 29, 2023

it's easy to fix for pft 16, but breaking for pft 15 for some reason?

@wwieder
Contributor

wwieder commented Sep 29, 2023

I made the following changes in /glade/u/home/wwieder/CTSM/python/ctsm/site_and_regional/single_point_case.py

on line 18
+NAT_PFT = 16

and line 185

+            if self.num_pft < max_dom_pft < MAX_PFT:
+                err_msg = "Please use --crop flag when --dompft is above 16."

This works for dompft =16, but fails with dompft = 15 and I don't understand why?

@ekluzek
Collaborator Author

ekluzek commented Sep 29, 2023

This if statement doesn't look right to me. Notice it's using MAX_PFT and NOT NAT_PFT. And it's also checking that num_pft is less than max_dom_pft, as well as that max_dom_pft is less than MAX_PFT. Also, since the inequalities are < rather than <=, it looks to me like NAT_PFT should be 17 or the inequalities should be changed.

I think the num_pft < max_dom_pft part is where it fails for 15.
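
To make the strict-inequality point concrete: in Python, `a < b < c` is evaluated as `a < b and b < c`. The sketch below isolates the quoted condition in a stand-alone function so it can be probed with different values. The function name `check_fires` and the value 78 for MAX_PFT are assumptions for illustration (78 is the full PFT count mentioned for NEON); the real constants in single_point_case.py may differ.

```python
def check_fires(num_pft, max_dom_pft, max_pft):
    """Mirror the quoted condition: num_pft < max_dom_pft < max_pft."""
    return num_pft < max_dom_pft < max_pft


# Assumed value for illustration only.
MAX_PFT = 78

for num_pft in (15, 16, 17):
    fired = [p for p in (14, 15, 16, 17) if check_fires(num_pft, p, MAX_PFT)]
    print(f"num_pft={num_pft}: check fires for dompft in {fired}")
```

Under these assumptions, a lower bound of 16 lets dompft 16 through and stops 17, while a lower bound of 15 stops 16 as well. Whether that matches the intended behavior depends on what num_pft holds at that point in the real code.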

@wwieder
Contributor

wwieder commented Sep 30, 2023

Thanks for looking at this Erik. I'm not sure I'm understanding the logic in the code or what you're suggesting I change, but when I set NAT_PFT=17

The error from line 408 in the code states:
IndexError: index 16 is out of bounds for axis 0 with size 16

leaving NAT_PFT=16 creates a similar error.
IndexError: index 15 is out of bounds for axis 0 with size 15

@wwieder
Contributor

wwieder commented Sep 30, 2023

OK, setting the --crop flag almost has things working correctly, except that PCT_CFT isn't getting set correctly for a grid with 100% PFT=16:

ncdump -v PCT_NAT_PFT,PCT_CFT,PCT_NATVEG,PCT_CROP /glade/u/home/wwieder/CTSM/tools/site_and_regional/subset_data_single_point/surfdata_1x1_PLUMBER2_US-Twt_hist_16pfts_Irrig_CMIP6_simyr2000_c230930.nc

PCT_CFT =
  100,
  0 ;

 PCT_NATVEG =
  0 ;

 PCT_CROP =
  100 ;

@wwieder
Contributor

wwieder commented Sep 30, 2023

Fixed by commenting out error flags on lines 197 & 211 of python/ctsm/site_and_regional/single_point_case.py and without using --crop flag
#raise argparse.ArgumentTypeError(err_msg)

Likely need to do some testing to make sure this is working OK for NEON or other configurations?

@ekluzek
Collaborator Author

ekluzek commented Sep 30, 2023

@wwieder literally what you are doing is removing the abort on error for two different error checks. So that is "safe" to do in that it isn't going to mess up anything that is already working, and you don't need to test other cases. It's not "safe" in that you removed two error checks. So it's like going around without seat belts. Doing so doesn't limit your travel -- it just causes problems if something goes wrong.

This is one of the things I'd like us to get better at and illustrates the principle of error checks. To get them right you need to do some testing, and best to have a test case that you can validate that it dies under the right conditions. Then that test can be modified if it wasn't correct. It also makes it more obvious if the check is right by looking at the test rather than trying to parse the code in your head. What sometimes happens when we don't test error checks is that the logic is wrong and it causes problems so we remove the error check. But, then you don't have a sensible error check to know what to do when something goes wrong.

The second error check you removed has to do with mixing crop and natural-veg types. I think there might be a reason for this that's embedded into subset_data, so we should be more cautious about removing it.
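
As an illustration of testing an error check so that it is validated to die under the right conditions, here is a minimal unittest sketch. The function `check_dom_pft` is hypothetical and is not the subset_data implementation; it encodes only the rule stated in this thread that 15 and 16 are generic crops and should be allowed without the crop flag.

```python
import argparse
import unittest

GENERIC_CROP_MAX = 16  # 15 and 16 are the generic rainfed and irrigated crops


def check_dom_pft(dom_pft, crop_flag):
    """Hypothetical check: specific crop types above 16 need the crop flag."""
    if dom_pft > GENERIC_CROP_MAX and not crop_flag:
        raise argparse.ArgumentTypeError(
            "Please use --crop flag when --dompft is above 16."
        )


class TestDomPftCheck(unittest.TestCase):
    def test_generic_crops_allowed_without_crop_flag(self):
        for pft in (15, 16):
            check_dom_pft(pft, crop_flag=False)  # must not raise

    def test_specific_crop_requires_crop_flag(self):
        with self.assertRaises(argparse.ArgumentTypeError):
            check_dom_pft(17, crop_flag=False)

    def test_specific_crop_ok_with_crop_flag(self):
        check_dom_pft(19, crop_flag=True)  # 19 is spring wheat per this thread


if __name__ == "__main__":
    unittest.main()
```

Reading the three test names is enough to see what the check is supposed to allow and reject, which is the point made above about tests being easier to review than the condition itself.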

@wwieder
Contributor

wwieder commented Oct 1, 2023

I like the analogy, @ekluzek. The error checks were made for one application (NEON & generic single point) and now we're using them for something different (PLUMBER2). To carry the analogy a bit further, maybe this is similar to a bicycle vs. skiing helmet. My 'fixes' for PLUMBER2 are basically undoing the buckle that holds either helmet on. On Weds. we can decide how fail-safe we need these error checks to be.

@wwieder
Contributor

wwieder commented Oct 3, 2023

OK, I have code that:

  • creates surface datasets and modifies canopy top and bottom heights for all plumber2 sites (see surface data here /glade/u/home/wwieder/CTSM/tools/site_and_regional/subset_data_single_point).
  • creates usermod_dirs and customizes individual site configurations /glade/u/home/wwieder/CTSM/cime_config/usermods_dirs/PLUMBER2.

Remaining todos include:

  • adding LAI streams to usermods
  • additional usermod changes for START_TOD,ATM_NCPL for individual NEON sites (if needed)?
  • cleaning up some of my hacky code modifications
  • adding unit tests

These may be easier once I figure out how to contribute to Keith's development branch... but for now code modifications are in
/glade/u/home/wwieder/CTSM/tools/site_and_regional and
/glade/u/home/wwieder/CTSM/python/ctsm/

@olyson
Contributor

olyson commented Oct 4, 2023

Additional todo: The PLUMBER2 usermods will need to include the LAI streams files.

@ekluzek
Collaborator Author

ekluzek commented Oct 4, 2023

@olyson, so PLUMBER2 has LAI stream files for each site? Or do you mean to use the global LAI streams files?

A technical thing with that is that the LAI streams can only be used for SP or FATES-SP mode. You could trigger that in the user-mods by querying the compset. We do that sort of thing in the NEON user-mods.

@olyson
Contributor

olyson commented Oct 4, 2023

@ekluzek Yes it has an LAI stream file for each site. And yes, my script only uses it for SP mode, it turns it off in BGC mode.

@olyson
Contributor

olyson commented Oct 4, 2023

Another PLUMBER2-specific thing is that we set baseflow_scalar = 0 in user_nl_clm for "wetland" sites. The list of these sites is:
"CZ-wet"
"DE-SfN"
"FI-Kaa"
"FI-Lom"
"RU-Che"
"SE-Deg"
"US-Los"
"US-Myb"
"US-Tw4"
"PL-wet"
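
A minimal sketch of how a usermods generator might apply this setting, assuming one user_nl_clm per site. The helper name `user_nl_clm_lines` is hypothetical; the site list and the `baseflow_scalar = 0` setting are taken from this comment.

```python
# Site list copied from the comment above.
WETLAND_SITES = {
    "CZ-wet", "DE-SfN", "FI-Kaa", "FI-Lom", "RU-Che",
    "SE-Deg", "US-Los", "US-Myb", "US-Tw4", "PL-wet",
}


def user_nl_clm_lines(site):
    """Hypothetical helper: extra user_nl_clm lines for one site."""
    lines = []
    if site in WETLAND_SITES:
        # Wetland sites turn off baseflow, per the thread.
        lines.append("baseflow_scalar = 0")
    return lines


print(user_nl_clm_lines("FI-Kaa"))
print(user_nl_clm_lines("AT-Neu"))
```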

@olyson
Contributor

olyson commented Oct 4, 2023

I wrote a script to compare the new surface datasets that Will created with those used in our original PLUMBER2 submission for all of the sites. As Will mentioned, differences in PCT_NAT_PFT show up for cropland sites because of how PCT_NAT_PFT is handled. Any other differences are small and are due to rounding of some of the original field values in the csv file.

@samsrabin samsrabin added science Enhancement to or bug impacting science and removed enh - new science labels Aug 8, 2024
@TeaganKing
Contributor

I believe this issue was covered by #2485 , so I am closing it now.


5 participants