Feature/cdo post ocean format #1875

Closed
wants to merge 11 commits into from
21 changes: 21 additions & 0 deletions parm/post/mom6_update.csv
Contributor

Why is "update" in the filename? Should this be called mom6_variables.csv instead?

Contributor Author

No. Remapping MOM6 output from the tripolar grid to a destination grid projection requires two steps:

  • The netCDF attributes must first be updated for the MOM6 variables that are to be remapped;
  • Using the script introduced in PR CDO based post-processing application #1871, the variables specified in parm/post/mom6_interp.csv can then be correctly interpolated; this is not required for the CICE forecast output.
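
The first of these two steps can be pictured as a dry run. The sketch below is illustrative only: it assumes a whitespace-delimited variable file in the format of parm/post/mom6_update.csv and prints the `ncatted` commands that would be issued instead of running them. The function name `print_attr_commands` and the target file name `ocean.nc` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one ncatted command per line of a
# variable file formatted as "<variable> <attribute> <comma-separated values>".
print_attr_commands() {
    local variable_file="${1}" target="${2}"
    local varname ncattr values
    # The "|| [[ -n ... ]]" clause keeps a final line that lacks a newline.
    while read -r varname ncattr values || [[ -n "${varname}" ]]; do
        # Skip blank lines.
        [[ -z "${varname}" ]] && continue
        # ncatted -a takes att_name,var_name,mode,att_type,att_value; the
        # commas in the value become spaces, as in the script under review.
        printf 'ncatted -O -a %s,%s,c,c,"%s" %s\n' \
            "${ncattr}" "${varname}" "${values//,/ }" "${target}"
    done < "${variable_file}"
}
```

Calling `print_attr_commands parm/post/mom6_update.csv ocean.nc` would list one command per variable, which is a cheap way to review the attribute changes before touching any file.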

@@ -0,0 +1,21 @@
SST coordinates geolon,geolat,time
SSS coordinates geolon,geolat,time
SSH coordinates geolon,geolat,time
speed coordinates geolon,geolat,time
MLD_003 coordinates geolon,geolat,time
so coordinates geolon,geolat,time
temp coordinates geolon,geolat,time
latent coordinates geolon,geolat,time
sensible coordinates geolon,geolat,time
SW coordinates geolon,geolat,time
LW coordinates geolon,geolat,time
evap coordinates geolon,geolat,time
lprec coordinates geolon,geolat,time
LwLatSens coordinates geolon,geolat,time
Heat_PmE coordinates geolon,geolat,time
SSU coordinates geolon_u,geolat_u,time
uo coordinates geolon_u,geolat_u,time
taux coordinates geolon_u,geolat_u,time
SSV coordinates geolon_v,geolat_v,time
vo coordinates geolon_v,geolat_v,time
tauy coordinates geolon_v,geolat_v,time
193 changes: 193 additions & 0 deletions ush/cdo_post_ocean_format.sh
@@ -0,0 +1,193 @@
#! /usr/bin/env bash

#######
# Script for updating/adding specified netCDF variable metadata
# attribute values.
#
# Syntax:
# cdo_post_ocean_format.sh variable_file input_netcdf output_netcdf
#
# Arguments:
#
# variable_file: ASCII formatted file containing netCDF variables
# and the respective metadata attributes to be
# updated/added; the supported format is as
# follows.
#
# <netCDF variable name> <netCDF metadata attribute name>
# <netCDF metadata values>
#
# An example using the format described above is as follows.
#
# SST coordinates geolon,geolat,time
# uo coordinates geolon_u,geolat_u,time
# vo coordinates geolon_v,geolat_v,time
#
# The example above will perform the following task using this
# script.
#
# * Assign the netCDF metadata attribute `coordinates` for
# variable `SST` the values `geolon,geolat,time` and update
# the `output_netcdf` file path.
#
# * Assign the netCDF metadata attribute `coordinates` for
# variable `uo` the values `geolon_u,geolat_u,time` and update
# the `output_netcdf` file path.
#
# * Assign the netCDF metadata attribute `coordinates` for
# variable `vo` the values `geolon_v,geolat_v,time` and update
# the `output_netcdf` file path.
#
# input_netcdf: The netCDF-formatted file path containing the
# variables defined in `variable_file`.
#
# output_netcdf: A netCDF-formatted file path to contain a copy
# of `input_netcdf` with the updated metadata
# attributes.
#######

# Collect the command line arguments and check the validity.
variable_file="${1}"
input_path="${2}"
output_path="${3}"

#######

if [[ "$#" -ne 3 ]]; then
echo "Usage: $0 <variable_file> <input_path> <output_path>"
exit 100
fi

#######

# _comma_split_string - Split a comma-delimited string into an array.
#
# Description:
# This function takes a comma-delimited string as input and splits
# it into an array. Each element in the resulting array is
# obtained by splitting the input string at commas and then
# removing leading and trailing spaces.
#
# Parameters:
# $1 - The comma-delimited string to split.
#
# Global Variables:
# global_array - An array containing the split elements.
Contributor

I know we are limited by how bash handles arrays, but I really hate this (setting a random global variable with the result). Would much rather pass a variable name in to be set (this is similar to how generate_com() does things). See associated code suggestions below.

#
# Example usage:
# _comma_split_string "item1,item2 item3,item4"
# for element in "${global_array[@]}"; do
# echo "$element"
# done
#
# This example will split the input string into individual elements
# and print each element on a separate line.
function _comma_split_string() {
local string="${1}"

local local_array=()
global_array=()
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item} ")
done
for item in "${local_array[@]}"; do
IFS=" " read -ra items <<< "${item}"
for element in "${items[@]}"; do
global_array+=("${element} ")
done
done
}
Contributor

Suggested change
function _comma_split_string() {
local string="${1}"
local local_array=()
global_array=()
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item} ")
done
for item in "${local_array[@]}"; do
IFS=" " read -ra items <<< "${item}"
for element in "${items[@]}"; do
global_array+=("${element} ")
done
done
}
function _comma_split_string() {
local string="${1}"
local var_name="${2}"
# Declare local_array as a reference to the desired array name
declare -n local_array="${var_name}"
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item}")
done
}

Contributor

Credit to ChatGPT for the declare -n trick. I was having trouble using declare to copy the array after-the-fact, but now we don't have to.
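
For readers unfamiliar with namerefs, here is a minimal, self-contained sketch of the pattern, assuming bash 4.3 or newer (where `declare -n` and `local -n` are available). The names `split_commas` and `_out_ref` are made up for illustration. The sketch differs from the suggestion above in two ways: the target array is reset before use, so repeated calls do not accumulate stale elements, and the nameref has a name that is unlikely to collide with the caller's variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a comma-delimited string into the array whose
# NAME is passed as the second argument (requires bash >= 4.3 for namerefs).
split_commas() {
    local string="${1}"
    # _out_ref becomes an alias for the caller's variable; its name must
    # differ from the caller's variable name to avoid a circular reference.
    local -n _out_ref="${2}"
    local item
    local -a items
    _out_ref=()  # start from an empty array, discarding any previous value
    IFS="," read -ra items <<< "${string}"
    for item in "${items[@]}"; do
        _out_ref+=("${item}")
    done
}

split_commas "geolon_u,geolat_u,time" coord_names
printf '%s\n' "${coord_names[@]}"
```

Because bash variables are dynamically scoped, the caller should also avoid naming its array `string`, `item`, or `items`, since those locals would shadow it inside the function.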


#######

# _strip_whitespace - Remove whitespace from a string.
#
# Description:
# This function takes an input string and removes all whitespace
# characters (spaces, tabs, and newline characters) to produce a
# cleaned output string.
#
# Parameters:
# $1 - The input string from which whitespace will be removed.
#
# Return:
# The cleaned string with no whitespace.
#
# Example usage:
# cleaned_string=$(_strip_whitespace " This is a string with spaces ")
# echo "Cleaned string: \"$cleaned_string\""
#
# This example will remove all leading, trailing, and internal
# whitespace from the input string and display the cleaned result.
function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
Contributor

Suggested change
out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
out_string=${in_string//[[:space:]]}

Contributor Author

I attempted this during development and it was not working as I needed it to.

Contributor

Hmmm, it worked when I tried a test case with spaces and tabs in a shell. Try again in case this is slightly different from what you tried; if it still doesn't work, do this:

Suggested change
out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
out_string=$(sed "s/ //g" <<< "${in_string}")

Ironically, I can't get mine to work using sed. It leaves the internal whitespace:

walter@ubuntu:~$ cat test2.bash 
a=" b 		c   "
echo "-->${a//[[:space:]]}<--"
echo "-->$(sed 's/ //g' <<< "${a}")<--"
echo "-->$(echo "${a}" | sed "s/ //g")<--"


walter@ubuntu:~$ bash test2.bash 
-->bc<--
-->b		c<--
-->b		c<--
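
The difference in the transcript above comes from what each pattern matches: `s/ //g` removes only the literal space character, so the tabs survive, while the `[[:space:]]` class in the parameter expansion also matches tabs and newlines. A minimal sketch of a pure-bash version follows. The function name `strip_all_whitespace` is hypothetical, and it prints its result so that it can be captured with command substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every whitespace character (spaces, tabs,
# newlines) from a string using only bash parameter expansion.
strip_all_whitespace() {
    local in_string="${1}"
    # ${var//pattern} deletes all matches; [[:space:]] is the POSIX
    # whitespace character class.
    printf '%s\n' "${in_string//[[:space:]]}"
}

cleaned=$(strip_all_whitespace $' b \t\tc   ')
echo "-->${cleaned}<--"
```

This prints `-->bc<--`, matching the first line of the transcript. If `sed` is still preferred, using the same class (`sed 's/[[:space:]]//g'`) should remove the tabs as well.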

}

#######

# ncattr_update - Update/add attributes for a variable in a netCDF file.
#
# Description:
# This function updates the specified attribute for a specified
# variable in a netCDF file using the `ncatted` command.
#
# Parameters:
# $1 - The variable name to update.
# $2 - netCDF variable metadata attribute name.
# $3 - The coordinates as a comma-separated string.
#
# Global Variables:
# global_array - An array containing the split coordinates.
# output_path - The path to the output netCDF file.
#
# Example usage:
# ncattr_update "variable_name" "coords" "lon,lat,time"
#
# This example updates the `coords` attributes for the specified
# variable and writes the updates to the output netCDF file.
function ncattr_update(){
local varname="${1}"
local ncattr="${2}"
local coords="${3}"

_comma_split_string "${coords}"
coords="${global_array[@]}"
Contributor

Suggested change
_comma_split_string "${coords}"
coords="${global_array[@]}"
_comma_split_string "${coords}" coords

coords_str="$(echo "${coords}" | $(command -v tr) -s ' ')"
ncattr_str="$(echo "${ncattr}" | $(command -v tr) -s ' ')"
echo "Adding netCDF attribute ${ncattr_str} values ${coords_str} to variable ${varname} metadata and writing to file ${output_path}"
($(command -v ncatted) -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")
Contributor

Using command -v is overkill.

Suggested change
($(command -v ncatted) -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")
(ncatted -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")

Contributor Author

@HenryRWinterbottom HenryRWinterbottom Sep 22, 2023

This was introduced for the following reasons, based on my understanding of best practices for bash scripts.

Portability: (command -v) is more portable than relying on the presence of a specific file or assuming a command is available at a particular location, and it works across different Unix-like systems.

Error Handling: It allows the edge case where a command is not found to be handled gracefully, by checking the result and taking appropriate action, such as displaying an error message or exiting the script.

Flexibility: It can be combined with other conditional statements to customize the behavior of the script based on whether a command is available.

Contributor

Will leave it up to @aerorahul, but none of our existing scripts use this, and it clutters things up quite a lot once you start using it everywhere.

Contributor Author

I removed it.
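
One way to reconcile the two positions in this thread is to verify the external tools once, near the top of the script, and then call them by name. The sketch below only illustrates that idea and is not something adopted in this PR; `require_commands` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command in the argument list that
# cannot be found, and return non-zero if any are missing.
require_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "${cmd}" > /dev/null 2>&1; then
            echo "Required command not found: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Example check using builtins that any bash session provides.
if require_commands printf read; then
    echo "All required commands are available."
fi
```

In a script like the one under review, a single line such as `require_commands ncatted cp || exit 1` would give the error handling described above without wrapping each call in `$(command -v ...)`.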

}

#######

start_time=$(gdate +%s) # TODO: For local debugging.
_calling_script=$(basename "${BASH_SOURCE[0]}")
start_time_human=$(gdate -d"@${start_time}" -u) # TODO: For local debugging.
echo "Begin ${_calling_script} at ${start_time_human}."

# Copy the input file path to the output file path.
echo "Copying file ${input_path} to ${output_path} and preparing for variable updates."
$(command -v cp) "${input_path}" "${output_path}"

# Read the configuration file for the variables to be updated and
# proceed accordingly.
while IFS= read -r line; do

# Get the attributes for the respective variable.
varname=$(echo "${line}" | $(command -v awk) '{print $1}')
ncattr=$(echo "${line}" | $(command -v awk) '{print $2}')
coords=$(echo "${line}" | $(command -v awk) '{print $3}')

# Update the variable attributes and write the updates to the
# specified output file (see `output_path`).
ncattr_update "${varname}" "${ncattr}" "${coords}"
Contributor

Can't this just be:

Suggested change
# Get the attributes for the respective variable.
varname=$(echo "${line}" | $(command -v awk) '{print $1}')
ncattr=$(echo "${line}" | $(command -v awk) '{print $2}')
coords=$(echo "${line}" | $(command -v awk) '{print $3}')
# Update the variable attributes and write the updates to the
# specified output file (see `output_path`).
ncattr_update "${varname}" "${ncattr}" "${coords}"
# shellcheck disable=SC2086
ncattr_update ${line}

(The shellcheck directive is to stifle complaints about the lack of quotation marks.)

Contributor Author

This does not guarantee that the string will be parsed properly into the attributes that are required. The awk issue can be eliminated by adding an exception to the shell linter. Any use of awk will fail with the current configuration.

Contributor

I wasn't worried about the linter so much as just on a crusade to eliminate as many calls to external programs like sed and tr as possible for simple functions. I still don't see the difference, but I trust you. If you do need to use sed, prefer herestrings (<<< ${line}) over echoing through a pipe.

Contributor Author

OK, I will give this another look. It's possible that I overlooked something previously.
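
A possible middle ground between calling `awk` three times and relying on unquoted word splitting is to let the `read` builtin split each line into named fields through a herestring. The following is a sketch under the assumption that every line carries the three whitespace-separated fields shown in parm/post/mom6_update.csv; `parse_variable_line` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line of the variable file into three
# fields without external programs and print them in labelled form.
parse_variable_line() {
    local line="${1}"
    local varname ncattr coords
    # read splits on IFS whitespace; any extra fields would be folded into
    # the last variable (coords) rather than silently dropped.
    read -r varname ncattr coords <<< "${line}"
    if [[ -z "${varname}" || -z "${ncattr}" || -z "${coords}" ]]; then
        echo "Malformed line: '${line}'" >&2
        return 1
    fi
    printf 'varname=%s ncattr=%s coords=%s\n' "${varname}" "${ncattr}" "${coords}"
}

parse_variable_line "SSU coordinates geolon_u,geolat_u,time"
```

The same splitting can be done directly in the loop header, for example `while read -r varname ncattr coords; do ...; done < "${variable_file}"`, which keeps the quoted call to `ncattr_update` and avoids the shellcheck exception.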


done < "${variable_file}"

stop_time=$(gdate +%s) # TODO: For local debugging.
_calling_script=$(basename "${BASH_SOURCE[0]}")
stop_time_human=$(gdate -d"@${stop_time}" -u) # TODO: For local debugging.
echo "End ${_calling_script} at ${stop_time_human}."
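
The `gdate` calls flagged as TODO above are typically needed only where GNU `date` is installed under that name (for example through Homebrew coreutils on macOS); on most Linux systems the same tool is simply `date`. A hedged sketch of selecting whichever is present follows. The variable `date_cmd` and the function `epoch_now` are illustrative names, and the sketch assumes at least one of the two commands exists.

```shell
#!/usr/bin/env bash
# Hypothetical setup: prefer gdate when it exists, otherwise fall back to date.
date_cmd=$(command -v gdate || command -v date)

# Print the current time as seconds since the Unix epoch.
epoch_now() {
    "${date_cmd}" +%s
}

start_time=$(epoch_now)
echo "Started at epoch second ${start_time}."
```

Note that the `-d "@<seconds>"` form used for the human-readable timestamps is a GNU extension, so that part would still need GNU `date` under either name.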