Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/cdo post ocean format #1875

Closed
wants to merge 11 commits into from
Closed

Feature/cdo post ocean format #1875

wants to merge 11 commits into from

Conversation

HenryRWinterbottom
Copy link
Contributor

This PR addresses issue #923. A bash script is provided which updates/adds the necessary netCDF variable metadata attributes such that downstream applications can correctly remap the specified netCDF variables. No special Python libraries or utilities are required. Leveraging the capabilities of NCO allow support across all development and production platforms.

Resolves #923
Refs #1871

Type of change

  • New feature (adds functionality)

Change characteristics

  • This is a non-breaking change
  • This change does not require any documentation updates.

How has this been tested?

Tested using ice and ocean forecast files provided by @NeilBarton-NOAA and @jiandewang and executed on RDHPCS Hera, NOAA CSP AWS, and OSX.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
ush/cdo_post_ocean_format.sh Fixed Show fixed Hide fixed
Copy link
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good for stand-alone script, but it needs to be connected into the workflow by being called by ocnpost.sh (and remove the NCL scripts being replaced).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "update" in the filename? Should this be called mom6_variables.csv instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The remapping for the MOM6 tripolar to a destination grid projection requires two steps:

  • The netCDF attributes need to be updated for the specified MOM6 variables to be remapped;
  • Using the the script introduced in PR CDO based post-processing application #1871 the variables specified in the parm/post/mom6_interp.csv can then be correctly interpolated; this is not required for the CICE forecast output.

ush/cdo_post_ocean_format.sh Outdated Show resolved Hide resolved
ush/cdo_post_ocean_format.sh Outdated Show resolved Hide resolved
function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
out_string=${in_string//[[:space:]]}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted this during development and it was not working as I needed it to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, it worked when I tried a test case with spaces and tabs in a shell. Try again in case this is slightly different than you tried, then if still doesn't work, do this:

Suggested change
out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")
out_string=$(sed "s/ //g" <<< "${in_string}")

Ironically, I can't get mine to work using sed. It leaves the internal whitespace:

walter@ubuntu:~$ cat test2.bash 
a=" b 		c   "
echo "-->${a//[[:space:]]}<--"
echo "-->$(sed 's/ //g' <<< "${a}")<--"
echo "-->$(echo "${a}" | sed "s/ //g")<--"


walter@ubuntu:~$ bash test2.bash 
-->bc<--
-->b		c<--
-->b		c<--

ush/cdo_post_ocean_format.sh Outdated Show resolved Hide resolved
# $1 - The comma-delimited string to split.
#
# Global Variables:
# global_array - An array containing the split elements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are limited by how bash handles arrays, but I really hate this (setting a random global variable with the result). Would much rather pass a variable name in to be set (this is similar to how generate_com() does things). See associated code suggestions below.

Comment on lines 85 to 100
function _comma_split_string() {
local string="${1}"

local local_array=()
global_array=()
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item} ")
done
for item in "${local_array[@]}"; do
IFS=" " read -ra items <<< "${item}"
for element in "${items[@]}"; do
global_array+=("${element} ")
done
done
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function _comma_split_string() {
local string="${1}"
local local_array=()
global_array=()
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item} ")
done
for item in "${local_array[@]}"; do
IFS=" " read -ra items <<< "${item}"
for element in "${items[@]}"; do
global_array+=("${element} ")
done
done
}
function _comma_split_string() {
local string="${1}"
local var_name="${2}"
# Declare local_array as a reference to the desired array name
declare -n local_array="${var_name}"
IFS="," read -ra items <<< "${string}"
for item in "${items[@]}"; do
local_array+=("${item}")
done
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credit to ChatGPT for the declare -n trick. I was having trouble using declare to copy the array after-the-fact, but now we don't have to.

Comment on lines 156 to 157
_comma_split_string "${coords}"
coords="${global_array[@]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_comma_split_string "${coords}"
coords="${global_array[@]}"
_comma_split_string "${coords}" coords

coords_str="$(echo "${coords}" | $(command -v tr) -s ' ')"
ncattr_str="$(echo "${ncattr}" | $(command -v tr) -s ' ')"
echo "Adding netCDF attribute ${ncattr_str} values ${coords_str} to variable ${varname} metadata and writing to file ${output_path}"
($(command -v ncatted) -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using command -v is overkill.

Suggested change
($(command -v ncatted) -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")
(ncatted -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")

Copy link
Contributor Author

@HenryRWinterbottom HenryRWinterbottom Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced for the following reasons and with respect to my understanding of the best-practices for bash scripts.

Portability: (command -v) is more portable than directly relying on the presence of a specific file 
   or assuming a command is available at a particular location and works across different Unix-like systems.

Error Handling: It handles the edge-cases where a command is not found gracefully by
   checking the result and taking appropriate action, such as displaying an error message or 
   exiting the script.

Flexibility: It can be used in combination with other conditional statements to customize the 
    behavior of the script based on whether a command is available or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave it up to @aerorahul, but none of our existing scripts use this, and it clutters things up quite a lot once you start using it everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

Comment on lines 179 to 186
# Get the attributes for the respective variable.
varname=$(echo "${line}" | $(command -v awk) '{print $1}')
ncattr=$(echo "${line}" | $(command -v awk) '{print $2}')
coords=$(echo "${line}" | $(command -v awk) '{print $3}')

# Update the variable attributes and write the updates to the
# specified output file (see `output_path`).
ncattr_update "${varname}" "${ncattr}" "${coords}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this just be:

Suggested change
# Get the attributes for the respective variable.
varname=$(echo "${line}" | $(command -v awk) '{print $1}')
ncattr=$(echo "${line}" | $(command -v awk) '{print $2}')
coords=$(echo "${line}" | $(command -v awk) '{print $3}')
# Update the variable attributes and write the updates to the
# specified output file (see `output_path`).
ncattr_update "${varname}" "${ncattr}" "${coords}"
# shellcheck disable=SC2086
ncattr_update ${line}

(The shellcheck directive is to stifle complaints about the lack of quotation marks.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not guarantee that the string will be parsed properly into the attributes that are required. The awk issue can be eliminated by adding an exception to the shell linter. Any use of awk will fail with the current configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't worried about the linter so much as just on a crusade to eliminate as many calls to external programs like sed and tr as possible for simple functions. I still don't see the difference, but I trust you. If you do need to use sed, prefer herestrings (<<< ${line}) over echoing through a pipe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will give this another look. It's possible that I overlooked something previously.

@WalterKolczynski-NOAA
Copy link
Contributor

Is this supposed to replace #1871? Both are still open. (If so, in the future, we'd also prefer updating existing PRs instead of opening new ones.)

function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | $(command -v sed) "s/ //g")

Check warning

Code scanning / shellcheck

out_string appears unused. Verify use (or export if used externally). Warning

out_string appears unused. Verify use (or export if used externally).
local coords="${3}"

_comma_split_string "${coords}"
coords="${global_array[@]}"

Check warning

Code scanning / shellcheck

Assigning an array to a string! Assign as array, or use * instead of @ to concatenate. Warning

Assigning an array to a string! Assign as array, or use * instead of @ to concatenate.
ush/remap_prep.sh Fixed Show fixed Hide fixed
ush/remap_prep.sh Fixed Show fixed Hide fixed
ush/remap_prep.sh Fixed Show fixed Hide fixed
ush/remap_prep.sh Fixed Show fixed Hide fixed
function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | sed "s/ //g")

Check notice

Code scanning / shellcheck

See if you can use ${variable//search/replace} instead. Note

See if you can use ${variable//search/replace} instead.
coords_str="$(echo "${coords}" | tr -s ' ')"
ncattr_str="$(echo "${ncattr}" | tr -s ' ')"
echo "Adding netCDF attribute ${ncattr_str} values ${coords_str} to variable ${varname} metadata and writing to file ${output_path}"
(ncatted -O -a "${ncattr_str}","${varname}",c,c," ${coords_str}" "${output_path}" "${output_path}")

Check warning

Code scanning / shellcheck

Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"? Warning

Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?
# This example will remove all leading, trailing, and internal
# whitespace from the input string and display the cleaned result.
function _strip_whitespace(){
local in_string="${1}"

Check notice

Code scanning / shellcheck

Command appears to be unreachable. Check usage (or ignore if invoked indirectly). Note

Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | sed "s/ //g")

Check notice

Code scanning / shellcheck

Command appears to be unreachable. Check usage (or ignore if invoked indirectly). Note

Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
function _strip_whitespace(){
local in_string="${1}"

out_string=$(echo "${in_string}" | sed "s/ //g")

Check notice

Code scanning / shellcheck

Command appears to be unreachable. Check usage (or ignore if invoked indirectly). Note

Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
@HenryRWinterbottom
Copy link
Contributor Author

The reviewed components of the PR have been added/merged into PR #1871. The workflow hooks have not yet been added as we are awaiting feedback from the CMD team as to whether the results generated from #1871 are satisfactory. If so, the respective workflow hooks will be added.

Closing #1875.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port NCL scripts to python
3 participants