
Hotfix: Do not specify a separate service account #2780

Closed

@DavidHuber-NOAA (Contributor) commented Jul 19, 2024

Description

Separate service accounts are not required on any system and cause issues when running CI. This PR removes the service accounts from the workflow setup scripts.

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

Ran the setup scripts on Hera with ACCOUNT=nems and verified that the account was set to nems for all jobs in the resulting Rocoto XML.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@aerorahul (Contributor) previously approved these changes Jul 19, 2024 and left a comment:

lgtm

@DavidHuber-NOAA (Contributor Author)

I'm going to add a fix to the CI tests as well so that they do a shallow submodule checkout. The unit tests are failing because they use too much disk space.
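For readers unfamiliar with the option, a shallow submodule checkout can be requested with standard `git` flags. The sketch below assumes the superproject has already been cloned; the helper name `shallow_submodules` and the placeholder URL are hypothetical, and this is not the actual CI change made in this PR. `--depth 1` fetches a single commit per submodule, which saves disk space at the cost of submodule history, and it may fail if the remote does not allow the recorded submodule commit to be fetched directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the global-workflow CI scripts):
# initialize every submodule of an existing clone, recursively,
# fetching only one commit (depth 1) for each submodule.
shallow_submodules() {
  local repo_dir="$1"
  git -C "${repo_dir}" submodule update --init --recursive --depth 1
}

# Alternatively, request shallow submodules at clone time (placeholder URL):
# git clone --recurse-submodules --shallow-submodules https://github.com/OWNER/REPO.git
```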

@aerorahul (Contributor) previously approved these changes Jul 19, 2024 and left a comment:

lgtm

@DavidHuber-NOAA (Contributor Author)

@CoryMartin-NOAA @RussTreadon-NOAA @danholdaway It appears that the download error I am seeing here and in the Jenkins checkouts is not due to a disk limitation but to limits placed on git-lfs repositories. The jcb repo has too much LFS data in it to allow a recursive checkout from global-workflow.

See /scratch1/NCEPDEV/global/David.Huber/GW/gw_test/.git/modules/sorc/gdas.cd/modules/sorc/jcb/lfs/logs/20240719T171340.919113479.log for a detailed message.
See this SO thread for a possible solution.

@aerorahul (Contributor)

We can set `export GIT_LFS_SKIP_SMUDGE=1` before the `git clone` command (I have no idea how to do that in the Jenkins SCM).
This will allow JCB to be cloned without the LFS data.
Before:

```
❯❯❯ git clone emcgh:jcb
Cloning into 'jcb'...
remote: Enumerating objects: 656, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 656 (delta 0), reused 0 (delta 0), pack-reused 638
Receiving objects: 100% (656/656), 120.28 KiB | 8.59 MiB/s, done.
Resolving deltas: 100% (286/286), done.
Downloading etc/jcb-text.png (11 KB)
Error downloading object: etc/jcb-text.png (fdb18dd): Smudge error: Error downloading etc/jcb-text.png (fdb18ddd36bb2285c6e588ad94559babc318eba3e802b9e1c349fed18a3a82af): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to '/Users/rmahajan/scratch/jcb/.git/lfs/logs/20240719T142102.976203.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: etc/jcb-text.png: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
```

After setting `export GIT_LFS_SKIP_SMUDGE=1`:

```
❯❯❯ git clone emcgh:jcb
Cloning into 'jcb'...
remote: Enumerating objects: 656, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 656 (delta 0), reused 0 (delta 0), pack-reused 638
Receiving objects: 100% (656/656), 120.28 KiB | 8.59 MiB/s, done.
Resolving deltas: 100% (286/286), done.

Identity set to Rahul Mahajan <[email protected]>
```
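The same workaround can be wrapped in a small helper for a CI checkout script. This is a minimal sketch, assuming `git` is on `PATH`; the name `clone_skip_lfs` and the placeholder URL are hypothetical and do not come from the global-workflow CI scripts. Prefixing the variable to the `git clone` call, rather than exporting it, limits the effect to that one clone (including the recursive submodule clones, which inherit its environment), and LFS-tracked files such as `etc/jcb-text.png` are left as small pointer files instead of being downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the global-workflow CI scripts):
# clone a repository and its submodules without downloading Git LFS objects.
clone_skip_lfs() {
  local url="$1"
  local dest="$2"
  # The variable is set only for this git process and its children,
  # so later git commands in the same shell still smudge LFS files normally.
  GIT_LFS_SKIP_SMUDGE=1 git clone --recursive "${url}" "${dest}"
}

# Usage (placeholder URL and directory):
# clone_skip_lfs https://github.com/OWNER/REPO.git REPO
```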

@DavidHuber-NOAA (Contributor Author)

Alright, I have applied the fix to this PR, to #2775, and to my local Jenkins clone on Hera. I then restarted CI testing on #2775, which is now running smoothly. I think Hera is in good shape now, but I will keep an eye on Jenkins before launching another test case.

@DavidHuber-NOAA (Contributor Author)

Unfortunately, CI on #2775 failed to find the `github` Python module. More investigation is underway.

```diff
@@ -44,6 +44,7 @@ if [[ -d global-workflow ]]; then
 rm -Rf global-workflow
 fi
 
+export GIT_LFS_SKIP_SMUDGE=1
```
Contributor

Should not need this after the hotfix, right?

Contributor Author

Correct. I'll revert this change here, in my local Jenkins clones, and in #2775, and then restart CI on #2775.

@aerorahul (Contributor) left a comment:

lgtm

@DavidHuber-NOAA (Contributor Author)

Closing, merged into #2775.

@DavidHuber-NOAA deleted the hotfix/service_account branch on August 13, 2024 18:56