Set up PBS workflow #22

Open
bschroeter opened this issue May 17, 2024 · 1 comment

Comments

@bschroeter
Contributor

bschroeter commented May 17, 2024

meorg_client needs to operate inside an internet-accessible environment, which means it has to run on the copyq. benchcab itself, however, runs on a compute node. As such, we need to chain a series of PBS jobs to achieve the desired functionality.

The proposed workflow is as follows:

  1. [JOB 1, compute] Benchcab runs, writes output files, triggers an meorg_client job on the copyq.
  2. [JOB 2, copyq] meorg_client uploads the files to the server, noting the JOB_ID returned for each file, which is used to query the transfer to the object store. A subsequent job (Job 3) is triggered after a computed delay: 5 minutes, plus the time needed to transfer the total data at 150 Mbit/s, plus 10%.
  3. [JOB 3, copyq] meorg_client queries the JOB_IDs to get the true FILE_ID that is then used to attach the files to the model outputs. Once successful, meorg_client triggers the analysis.
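
As a rough illustration of the delay in step 2, here is a minimal sketch in Python. The function name and parameters are hypothetical and not part of meorg_client or benchcab. It also assumes the 10% margin applies to the whole sum (base wait plus transfer time) and that "150 Mbit/s" means 150 × 10^6 bits per second; the issue does not pin down either point.

```python
def followup_delay_seconds(
    total_bytes: int,
    base_seconds: float = 300.0,     # the fixed 5 minute wait
    rate_mbit_per_s: float = 150.0,  # assumed transfer rate to the object store
    margin: float = 0.10,            # assumed to apply to the whole sum
) -> float:
    """Estimate how long to wait before submitting Job 3 (hypothetical helper)."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Convert bytes to bits, then divide by the rate in bits per second.
    transfer_seconds = (total_bytes * 8) / (rate_mbit_per_s * 1_000_000)
    return (base_seconds + transfer_seconds) * (1.0 + margin)


if __name__ == "__main__":
    # Example: estimated wait for a 1 GB (10^9 byte) upload.
    print(f"{followup_delay_seconds(1_000_000_000):.1f} s")
```

If the margin is meant to apply only to the transfer time, only the final line of the function would need to change.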

Depending on the notification capability of the server, there may be an optional 4th job to query the status of the analysis and alert the user to the outcome and/or email a link to the plots.

At least 3 PBS jobs are required (1 compute + 2 copyq), unless we allow the copyq job to run for longer and combine the meorg_client steps into a single job, which may not be an acceptable use of resources.

This may be a good time to work on the Python implementation of handling PBS jobs as the logic may become cumbersome in vanilla shell.

@bschroeter bschroeter transferred this issue from CABLE-LSM/benchcab May 17, 2024
@paolap

paolap commented Jul 1, 2024

Hi Ben, this is something we can definitely help with. I asked Dale ( @dsroberts ) to have a look when he has time. Let us know if you're happy with this.
