Set up PBS workflow #22

bschroeter · 2024-05-17T00:08:00Z

meorg_client needs to operate inside an internet-accessible environment, which means we need to run it on the copyq. However, benchcab itself runs on a compute node. As such, we need to chain a series of PBS jobs to achieve the desired functionality.

The proposed workflow is as follows:

[JOB 1, compute] Benchcab runs, writes output files, triggers an meorg_client job on the copyq.
[JOB 2, copyq] meorg_client uploads the files to the server, noting the JOB_ID of each file, which is used to query the transfer to the object store. A subsequent job (Job 3) is triggered after a computed delay: 5 min, plus the time to transfer the total data at 150 Mbit/s, plus 10%.
[JOB 3, copyq] meorg_client queries the JOB_IDs to get the true FILE_ID that is then used to attach the files to the model outputs. Once successful, meorg_client triggers the analysis.
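
As a rough sketch of the delay that Job 2 would compute before triggering Job 3: the function name and parameters below are hypothetical, and two readings of the formula are assumed rather than confirmed, namely that 1 Mbit is 10^6 bits and that the 10% margin applies to the whole sum (base wait plus transfer time).

```python
def followup_delay_seconds(total_bytes, base_seconds=300.0,
                           rate_mbit_per_s=150.0, margin=0.10):
    """Estimate how long to wait before querying the upload JOB_IDs.

    Assumptions (not confirmed by the issue text):
    - 1 Mbit = 1,000,000 bits
    - the margin is applied to (base wait + transfer time)
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Time to move the data at the assumed transfer rate.
    transfer_seconds = (total_bytes * 8) / (rate_mbit_per_s * 1_000_000)
    # Base wait plus transfer time, padded by the safety margin.
    return (base_seconds + transfer_seconds) * (1.0 + margin)


if __name__ == "__main__":
    # Example: 1.875 GB of model output files.
    delay = followup_delay_seconds(1_875_000_000)
    print(f"Trigger Job 3 in about {delay:.0f} s")
```

If the margin is instead meant to apply only to the transfer time, the last line of the function would change accordingly; the structure stays the same.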

Depending on the notification capability of the server, there may be an optional 4th job to query the status of the analysis and alert the user to the outcome and/or email a link to the plots.

At least 3 PBS jobs are required (1 compute + 2 copyq), unless we allow the copyq job to run for longer and combine the meorg steps into a single job. Keeping a copyq job running for that long may not be an acceptable use of resources.

This may be a good time to work on the Python implementation of handling PBS jobs as the logic may become cumbersome in vanilla shell.
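
To illustrate what handling this in Python might look like, here is a minimal sketch that only builds the `qsub` argument list for a chained job; it does not submit anything. The helper name and the script name are hypothetical. It assumes the standard PBS `qsub` options `-q` (queue), `-a` (earliest start time, in `CCYYMMDDhhmm.SS` form) and `-W depend=afterok:<jobid>` are available on the target system.

```python
from datetime import datetime, timedelta


def build_qsub_command(script, queue, delay_seconds=0, after_ok=None, now=None):
    """Build a qsub argument list for a (possibly delayed or dependent) job.

    Hypothetical helper: returns the command as a list so the caller can
    inspect it or pass it to subprocess.run().
    """
    cmd = ["qsub", "-q", queue]
    if delay_seconds > 0:
        # -a sets the earliest time at which the job may start.
        start = (now or datetime.now()) + timedelta(seconds=delay_seconds)
        cmd += ["-a", start.strftime("%Y%m%d%H%M.%S")]
    if after_ok:
        # Only run once the named job has finished successfully.
        cmd += ["-W", f"depend=afterok:{after_ok}"]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Job 2 scheduling Job 3 on the copyq after a computed delay.
    print(build_qsub_command("meorg_job3.sh", "copyq", delay_seconds=440))
```

Returning a list rather than a shell string keeps quoting out of the picture and makes the chaining logic easy to unit test without a PBS server.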

paolap · 2024-07-01T05:04:41Z

Hi Ben, this is something we can definitely help with. I asked Dale (@dsroberts) to have a look when he has time. Let us know if you're happy with this.

bschroeter transferred this issue from CABLE-LSM/benchcab May 17, 2024
