rocoto <complete> dependency #24

Open
samtrahan opened this issue Jul 30, 2018 · 6 comments

Comments

@samtrahan
Contributor

This requests the ecFlow "complete" directive be added to Rocoto. It would look something like this:

<task ...>
  ...
  <complete>
    <or>
      <taskdep>...</taskdep>
      <datadep>...</datadep>
    </or>
  </complete>
</task>

If the <complete> directive is met, the job is considered SUCCEEDED with the job id set to "completed." This would be inserted in the Rocoto implementation just before submit_new_jobs is called.

This is a cleaner feature than final="T" because it eliminates the problematic aspects of "drained" cycles.

@christopherwharrop
Owner

What is the use case for this feature?

@samtrahan
Contributor Author

Two examples come to mind:

  1. HWRF has multiple types of forecasts. Which one runs is not known until part of the way through the workflow.
  2. The FV3 GFS workflow skips some jobs depending on configuration for that cycle. The 00z, 06z, 12z, and 18z sometimes vary, and the first two cycles are special.

Such things can be handled with final="T" tasks, but that leads to incredibly complex code. This is especially problematic when generating a workflow automatically. Look at this file on Jet for an example:

/lfs3/projects/hfv3gfs/glopara/noscrub/expdir/fv3q2fy19retro5_dell_restart/workflow.xml

It is generated from this file:

/lfs3/projects/hfv3gfs/glopara/noscrub/expdir/fv3q2fy19retro5_dell_restart/workflow.yaml

@christopherwharrop
Owner

Ok. So, for example, if a workflow will run either A or B, but you don't know which one's dependencies will be satisfied, you'll add a <complete> dependency in both A and B such that the one that isn't run is marked as complete if the one that is run is completed?

@samtrahan
Contributor Author

samtrahan commented Jul 30, 2018

Chris,

Depending on the specifics of the workflow, A and B may not be able to have direct dependencies on one another. I would have to see an example. A simpler example is if A and B depend on a prior event:

A <depends> on file X existing when job C is done and is <complete> if file X does not exist when C is done
B <depends> on file X NOT existing when job C is done and is <complete> if file X does exist when C is done
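
The two rules above might be written roughly as follows. This is a sketch only: <complete> is the element proposed in this issue and is not an existing Rocoto tag, and the task names A, B, C, the cycledefs name, and every path are placeholders. It also assumes that <complete> would accept the same boolean expressions as <dependency>.

```xml
<!-- Sketch only. The complete element is the proposal from this issue,
     not a tag Rocoto understands today. Names and paths are placeholders. -->
<task name="A" cycledefs="default" maxtries="3">
  <command>/path/to/run_A.sh</command>
  <!-- other required task elements omitted for brevity -->

  <!-- Run A when C is done and file X exists. -->
  <dependency>
    <and>
      <taskdep task="C"/>
      <datadep>/path/to/X</datadep>
    </and>
  </dependency>

  <!-- Mark A as done without running it when C is done and X is absent. -->
  <complete>
    <and>
      <taskdep task="C"/>
      <not><datadep>/path/to/X</datadep></not>
    </and>
  </complete>
</task>

<task name="B" cycledefs="default" maxtries="3">
  <command>/path/to/run_B.sh</command>
  <!-- other required task elements omitted for brevity -->

  <!-- Run B when C is done and file X is absent. -->
  <dependency>
    <and>
      <taskdep task="C"/>
      <not><datadep>/path/to/X</datadep></not>
    </and>
  </dependency>

  <!-- Mark B as done without running it when C is done and X exists. -->
  <complete>
    <and>
      <taskdep task="C"/>
      <datadep>/path/to/X</datadep>
    </and>
  </complete>
</task>
```

In each task the <dependency> and <complete> conditions are exact opposites once C has finished, so exactly one of A and B runs and the other is marked done.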

You can see examples of this in the HWRF workflow where there are two branches of the data assimilation parts of the workflow depending on whether TDR data is available.

Compared to final="T", a <complete> directive is far more direct: it is easier to understand when reading the XML, easier to apply, and avoids the complexities that result from final="T".

@christopherwharrop
Owner

Ok. You've convinced me.

I can see the utility of this in providing a way to have branching in a workflow (something that DAGs do a very bad job of representing). I agree that the "final" task attribute introduces some issues with booting and rewind. But I see these two things as having distinct purposes: "final" is really a way to "complete" an entire cycle, whereas <complete> is a way to "complete" individual tasks that are "done" because their counterpart is to be run instead.
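
To make the cycle-versus-task distinction concrete, here is a rough side-by-side sketch. It rests on several assumptions: the final="T" spelling is copied from this thread, <complete> is the proposed element and does not exist in Rocoto, and the task names, the cycledefs name, and the paths are all invented for illustration. The second task is loosely modeled on the TDR branch mentioned earlier.

```xml
<!-- Sketch only. All names and paths are placeholders; the complete
     element is the proposal from this issue. -->

<!-- Cycle level: final marks the task whose completion closes out the cycle. -->
<task name="archive" cycledefs="default" maxtries="1" final="T">
  <command>/path/to/archive.sh</command>
  <!-- other required task elements omitted for brevity -->
  <dependency>
    <taskdep task="post"/>
  </dependency>
</task>

<!-- Task level: only this one task is marked done, because the branch
     that does not need TDR data is the one that will run instead. -->
<task name="da_with_tdr" cycledefs="default" maxtries="3">
  <command>/path/to/da_with_tdr.sh</command>
  <!-- other required task elements omitted for brevity -->
  <dependency>
    <and>
      <taskdep task="obs_prep"/>
      <datadep>/path/to/tdr_data</datadep>
    </and>
  </dependency>
  <complete>
    <and>
      <taskdep task="obs_prep"/>
      <not><datadep>/path/to/tdr_data</datadep></not>
    </and>
  </complete>
</task>
```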

@samtrahan
Contributor Author

Chris,

I like that explanation. It would make sense for the Rocoto documentation to cover <complete> and final="T" in the same section and start with that sentence about how one is for a cycle and the other is for a task.
