[bug] A serial metatask nested inside a parallel metatask is treated as parallel #109

Open
WalterKolczynski-NOAA opened this issue Jul 25, 2024 · 3 comments

@WalterKolczynski-NOAA

WalterKolczynski-NOAA commented Jul 25, 2024

What is wrong

When a serial metatask is nested inside a parallel metatask, all of the serial tasks are queued at once when the dependency is met instead of running sequentially.

What should have happened

Each serial task should wait for the preceding task in its sequence before being queued, while the sequences themselves run independently in parallel. Only the first task in each sequence should start when the explicit dependencies are satisfied.
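
The expected behavior can be sketched as a small Python model. This is only an illustration of the semantics described above, not Rocoto code: the function `next_tasks` and its arguments are hypothetical, and the task names follow the `mem#mem#_seg#seg#` pattern used in the reproducer.

```python
def next_tasks(members, segments, done, pretask_done):
    """Hypothetical model of the expected scheduling.

    Outer metatask: parallel over `members`.
    Inner metatask: serial over `segments`.
    `done` is the set of task names that have already completed.
    """
    if not pretask_done:
        return []  # the explicit dependency on pretask is not satisfied yet
    ready = []
    for mem in members:  # each member's sequence advances independently
        for seg in segments:
            name = f"mem{mem}_seg{seg}"
            if name not in done:
                ready.append(name)  # first unfinished segment of this member
                break  # later segments must wait for it
    return ready


members = ["00", "01"]
segments = ["0", "1"]
print(next_tasks(members, segments, set(), True))
print(next_tasks(members, segments, {"mem00_seg0"}, True))
```

In this model, only `mem00_seg0` and `mem01_seg0` are eligible once `pretask` completes, and `mem00_seg1` becomes eligible as soon as `mem00_seg0` finishes, regardless of member 01. The reported behavior instead queues all four tasks at once.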

Schedulers impacted

Seen on both slurm and pbspro

Steps to reproduce

  1. Create a workflow with a serial metatask inside a parallel one. Here's a minimal testcase that can be modified:
<?xml version="1.0"?>
<!DOCTYPE workflow
[
    <!ENTITY ACCOUNT "fv3-cpu">
    <!ENTITY QUEUE "batch">
    <!ENTITY PARTITION "hercules">
    <!ENTITY OUTDIR "/work2/noaa/stmp/wkolczyn/test">
]>

<workflow realtime="F" scheduler="slurm" cyclethrottle="1" taskthrottle="20">

    <log verbosity="10"><cyclestr>&OUTDIR;/rocoto.log</cyclestr></log>
    
    <cycledef>202103211200 202103231200 06:00:00</cycledef>

    <task name="pretask">
        <command>echo "pre_task"; sleep 10</command>
        <jobname>test_pre</jobname>
        <account>&ACCOUNT;</account>
        <queue>&QUEUE;</queue>
        <partition>&PARTITION;</partition>
        <walltime>00:05:00</walltime>
        <nodes>1:ppn=1:tpp=1</nodes>

        <join><cyclestr>&OUTDIR;/test_pre.log</cyclestr></join>
    </task>

    <metatask name="metatask" mode="parallel">

        <var name="mem">00 01</var>

        <metatask name="mem#mem#" mode="serial">

            <var name="seg">0 1</var>

            <task name="mem#mem#_seg#seg#">
                <command>echo "member: #mem#  segment: #seg#"; sleep 180</command>
                <jobname>test_mem#mem#_seg#seg#</jobname>
                <account>&ACCOUNT;</account>
                <queue>&QUEUE;</queue>
                <partition>&PARTITION;</partition>
                <walltime>00:05:00</walltime>
                <nodes>1:ppn=1:tpp=1</nodes>

                <join><cyclestr>&OUTDIR;/test_mem#mem#_seg#seg#.log</cyclestr></join>

                <dependency>
                    <taskdep task="pretask"/>
                </dependency>

            </task>
        </metatask>
    </metatask>
</workflow>
  2. Run the workflow and observe that all of the tasks inside the metatask are queued at once after the first job (pretask) completes.

Additional info

Encountered while trying to implement forecast segments for GEFS.

Switching the order of the metatasks (parallel inside of serial) works correctly, but then none of the second serial tasks will run until all of the first ones have completed.
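
The drawback of the reversed nesting can be illustrated with a similar Python model. Again, this is a hypothetical sketch of the behavior described in this report, not Rocoto code: `next_tasks_serial_outer` is an invented name, and the explicit dependency on `pretask` is assumed to be already satisfied.

```python
def next_tasks_serial_outer(members, segments, done):
    """Hypothetical model of the reversed nesting.

    Outer metatask: serial over `segments`.
    Inner metatask: parallel over `members`.
    `done` is the set of task names that have already completed.
    """
    for seg in segments:
        pending = [
            f"mem{mem}_seg{seg}"
            for mem in members
            if f"mem{mem}_seg{seg}" not in done
        ]
        if pending:
            # Every member must finish this segment before any member
            # can move on to the next one.
            return pending
    return []


members = ["00", "01"]
segments = ["0", "1"]
print(next_tasks_serial_outer(members, segments, {"mem00_seg0"}))
```

Here, with `mem00_seg0` finished but `mem01_seg0` still outstanding, `mem00_seg1` is not eligible yet; a slow member holds up the next segment for every other member.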

@christopherwharrop-noaa
Collaborator

Thank you for the report @WalterKolczynski-NOAA. And thank you especially for the reproducer. That makes it a lot easier for me to drill down on the problem. I will investigate and see if I can figure out what is going on. The reported behavior is definitely incorrect. I can test myself, but I'm assuming the behavior is the same when the outer metatask leaves the mode unspecified (the default is parallel).

@WalterKolczynski-NOAA
Author

Yes. I believe I tried both implicit and explicit parallel just in case.

@WalterKolczynski-NOAA
Author

Oops, I thought I had changed the queue back to batch before I submitted. If you use the debug queue, it will be harder to see the problem because there is a two-job at-a-time limit. Fixed now.
