Add a slurm workflow manager #789

Open
wants to merge 15 commits into
base: develop

linsword13
Collaborator

Example usage:

```yaml
ramble:
  variants:
    workflow_manager: slurm
  variables:
    mpi_command: mpirun -n {n_ranks} -hostfile hostfile
    processes_per_node: 1
  applications:
    hostname:
      workloads:
        local:
          experiments:
            test:
              variables:
                n_nodes: 1
```

```
# The batch_submit is defined by the workflow object
ramble on

# Executors for query and cancel are available
ramble on --executor "{query_job}"
ramble on --executor "{cancel_job}"
```

linsword13 force-pushed the workflow branch 11 times, most recently from 493648f to 0a705d9 (December 13, 2024 04:30)

def generate_query_command(self, job_id):
    return rf"""
status=$(squeue -h -o "%t" -j {job_id})
Collaborator

I'm not sure how picky I am about this, but I would almost prefer the status map to be in Python rather than in Bash.

And maybe some of the logic could move into the base class too.
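
A minimal sketch of what that could look like, assuming the raw output of `squeue -h -o "%t"` is captured and handed to Python. The names `JobStatus`, `WorkflowManagerBase`, `SlurmWorkflowManager`, and `normalize_status` are hypothetical and not taken from this PR; the keys are Slurm's compact job state codes, and the grouping into normalized statuses is just one possible choice.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical normalized statuses shared by all workflow managers."""

    PENDING = "pending"
    RUNNING = "running"
    COMPLETE = "complete"
    FAILED = "failed"
    CANCELLED = "cancelled"
    UNKNOWN = "unknown"


class WorkflowManagerBase:
    """Hypothetical base class holding the scheduler-agnostic lookup."""

    # Subclasses fill this with their scheduler's raw state codes.
    status_map = {}

    def normalize_status(self, raw_status):
        """Translate a raw scheduler state string into a JobStatus."""
        return self.status_map.get(raw_status.strip(), JobStatus.UNKNOWN)


class SlurmWorkflowManager(WorkflowManagerBase):
    # Compact state codes as printed by `squeue -o "%t"`.
    # Treating CG (completing) as running is an assumption of this sketch.
    status_map = {
        "PD": JobStatus.PENDING,
        "R": JobStatus.RUNNING,
        "CG": JobStatus.RUNNING,
        "CD": JobStatus.COMPLETE,
        "F": JobStatus.FAILED,
        "TO": JobStatus.FAILED,
        "NF": JobStatus.FAILED,
        "CA": JobStatus.CANCELLED,
    }
```

With this split, the Bash side only needs to report the raw code, and each scheduler contributes a dictionary while the lookup lives in one place.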

lib/ramble/ramble/test/conftest.py (resolved)
super().__init__(file_path)

self.runner = SlurmRunner()

Collaborator

We should also think about how to let this modifier provide a default `mpi_command` that can still be overridden.
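
One possible shape for this, sketched as plain dictionary merging rather than Ramble's actual variable resolution. `SLURM_DEFAULT_VARIABLES` and `resolve_variables` are hypothetical names, and `srun -n {n_ranks}` is an assumed default value, not something this PR defines.

```python
# Hypothetical default supplied by the Slurm workflow manager.
# `srun -n {n_ranks}` is an assumed value, not something the PR defines.
SLURM_DEFAULT_VARIABLES = {"mpi_command": "srun -n {n_ranks}"}


def resolve_variables(defaults, user_variables):
    """Merge defaults with user variables; user-supplied values win."""
    resolved = dict(defaults)
    resolved.update(user_variables)
    return resolved


# The user sets mpi_command, as in the example usage of this PR.
overridden = resolve_variables(
    SLURM_DEFAULT_VARIABLES,
    {"mpi_command": "mpirun -n {n_ranks} -hostfile hostfile"},
)

# The user leaves mpi_command unset, so the default applies.
defaulted = resolve_variables(SLURM_DEFAULT_VARIABLES, {"processes_per_node": 1})

print(overridden["mpi_command"])
print(defaulted["mpi_command"])
```

The point of the sketch is only the precedence order: the default is applied first, and anything the workspace config sets replaces it.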
