Ability to dedicate accelerator directive (GPU) over range #5570

Open
adamrtalbot opened this issue Dec 4, 2024 · 4 comments

@adamrtalbot
Collaborator

New feature

When submitting to nodes with >1 GPU, Nextflow has very limited capabilities to split the work over separate GPUs.

Usage scenario

Let's imagine running on AWS Batch. We submit multiple GPU-enabled tasks to the Batch service, and AWS allocates them to a single, large instance with multiple GPUs, as per its allocation strategy, which prioritises the cheapest price per vCPU.

On such an instance, all tasks can see and use all GPUs at the same time, leading to collisions and GPU memory issues.

We have some strategies to deal with this today (a minimal config sketch follows the list):

  • Use a specific machine size that can only fit a single GPU-enabled task
  • Use environment variables such as NVIDIA_VISIBLE_DEVICES to point each task at a specific GPU
  • Set maxForks to 1 to ensure only a single task executes at a time

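A minimal nextflow.config sketch of the last two workarounds (assuming the GPU processes carry a `gpu` label, and a hard-coded device index):

```groovy
// Sketch only: pin every 'gpu'-labelled task to device 0 and serialise
// execution so tasks never share a GPU (which leaves the other GPUs idle).
process {
    withLabel: 'gpu' {
        maxForks     = 1                                    // one GPU task at a time
        beforeScript = 'export NVIDIA_VISIBLE_DEVICES=0'    // hard-coded device index
    }
}
```
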
However, we lack a way of saying "given N available GPUs, assign each task to exactly one of them".

What I want is for each task to know which GPU it can use, and then use only that GPU.

Suggested implementation

I don't actually have a good fix here. Perhaps using a process array with an index might help? Perhaps it's specific to each executor? But I feel like Nextflow could expose a variable to help us here.
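As a rough illustration of the kind of thing I mean (not a real proposal), the closest I can get today is deriving a device index from task.index, which is only safe if exactly that many tasks of the process land on each node, and Nextflow can't guarantee that:

```groovy
// Hypothetical sketch: spread tasks over 4 GPUs by task index. Assumes
// exactly 4 GPUs per node and that concurrently running tasks never end
// up on the same device, which Nextflow cannot enforce today.
// 'my_gpu_tool' is a placeholder command.
process GPU_TASK {
    accelerator 1
    maxForks 4

    input:
    path sample

    script:
    def gpuId = (task.index - 1) % 4     // task.index starts at 1
    """
    export CUDA_VISIBLE_DEVICES=${gpuId}
    my_gpu_tool --input ${sample}
    """
}
```
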

@bentsherman
Member

I think this needs to be addressed by the executor, i.e. AWS Batch or SLURM should allow multiple tasks to request 1 GPU each, pack them onto a multi-GPU node as they would for CPUs, and set NVIDIA_VISIBLE_DEVICES for each task.

Otherwise Nextflow would basically have to become the executor by tracking the VM assignments for each task in order to figure out which GPUs are available at any given time.
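For AWS Batch, the request side of this is already roughly expressible; it's the device assignment that belongs to the executor/agent. A sketch, assuming a GPU-enabled compute environment (the queue name is a placeholder):

```groovy
// Sketch: request one GPU per task and let Batch/ECS handle device
// assignment, rather than Nextflow tracking GPUs itself.
process {
    executor = 'awsbatch'
    queue    = 'my-gpu-queue'   // placeholder queue name
    withLabel: 'gpu' {
        accelerator = 1         // requests a GPU in the Batch job definition
    }
}
```
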

@FloWuenne

@bentsherman I believe SLURM is already doing this? I did some testing yesterday, and setting the following in nextflow.config tells SLURM to request 1 GPU for each task.

```groovy
process {
    clusterOptions = '--gres=gpu:1'  // request 1 GPU per task
}
```

If submitted to a multi-GPU node, different GPUs are assigned to the different tasks, but each one is mapped to device 0 inside its own task. So I believe CUDA_VISIBLE_DEVICES is 0 inside all tasks, even when they are running in parallel on different GPUs.
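A quick way to check what each task actually sees (sketch):

```groovy
// Sketch: print the device mapping inside a SLURM-scheduled task.
process CHECK_GPU {
    clusterOptions '--gres=gpu:1'

    script:
    """
    echo "CUDA_VISIBLE_DEVICES=\$CUDA_VISIBLE_DEVICES"
    nvidia-smi -L    # lists only the device(s) visible to this job
    """
}
```
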

@bentsherman
Member

For SLURM it may depend on the individual cluster setup. The sysadmin can (probably) use cgroups to isolate GPUs just like you would for CPUs and memory, so that a job only sees the requested resources even if the underlying node has more.

Setting CUDA_VISIBLE_DEVICES is also probably something that would have to be configured by the sysadmin, but if cgroups work then you don't need to bother with that environment variable.
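Concretely, that sysadmin-side setup is usually something like the following (illustrative file contents only, not a recommendation for any particular cluster):

```
# /etc/slurm/cgroup.conf
ConstrainDevices=yes        # jobs only see the GPUs they requested

# /etc/slurm/gres.conf
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```
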

@bentsherman
Member
