IO file storm when pipelines are run at scale #4158
-
Hi @PeteClapham, Nextflow potentially creates several small files for each task that is executed (the per-task .command.* scripts and logs plus the .exitcode file).
This approach indeed puts a huge load on the shared filesystem at scale. Here are two things you can try:
We are also working on a number of features that should alleviate this somewhat, but at the end of the day it's a fundamental part of how Nextflow works. To avoid writing all of these files, the equivalent information would need to be stored by the HPC scheduler or some kind of database service. HPC schedulers typically do not store all of this info, or at least not for long, and while you could use a database service, well... most people already have a shared filesystem 😄 We're always open to new ideas though. I think the TES API aims to do exactly what I'm describing: storing this metadata in a database instead of on the filesystem. Funnel is the main TES backend that I know of, though I'm not sure whether it supports LSF.
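For context, routing task state through TES instead of the filesystem looks roughly like this. A minimal sketch, assuming the nf-ga4gh plugin and a Funnel server reachable at its default local HTTP endpoint; both are assumptions you would adapt to your own deployment:

```groovy
// nextflow.config -- sketch of sending tasks to a GA4GH TES backend such as Funnel.
// The endpoint is an assumption (Funnel's default local HTTP port); change it for your setup.
plugins {
    id 'nf-ga4gh'
}

process.executor = 'tes'

tes {
    endpoint = 'http://localhost:8000'
}
```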
-
Hi all, let me add more details. Nextflow indeed creates several small metadata files, but it's hard to believe that this alone causes heavy IO pressure on the filesystem. Those files are accessed only sporadically, to determine the task status or to fetch the error logs in case of failure. More likely the problem comes from the fact that Nextflow makes it easy for users to submit a very large number of jobs, which can themselves be IO-intensive. @PeteClapham I have a question: do the nodes in your cluster have local scratch storage? If yes, do your users set the process.scratch directive to true?
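For anyone landing here, the directive in question can be set globally in nextflow.config. A minimal sketch, assuming your nodes expose local scratch space (the commented path is purely illustrative):

```groovy
// nextflow.config -- run task work directories on node-local scratch storage.
// Inputs are staged to local disk, the task runs there, and only the declared
// outputs are copied back to the shared work directory, which cuts the random
// IO hitting the parallel filesystem.
process.scratch = true                 // use the node's default temporary directory
// process.scratch = '/local/scratch'  // illustrative: point at a specific local path instead
```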
-
I can imagine that in very large runs (10-100k tasks) that are resumed, Nextflow will request thousands or tens of thousands of file reads (e.g. of .exitcode) in a short period of time. Perhaps this is the issue that Sanger is running into. Is there a Nextflow configuration option to throttle these cache checks?
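Not an answer to the resume question specifically, but the executor scope does expose documented settings that throttle how often Nextflow polls the scheduler and reads per-task status files such as .exitcode. A hedged sketch with illustrative values only:

```groovy
// nextflow.config -- illustrative values; tune for your site and scheduler.
executor {
    queueSize         = 500        // max jobs queued/running at once
    pollInterval      = '30 sec'   // how often task status is checked
    queueStatInterval = '5 min'    // how often the scheduler queue status is fetched
    exitReadTimeout   = '10 min'   // how long to wait for the .exitcode file to appear
    submitRateLimit   = '20/1min'  // at most 20 job submissions per minute
}
```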
-
Expected behavior and actual behavior
Nextflow currently creates numerous small files during pipeline runs to maintain and track state. When pipelines are run at scale, this can create an IO storm that prevents other workflows from making progress and puts high-performance parallel filesystems, and the wider cluster, at risk of service failure.
Steps to reproduce the problem
High numbers of multi-component runs executing in parallel across a large-scale HPC cluster (approx. 20k cores) can create these file storms. We have now seen this on multiple occasions; the last recurrence was two weeks ago, when the resulting IO storm peaked at 448 million IOPS.
Due to the impact upon the filesystems, logging is unable to write out the usual debugging data.
Environment
Nextflow version: Current release
Java version: openjdk 11.0.19 2023-04-18 (OpenJDK Runtime Environment build 11.0.19+7-post-Ubuntu-0ubuntu118.04.1; OpenJDK 64-Bit Server VM build 11.0.19+7-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)
Operating system: Ubuntu 18.04 (Linux)
Bash version: GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
The issue seems unrelated to any given informatics software stack; we have seen it arise from different research areas, data sets, and data types.
The clusters are running with IBM Spectrum LSF 10.1.0.13
It is unclear at which point in the process the storm is created; however, the use of many small files at scale to maintain state is a potential architectural challenge.