When the reconstruction volume is large, the RAM required becomes disproportionate to the single CPU core requested:
```
Preparing 1 job with 1 CPU and 41 GB of memory per CPU.
```
Since HPC nodes don't usually offer more than 15 GB of RAM per CPU, this leaves CPU horsepower on the table: both PyTorch (compute) and zarr-python's numcodecs (I/O) can be multi-threaded. The threading needs some tuning, though, as it has previously caused problems under multi-processing.
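One way to reclaim those cores is to request enough CPUs to cover the job's memory footprint and point the threaded libraries at them. A minimal sketch (the helper name and the 15 GB-per-CPU default are assumptions; `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and `BLOSC_NTHREADS` are the standard environment knobs for PyTorch's CPU kernels and the Blosc compressor behind numcodecs, and `torch.set_num_threads` / `numcodecs.blosc.set_nthreads` offer the same control at runtime):

```python
import math
import os


def plan_job(mem_gb: float, mem_per_cpu_gb: float = 15.0) -> int:
    """Request enough CPUs that per-CPU memory stays within the node limit.

    Hypothetical helper: a 41 GB job on 15 GB-per-CPU nodes maps to 3 CPUs.
    """
    n_cpus = math.ceil(mem_gb / mem_per_cpu_gb)
    # Tell the threaded libraries how many cores they may use.
    os.environ["OMP_NUM_THREADS"] = str(n_cpus)   # PyTorch CPU kernels (OpenMP)
    os.environ["MKL_NUM_THREADS"] = str(n_cpus)   # MKL-backed FFTs
    os.environ["BLOSC_NTHREADS"] = str(n_cpus)    # Blosc codec used by numcodecs
    return n_cpus


print(plan_job(41.0))  # the 41 GB job above maps to 3 CPUs
```

Setting the environment variables before the libraries are imported is the safest route; runtime calls work too, but over-subscribing threads inside a multi-process job is where the earlier problems came from.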
My understanding of the current reconstruction pipeline is that, for large arrays, FFTs are the bottleneck, so I expect additional CPUs to give marginal improvements at best. Unless you have reason to believe torch can easily multiprocess an FFT(?).
I think this bottleneck might move to I/O when we move the FFTs to the GPU, though. I'll keep an eye out.
> Unless you have reason to believe torch can easily multiprocess an FFT(?).
I would guess that torch can do multi-threaded FFTs on the CPU just as it can on the GPU. A previous (possibly the current stable) version of recOrder/waveorder benefited from multiple CPU threads for reconstructions.
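Even setting aside whether torch threads a single CPU FFT internally, the slice-wise FFTs in a volumetric reconstruction parallelize well across a thread pool. A stand-in sketch with NumPy (`np.fft` releases the GIL, so slices genuinely run in parallel; the function name and batch layout here are illustrative, not from the pipeline):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def batched_fft(volume: np.ndarray, n_threads: int = 4) -> np.ndarray:
    """2-D FFT of each z-slice of a (z, y, x) volume, slices in parallel."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        slices = list(pool.map(np.fft.fft2, volume))
    return np.stack(slices)


vol = np.random.rand(8, 64, 64)
spec = batched_fft(vol)
assert spec.shape == (8, 64, 64)
```

The result is identical to `np.fft.fftn(vol, axes=(1, 2))`; the thread pool just spreads the slices over the requested cores, which is the kind of win the extra CPUs would buy.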