
Add benchmark info of representative runs? #104

Open
GanZhang-GFD opened this issue Jul 24, 2024 · 2 comments

@GanZhang-GFD

Would it be helpful to include benchmark info for representative runs in the documentation?

These runs could include a 14-day ensemble simulation, a 1-year deterministic simulation, and so on. It would also be interesting to see the performance differences between GPU and TPU.

This information would help users check their local configurations and make informed decisions about where to run their experiments (e.g., on Google Cloud or on local Nvidia machines).
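
As a rough illustration of what "checking the local configuration" could mean in practice, here is a minimal sketch (assuming only that JAX is installed; not tied to any particular model):

```python
import jax

# List the accelerators JAX can see (TPU cores, CUDA GPUs, or CPU fallback),
# which is what a published benchmark table would be compared against.
print("backend:", jax.default_backend())
for d in jax.devices():
    print(d.platform, d.device_kind)
```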

@shoyer
Collaborator

shoyer commented Jul 24, 2024

We have training and inference runtimes for all of our different models in an Extended Data table of the paper:
https://www.nature.com/articles/s41586-024-07744-y/tables/1


The inference numbers are all reported for a single core of a Google TPU v4 chip.

Is there something else you were thinking of?

@GanZhang-GFD
Author

Thanks. I misread the table and thought the benchmark chip was the T4 available in Colab. The lower performance of a local implementation also confused me.

As a data point for the community: the inference time for a comparable benchmark task with JAX (CUDA) on an Nvidia L40S (roughly A100-level) is approximately 40 s. This is a preliminary number from a naive implementation.
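
For reference, a minimal sketch of how such an inference timing might be measured in JAX (the `run_inference` function and its input are hypothetical placeholders, not the actual model API or the setup used for the number above):

```python
import time
import jax
import jax.numpy as jnp

@jax.jit
def run_inference(state):
    # Hypothetical stand-in for a single model forecast step.
    return jnp.tanh(state) * 0.5 + state

state = jnp.zeros((256, 256))

# Warm-up call so compilation time is excluded from the measurement.
run_inference(state).block_until_ready()

start = time.perf_counter()
# block_until_ready() waits for JAX's asynchronous dispatch to finish.
run_inference(state).block_until_ready()
print(f"inference time: {time.perf_counter() - start:.3f} s on {jax.default_backend()}")
```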
