Hardware inventory #661
👍
Verified the nvidia GPUs and the RAM in the Apple chips, ignored all the rest.
If things are being simplified such that a single FLOPS number is used, it's probably best to focus on the most used/usable option: for Nvidia cards, the non-sparse half-precision tensor-core FLOPS is what most people will be operating under for training workloads, and much of inference too... although inference is starting to see heavier use of 8-bit int or float precision on chips with support. Comparing any Nvidia datacenter GPU on its base (non-tensor-core) FP32 FLOPS does not make any sense. Nobody uses them in that mode; the capability of that mode is cut relative to gamer GPUs so that more tensor cores fit on the die. Maybe a few users will be on TF32 on the tensor cores, but most will be in mixed or half precision for matrix multiplies. |
Also, some of the latest Intel (and ARM) CPUs have bfloat16 and/or int8 instruction sets for mixed/half-precision training and inference, so that's something to watch. Up to this point there were usually just float32 FLOPS to consider on CPUs. |
An example of what @rwightman is bringing up is the A100: A100s have only 19.49 TFLOPS of FP32, which would make them look weaker than the A10, A10G, L4 and L40S, which they clearly aren't in real-world use cases (e.g. the attached benchmark graph). So I agree with Ross that it's probably best to use FP16 as the baseline FLOPS instead of FP32 for all GPUs. I think it correlates better with perceived real-world performance. |
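To make the comparison concrete, here is a minimal sketch of what a single-number entry could look like under that convention. The `HardwareSpec` shape and field names are hypothetical (not the schema actually used in this PR); the figures are NVIDIA's published A100 numbers (~19.5 TFLOPS base FP32 vs 312 TFLOPS dense FP16/BF16 tensor-core).

```ts
// Hypothetical shape, for illustration only; the real schema in the tasks package may differ.
interface HardwareSpec {
  /** Dense (non-sparse) half-precision tensor-core throughput, in TFLOPS. */
  tflops: number;
  /** Available memory configurations, in GB. */
  memory?: number[];
}

const A100: HardwareSpec = {
  // 312 TFLOPS dense FP16/BF16 tensor-core (624 is the "with sparsity" figure);
  // the base non-tensor-core FP32 rate is only ~19.5 TFLOPS.
  tflops: 312,
  memory: [40, 80],
};
```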
Co-authored-by: Pedro Cuenca <[email protected]>
@apolinario @rwightman, sounds good, where can one find the values for FP16? (@apolinario where is your graph from) |
i've switched from FP32 => FP16 for NV and AMD GPUs. TFlops for CPUs are still kinda random, if anyone wants to give me a hand? |
Regarding CPUs, the same generation of processor can vary in TFLOPS. I think this website can provide decent directions (https://www.cpu-monkey.com/en/benchmark-intel_core_i5_13600k-bench_11) and I'm happy to help with data entry for CPUs. However, I'm wondering how to handle this scenario:
Suggestion: having, for each processor family, a low-end and a high-end value.
It would be bad if the user doesn't know how low/high end their CPU is, but if they don't know whether theirs is low or high end they may not know the specific model to pick either 🤔 - it also misses intermediate steps - so I'm not super sure this is the best way, but if we think it's decent enough I'm happy to fill in the data for it |
honestly i would just suggest inputting a value in the middle of the range. Especially given this comment line at the top of the file: "This is only approximate/theoretical and shouldn't be taken too seriously." |
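A rough sketch of how the two ideas could fit together: store a per-family low/high range and report its midpoint as the single approximate value. The `CpuSpec` type and the numbers below are made up for illustration only, not a proposed schema or real measurements.

```ts
// Illustrative only: hypothetical type with made-up numbers.
interface CpuSpec {
  /** Approximate FP32 TFLOPS for the low-end and high-end SKUs of a family. */
  tflopsRange: [number, number];
}

// Report the middle of the range as the single "approximate/theoretical" value.
function approxTflops(cpu: CpuSpec): number {
  const [low, high] = cpu.tflopsRange;
  return (low + high) / 2;
}

const exampleFamily: CpuSpec = { tflopsRange: [0.5, 1.5] }; // made-up numbers
console.log(approxTflops(exampleFamily)); // 1
```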
CPU FLOPS can be quite confusing because there are FLOPS from the old FPU (floating-point unit), there are FLOPS from the SIMD AVX/NEON/etc. units that are often leveraged by ML libs, and there are FLOPS from integrated GPUs. The advertising of each is not consistent, and vendors tend to focus on the bigger number, often the iGPU, but that is most likely not usable for ML in most AMD or Intel cases. |
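For the SIMD figure (the one ML libraries can actually approach), the usual back-of-the-envelope estimate is cores × clock × FP32 lanes × FMA units per core × 2 ops per FMA. The helper below is only a sketch of that formula, and the example inputs are hypothetical rather than taken from any specific CPU's datasheet.

```ts
// Theoretical peak FP32 TFLOPS of the SIMD units, ignoring the legacy FPU and any iGPU.
function simdPeakTflops(
  cores: number,
  clockGhz: number,
  fp32Lanes: number, // e.g. 8 for 256-bit AVX2, 16 for AVX-512
  fmaUnitsPerCore: number,
): number {
  // 2 ops per FMA (multiply + add); clock in GHz, so divide by 1000 to get TFLOPS.
  return (cores * clockGhz * fp32Lanes * fmaUnitsPerCore * 2) / 1000;
}

// Hypothetical 8-core AVX2 part at 4 GHz with 2 FMA units per core: ~1 TFLOPS FP32.
console.log(simdPeakTflops(8, 4.0, 8, 2)); // 1.024
```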
Most NVIDIA datasheets or spec summaries will list the different FLOPS. The bfloat16 or float16 'Tensor Core' FLOPS are the interesting ones (not the non-TC FP16 FLOPS), more specifically FP16 w/ FP32 accumulate (though they don't always distinguish this). For datacenter GPUs like A100 and H100 (they are called 'Tensor Core' GPUs) or workstation Quadro cards, FP16 w/ FP32 accum is the default in lower precision. Gamer GPUs are crippled to differentiate price points: FP16 w/ FP32 accum usually runs at half the rate of FP16 w/ FP16 accum (which is not that useful for ML), so spec sheets might distinguish the two. The other silly thing: some spec sheets use the 'sparsity' FLOPS number, which is only realizable in specific situations. It's often denoted by a small superscript ("with sparsity"). The actual number is typically half that. |
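For data entry, the two gotchas above boil down to a couple of divide-by-two adjustments. The helper below is just a sketch of those rules of thumb as described in the comment, not an official NVIDIA formula; the 1979 figure is roughly what the H100 SXM sheet quotes for FP16 tensor-core with sparsity.

```ts
// Derive the "useful for ML" dense FP16 w/ FP32 accumulate number from a spec-sheet figure:
// - if the sheet quotes the sparsity TFLOPS, the dense value is typically half;
// - on gamer cards, FP16 w/ FP32 accumulate typically runs at half the quoted
//   FP16 w/ FP16 accumulate rate.
function usableFp16Tflops(
  sheetTflops: number,
  quotedWithSparsity: boolean,
  gamerFp16Fp16Accum: boolean,
): number {
  let tflops = sheetTflops;
  if (quotedWithSparsity) tflops /= 2;
  if (gamerFp16Fp16Accum) tflops /= 2;
  return tflops;
}

console.log(usableFp16Tflops(1979, true, false)); // 989.5 (dense)
```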
Co-authored-by: apolinário <[email protected]>
thanks a lot @apolinario!
cc @mfuntowicz too on the whole thread btw |
@rwightman even if not cleanly comparable to GPUs, are the CPU tflops values in the current state of this PR reasonable enough? |
@julien-c bit late to the party it seems. Checking Ice Lake and Sapphire Rapids, they're both lower than I'd expect... I think Ice Lake is in the 1-3 TFLOPS range and Sapphire Rapids is 3-4+. Also, GPUs are still way off: consumer cards are at about 1/2 of what they should be, and A100/H100 are off by much more... |
@rwightman oops, PR welcome on top of this! |
I hesitated putting this one in @huggingface/tasks, or creating a new @huggingface/hardware. What do you think?
The idea is for the community to be able to contribute to that list.
What I picked for now ⤵️
Because I'm lazy – and because it's somewhat linked to #659 – i've added it to tasks.