
Hardware inventory #661

Merged
merged 8 commits into main from hardware-inventory on May 15, 2024
Conversation

@julien-c (Member) commented May 10, 2024

I hesitated putting this one in @huggingface/tasks, or creating a new @huggingface/hardware.

What do you think?

The idea is for the community to be able to contribute to that list.

What I picked for now ⤵️

Because I'm lazy – and because it's somewhat linked to #659 – I've added it to tasks.
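
For reference, a minimal sketch of the kind of record this adds to packages/tasks/src/hardware.ts. The HardwareSpec name, the tflops field and the doc comment are the ones discussed later in this thread; the other names and the sample values are illustrative, not the actual file contents.

export interface HardwareSpec {
    /**
     * Approximate peak throughput in TFLOPS.
     * This is only approximate/theoretical and shouldn't be taken too seriously.
     */
    tflops: number;
    /**
     * Memory options in GB, for chips that bundle memory (illustrative field).
     */
    memory?: number[];
}

export const SKUS: Record<string, HardwareSpec> = {
    "NVIDIA A100": {
        tflops: 312, // FP16 tensor-core, dense, per NVIDIA's datasheet
        memory: [80, 40],
    },
};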

@krampstudio (Collaborator) left a comment

👍

Review comments on packages/tasks/src/hardware.ts (resolved)
@pcuenca (Member) left a comment

Verified the NVIDIA GPUs and the RAM in the Apple chips, ignored all the rest.

Further review comments on packages/tasks/src/hardware.ts (resolved)
@rwightman commented May 13, 2024

If things are being simplified such that one FLOPS number is used, it's probably good to focus on the most used/usable option, i.e. for NVIDIA cards the non-sparse half-precision tensor-core FLOPS is what most people will be operating under for training workloads, and many inference workloads too... although inference is starting to see heavier use of 8-bit int or float precisions on chips that support them...

Comparing any NVIDIA datacenter GPU on the base (non-tensor-core) FP32 FLOPS does not make any sense. Nobody uses them in that mode; they cut that mode's capability relative to gamer GPUs so that they can fit more tensor cores on the die. Maybe a few will be using TF32 on the tensor cores, but most will be in mixed or half precision for matrix multiplies...

@rwightman

Also, some of the latest Intel (and ARM) CPUs have bfloat16 and/or int8 instruction sets for mixed/half-precision training and inference, so that's something to watch. Up to this point there were usually just float32 FLOPS to consider on CPUs.

@apolinario (Contributor)

An example of what @rwightman is bringing up is an A100
[screenshots: A100 datasheet FLOPS figures]

A100s have only 19.49 TFLOPS on fp32, which would make them look weaker than the A10, A10G, L4 and L40S, which they clearly aren't in real-world use cases, e.g.:
[screenshot: real-world GPU performance comparison]

So I agree with Ross that it is probably best to use fp16 as the baseline FLOPS instead of fp32 for all GPUs. I think it correlates better with perceived real-world performance.
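
To put approximate numbers on it (dense, non-sparse figures as listed on NVIDIA's public datasheets):

// A100: ~19.5 TFLOPS FP32, ~312 TFLOPS FP16/BF16 tensor-core (dense)
// A10:  ~31.2 TFLOPS FP32, ~125 TFLOPS FP16 tensor-core (dense)
const a100 = { fp32Tflops: 19.5, fp16TensorTflops: 312 };
const a10 = { fp32Tflops: 31.2, fp16TensorTflops: 125 };
// Ranking by fp32 puts the A10 ahead of the A100; ranking by fp16 tensor-core
// FLOPS matches the real-world ordering shown above.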

@julien-c (Member, Author) commented May 14, 2024

@apolinario @rwightman sounds good; where can one find the values for FP16?

(@apolinario, where is your graph from?)

@julien-c (Member, Author)

I've switched from FP32 to FP16 for NV and AMD GPUs.

TFLOPS for CPUs are still kinda random; anyone want to give me a hand?

@apolinario (Contributor) commented May 14, 2024

Regarding CPUs, the same generation of processor can vary in TFLOPS. I think this website can provide decent directions (https://www.cpu-monkey.com/en/benchmark-intel_core_i5_13600k-bench_11), and I'm happy to help with data entry for CPUs.

However, I'm wondering how to handle one scenario.

Suggestion: have a low-end and a high-end entry for each processor (so we don't need to enter every variant):

"Intel Core 11th Generation (i5) - low end": {
    tflops: 0.5,
},
"Intel Core 11th Generation (i5) - high end": {
    tflops: 2,
},

It would be bad if the user doesn't know whether their CPU is low-end or high-end, but then again, if they don't know that, they may not know the specific model to pick either 🤔. This also misses the intermediate steps, so I'm not super sure this is the best way, but if we think it's decent enough I'm happy to fill in the data for it.

@julien-c (Member, Author)

Honestly, I would just suggest inputting a value in the middle of the low end and the high end.

Especially given this comment at the top of export interface HardwareSpec:

 This is only approximate/theoretical and shouldn't be taken too seriously.
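
Something like this, roughly (a sketch; the 1.0 is purely a placeholder, not a measured or datasheet value):

"Intel Core 11th Generation (i5)": {
    tflops: 1.0, // rough midpoint between the low-end and high-end variants (placeholder)
},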

@rwightman commented May 14, 2024

I've switched from FP32 to FP16 for NV and AMD GPUs.

TFLOPS for CPUs are still kinda random; anyone want to give me a hand?

CPU FLOPS can be quite confusing because there are FLOPS from the old FPU (floating-point unit), there are FLOPS from the SIMD AVX/NEON/etc. units that are often leveraged by ML libs, and there are FLOPS from integrated GPUs.

The advertising of each is not consistent, and vendors tend to focus on the bigger number, often the iGPU, but that is most likely not usable for ML in most AMD or Intel cases.
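
For a rough sanity check on the SIMD number, theoretical peak is cores × clock × FLOPs per cycle per core. A quick sketch (the example figures below are made up for illustration, not taken from any datasheet):

// Theoretical peak FP32 TFLOPS from the SIMD units (an upper bound, rarely hit in practice).
// flopsPerCyclePerCore example: AVX2 with 2 FMA ports = 8 fp32 lanes * 2 (FMA) * 2 ports = 32.
function peakCpuTflops(cores: number, clockGhz: number, flopsPerCyclePerCore: number): number {
    return (cores * clockGhz * 1e9 * flopsPerCyclePerCore) / 1e12;
}

peakCpuTflops(8, 3.5, 32); // ≈ 0.9 TFLOPS for a hypothetical 8-core, 3.5 GHz AVX2 part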

@rwightman

Most NVIDIA datasheets or spec summaries will have the list of different FLOPS. The bfloat16 or float16 'Tensor Core' FLOPS are the interesting ones (not the non-TC FP16 FLOPS), more specifically FP16 with FP32 accumulate (but they don't always distinguish this).

For datacenter GPUs like the A100 and H100 (they are called 'Tensor Core' GPUs) or workstation Quadros, FP16 with FP32 accumulate is the default in lower precision. Gamer GPUs are crippled to differentiate price points: FP16 with FP32 accumulate usually runs at half the rate of FP16 with FP16 accumulate (which is not that useful for ML), so spec sheets might distinguish between the two.

The other silly thing: when you look at some of the spec sheets, they use the 'sparsity' FLOPS number, which is only realizable in specific situations. It's often denoted by a small superscript ("number is with sparsity"). The actual number is typically half that.
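
For example, the A100's FP16 tensor-core spec of 624 TFLOPS "with sparsity" corresponds to 312 TFLOPS dense. A one-liner if you want to normalize spec-sheet numbers (a hypothetical helper, not part of this PR):

// Halve the "with sparsity" number to get the dense figure most workloads actually see.
const denseTflops = (sparseTflops: number): number => sparseTflops / 2;
denseTflops(624); // 312, e.g. A100 FP16 tensor core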

@julien-c (Member, Author) left a comment

thanks a lot @apolinario!

@julien-c (Member, Author)

cc @mfuntowicz too on the whole thread btw

@julien-c (Member, Author)

@rwightman even if they aren't cleanly comparable to GPUs, are the CPU tflops values in the current state of this PR reasonable enough?

@julien-c merged commit bba3138 into main on May 15, 2024 (4 checks passed)
@julien-c deleted the hardware-inventory branch on May 15, 2024 at 13:53
@rwightman

@julien-c a bit late to the party it seems, but checking Ice Lake and Sapphire Rapids, they're both lower than I'd expect... I think Ice Lake is in the 1-3 TFLOPS range and Sapphire Rapids is 3-4+...

Also, the GPUs are still way off: consumer cards are at about 1/2 of what they should be, and A100/H100 are off by much more...

@julien-c (Member, Author)

@rwightman oops, PR welcome on top of this!
