Replies: 1 comment
I've discovered an additional issue: even if you run with the `TORCH_BLAS_PREFER_HIPBLASLT=0` workaround, it still fails with the same errors when using FP8 quantization. Here, 3/8 of the threads fail to load. Presumably the FP8 quantization kernels require hipBLASLt and so can't run with the current bug? BTW, surprisingly when testing at …
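For context, an FP8 run of this sort might look like the following. This is only a sketch assuming the standard `vllm serve` entrypoint; the model name and flag values are placeholders, not the exact command from the report:

```bash
# Sketch only (placeholder model/flags): the combination described above,
# i.e. hipBLASLt disabled via the workaround, FP8 quantization, tensor parallel 8.
TORCH_BLAS_PREFER_HIPBLASLT=0 \
    vllm serve <model> \
    --quantization fp8 \
    --tensor-parallel-size 8
```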
FYI, I've filed a curious bug I've encountered with PyTorch; I at least want to mention it here (without creating a duplicate issue yet, unless they triage it as not their problem): pytorch/pytorch#137695
Basically, running the latest vLLM (HEAD) and the PyTorch nightly it depends on, hipBLASLt is used by default and works for `-tp 1` through `-tp 4`, but at `-tp 8` it consistently starts to report errors loading `TensileLibrary_lazy_gfx942.dat`. The workaround is to set `TORCH_BLAS_PREFER_HIPBLASLT=0`, and at least for tp 1-4 this is slightly faster anyway in my vLLM `benchmark_throughput` testing.

Leaving this here to potentially save some people some hair-pulling, as it took me a while to debug (since it works at lower tp values).
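A minimal sketch of the workaround, assuming the standard `vllm serve` entrypoint (the model and any other flags are placeholders, not the exact command used here):

```bash
# Workaround sketch: setting TORCH_BLAS_PREFER_HIPBLASLT=0 makes PyTorch prefer
# rocBLAS over hipBLASLt, avoiding the TensileLibrary_lazy_gfx942.dat load
# failure seen at -tp 8. Model and flags are placeholders.
TORCH_BLAS_PREFER_HIPBLASLT=0 \
    vllm serve <model> \
    --tensor-parallel-size 8
```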