Hello,
I am trying to evaluate LLaVA OneVision 72B, but I'm finding I need tensor parallelism to fit it in memory. However, when I do, evaluating on datasets (e.g., MLVU) takes 90+ hours on 4 A100s.
Can this be sped up with a multi-node setup, using `torchrun --nproc_per_node=1 --nnodes=64`, so that I can split the data between 64 nodes (each with 2-4 A100s), with tensor parallelism within each node and data parallelism across nodes?
Best,
Orr
Right now, I am seeing it take 45 hours on H100s and 90 hours on A100s... way too long for a 72B model, no?
Yeah, and I think most of the time goes to video reading rather than the actual inference. You can confirm this by watching GPU utilization: it stays low for much of the run.
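If video decoding is the bottleneck, one way to hide it is to decode in background worker processes so the GPU stays busy. A minimal sketch with a PyTorch DataLoader, where `video_paths`, `load_video_frames`, and `run_inference` are placeholders (not lmms-eval APIs):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class VideoDataset(Dataset):
    def __init__(self, video_paths):
        self.video_paths = video_paths

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        # Hypothetical decode helper; in practice this would be decord/av/opencv.
        frames = load_video_frames(self.video_paths[idx])  # (T, C, H, W) tensor
        return frames

loader = DataLoader(
    VideoDataset(video_paths),
    batch_size=1,
    num_workers=8,       # decode videos in background processes
    prefetch_factor=2,   # keep a small queue of decoded clips ready
    pin_memory=True,
)

with torch.inference_mode():
    for frames in loader:
        # Placeholder for the actual per-sample eval step.
        outputs = run_inference(frames.cuda(non_blocking=True))
```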
Couldn't we figure out how to do DDP between nodes and TP inside each node?
I think it could be possible using sglang srt, but we haven't really tested it; we haven't even tested the multi-node case for a stable release.
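For reference, a rough sketch of the "split the model inside each node, split the data across nodes" idea, assuming one torchrun process per node (`torchrun --nproc_per_node=1 --nnodes=64 ...`). `load_llava_onevision_72b` and `all_samples` are placeholders; this is not how lmms-eval is wired up today:

```python
import torch.distributed as dist

# One process per node; gloo is enough since we only exchange Python objects.
dist.init_process_group(backend="gloo")
node_rank = dist.get_rank()
num_nodes = dist.get_world_size()

# Data parallelism across nodes: node i takes every num_nodes-th sample.
my_samples = all_samples[node_rank::num_nodes]

# Inside the node, split the 72B weights over the local GPUs.
# device_map="auto" (accelerate/transformers) is layer-wise model parallelism,
# not true tensor parallelism; TP would come from sglang/vLLM instead.
model = load_llava_onevision_72b(device_map="auto")

results = [model.generate(s) for s in my_samples]  # placeholder eval loop

# Gather per-node results on rank 0 and merge them for final scoring.
gathered = [None] * num_nodes
dist.gather_object(results, gathered if node_rank == 0 else None, dst=0)
```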