Evaluating LLaVA OneVision 72B - memory, speedup, multinode #400

Open
orrzohar opened this issue Nov 6, 2024 · 4 comments


orrzohar commented Nov 6, 2024

Hello,

I am trying to evaluate LLaVA OneVision 72B and find that I need tensor parallelism to fit it in memory. However, when I do, evaluating on a dataset such as MLVU takes 90+ hours on 4 A100s.

Can this be sped up with a multi-node run, e.g. `torchrun --nproc_per_node=1 --nnodes=64`, so that the data is split across 64 nodes (each with 2-4 A100s), with tensor parallelism within each node and data parallelism across the nodes?
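
For concreteness, a minimal sketch of what I have in mind, assuming `torchrun` starts one process per node (so `RANK` indexes nodes). The script name and checkpoint id are illustrative, and `device_map="auto"` gives layer-wise sharding on the local GPUs rather than true tensor parallelism; it's just meant to show the data-sharding side:

```python
# Rough sketch only: data parallelism across nodes, model sharded within a node.
# Assumed launch: torchrun --nproc_per_node=1 --nnodes=64 eval_sharded.py
# ("eval_sharded.py" and the checkpoint id below are illustrative.)
import os

import torch
from transformers import LlavaOnevisionForConditionalGeneration

rank = int(os.environ["RANK"])              # one process per node -> node index
world_size = int(os.environ["WORLD_SIZE"])  # total number of nodes

# Data parallelism: each node takes an interleaved shard of the benchmark.
all_doc_ids = list(range(1000))             # stand-in for the task's example ids
my_doc_ids = all_doc_ids[rank::world_size]

# Within the node, spread the 72B weights over the local 2-4 GPUs. Note that
# device_map="auto" is layer-wise (pipeline-style) sharding, not true tensor
# parallelism, but it is enough to fit the model in memory.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-hf/llava-onevision-qwen2-72b-ov-hf",  # illustrative checkpoint id
    torch_dtype=torch.float16,
    device_map="auto",
)

# ... run generation over my_doc_ids, write results per rank, merge afterwards ...
```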

Best,
Orr

kcz358 (Collaborator) commented Nov 7, 2024

I think it is not possible for now. But for MLVU, I think the main reason is that some videos take an extremely long time to load.

orrzohar (Author) commented Nov 7, 2024

Couldn't we figure out how to do DDP between nodes and TP inside nodes?

orrzohar (Author) commented Nov 7, 2024

Right now, I am seeing it take 45 hrs on H100s and 90 hrs on A100s... way too long for a 72B model, no?

kcz358 (Collaborator) commented Nov 7, 2024

> Right now, I am seeing it take 45 hrs on H100s and 90 hrs on A100s... way too long for a 72B model, no?

Yeah, and I think most of the time goes to reading the videos rather than to actual inference. You can verify this by checking that your GPU utilization stays low for much of the run.
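
One quick way to check is to time the decode step alone; a minimal sketch, assuming decord is installed and using a placeholder path for one of the long MLVU videos:

```python
# Minimal timing sketch: is video decoding (not inference) the bottleneck?
# Assumes decord is installed; "sample.mp4" is a placeholder for an MLVU video.
import time

from decord import VideoReader, cpu

t0 = time.time()
vr = VideoReader("sample.mp4", ctx=cpu(0))
# Uniformly sample ~32 frames, as video LMM pipelines typically do.
idx = list(range(0, len(vr), max(1, len(vr) // 32)))[:32]
frames = vr.get_batch(idx).asnumpy()
print(f"decoded {len(idx)} frames in {time.time() - t0:.1f}s")
```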

> Couldn't we figure out how to do DDP between nodes and TP inside nodes?

I think it could be possible using sglang srt, but we haven't really tested it; we haven't even tested the multi-node case for a stable release.
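
If someone wants to experiment, an entirely untested sketch of that direction: run one independent sglang server per node with TP over that node's GPUs, and shard the benchmark across the servers client-side (the checkpoint id, port, and node layout below are placeholders):

```bash
# Untested sketch: one sglang server per node, TP over the node's 4 GPUs;
# DP is then done client-side by sharding the dataset across the servers.
# On every node (checkpoint id is illustrative):
python -m sglang.launch_server \
  --model-path lmms-lab/llava-onevision-qwen2-72b-ov \
  --tp-size 4 \
  --port 30000
# Client: send shard i of the benchmark to http://<node_i>:30000, merge results.
```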
