HunyuanVideo version for RTX 3090 / RTX 4090 just released #109
Comments
As it needs an 80 GB GPU, it seems unlikely to fit on a 24 GB GPU unless it is quantized to 4 bits.
That's incorrect: the main video model itself is around 24 GB and can be reduced to 12 GB with 8-bit quantization at the cost of minimal degradation. The text encoder (T5 XXL) is 40 GB if you use the 32-bit version with the decoder, which is not needed; if you keep only the encoder in 16-bit format (no degradation at all), it takes only 10 GB.
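A quick back-of-the-envelope check of those figures (parameter counts are approximate, and as noted later in the thread the text encoder is actually Llama-based rather than T5, but the bytes-per-parameter arithmetic works the same way):

```python
# Rough memory estimates: parameters x bytes-per-parameter.
# Parameter counts are approximate, taken from commonly cited sizes.
GB = 1024**3

video_params = 13e9     # HunyuanVideo transformer, ~13B parameters
t5_enc_params = 4.9e9   # T5-XXL encoder only, ~4.9B parameters
t5_full_params = 11e9   # T5-XXL encoder + decoder, ~11B parameters

print(f"video model, fp16:    {video_params * 2 / GB:4.1f} GB")    # ~24 GB
print(f"video model, int8:    {video_params * 1 / GB:4.1f} GB")    # ~12 GB
print(f"T5-XXL full, fp32:    {t5_full_params * 4 / GB:4.1f} GB")  # ~41 GB
print(f"T5-XXL encoder, fp16: {t5_enc_params * 2 / GB:4.1f} GB")   # ~9 GB
```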
@deepbeepmeep Thanks, but I don't see that you made any significant changes to this base repo? By the way, T5 can be loaded as FP8, and there is an even better scaled version; did you try them?
+1
Key changes would be in gradio_server.py; the rest is getting rid of references to the original model. Beyond that, I'm a little confused, but I think the point is to download the quantized models manually and put them in the ckpt folder. Which makes me wonder: can you still use xDiT with this?
No change is needed in the base model, because the magic is in my offload library mmgp, called from gradio_server.py. The same library can be used to offload Flux, CogView, Mochi models, and so on; I will provide more sample apps soon. Be aware that right now you need 64 GB of RAM. There is little gain in pre-quantizing the models, as quantization is done on the fly and is pretty fast.
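The basic usage pattern is very small. A minimal sketch (`build_hunyuan_pipeline` is a hypothetical stand-in for however the pipeline is actually assembled in gradio_server.py; the available options are described in the mmgp README):

```python
from mmgp import offload

# Assemble the pipeline first (hypothetical helper), then hand it to mmgp,
# which keeps the models in host RAM and streams them to the GPU on demand,
# quantizing the video transformer on the fly.
pipe = build_hunyuan_pipeline()

offload.all(pipe)  # enable offloading with the default options
```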
@deepbeepmeep Can you give a link to the mmgp library? It sounds nice.
@deepbeepmeep I saw you just updated the code for low RAM; can you tell me whether it will work with 32 GB of RAM and a 3090?
@Dhilu16 The RAM is used to store the models while they are not in the GPU, so I'm afraid 32 GB won't be sufficient. You can try anyway. One solution would be to quantize the text encoder as well to reduce RAM consumption. If I have some time this week, I will add an option for that.
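To make the budget concrete, a rough sketch using the sizes quoted earlier in this thread (approximate, and ignoring what the OS and Python runtime themselves need):

```python
# Host-RAM budget for offloading: every model lives in RAM when not on the GPU.
video_model_gb = 24   # ~12 GB if quantized to 8 bits on load
text_encoder_gb = 10  # fp16 encoder; smaller once it can be quantized too
misc_gb = 2           # VAE and other small components (rough guess)

needed = video_model_gb + text_encoder_gb + misc_gb
print(f"~{needed} GB of RAM just for the offloaded models")  # ~36 GB: too tight for a 32 GB box
```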
You can install the module with `pip install mmgp`. You will find instructions on how to use it here: https://github.com/deepbeepmeep/mmgp If you are interested, I have also applied the module to Flux Fill (a very good iterative inpainting/outpainting tool).
I got mixed up with Flux; in fact, HunyuanVideo uses a Llama-based text encoder. It could indeed be quantized to reduce its memory footprint. I will add this capability when I have time.
I have just published mmgp 1.2.0, which now accepts an extra parameter (modelsToQuantize) containing a list of additional models to quantize. So here you can try `offload.all(pipe, modelsToQuantize=["text_encoder"])` to quantize both the video model (quantized by default) and the Llama text encoder. Please let me know if it helps.
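In context, that call looks something like the sketch below (`pipe` being the pipeline object assembled in gradio_server.py):

```python
from mmgp import offload

# Quantize the video transformer (done by default) plus the Llama text
# encoder, reducing both VRAM and host-RAM requirements.
offload.all(pipe, modelsToQuantize=["text_encoder"])
```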
@deepbeepmeep Awesome work! I hope you add those features to your Gradio app, as that is what I am planning to test.
It is in the latest version of my fork right now. You may comment/uncomment lines 34-36 to try the different options.
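For readers who haven't opened the fork yet, the options amount to alternative `offload.all(...)` calls; an illustrative (not verbatim) version of what commenting/uncommenting those lines looks like:

```python
# Pick one (illustrative only; the actual lines 34-36 in the fork may differ):
# offload.all(pipe)                                   # quantize the video model only
offload.all(pipe, modelsToQuantize=["text_encoder"])  # also quantize the text encoder
```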
Curious whether there's an easy way to keep it from loading a copy of the model into RAM for each GPU when adapting the gradio_server.py code to sample.py? I'm loading the text encoder from a pre-quantized model, and I'm not even sure 96 GB would be enough RAM.
Installed it on Win11 with an NVIDIA RTX 3080 (10 GB VRAM); it used 42 GB of DDR4 RAM (out of 96 GB). It took about 3 hours to generate 49 frames at 25 steps.
Windows 2022, RTX 3090: 25 minutes for 848 x 480 x 97 frames, 50 steps.
God bless you.
Unfortunately, 10 GB of VRAM is insufficient right now. I am working on an improved version that will use sequence offloading to reduce the VRAM requirements even further, but I am afraid that 10 GB of VRAM won't be enough anyway.
Thanks for the best open source video generator!!!
I have created a fork of this repository and adapted it so that HunyuanVideo can run even on consumer GPUs.
https://github.com/deepbeepmeep/HunyuanVideoGP
It is pretty fast for a consumer GPU: you can generate 97 frames (more than 3 seconds of video) at 848x480 in less than 12 minutes.