HunyuanVideo version for RTX 3090 / RTX 4090 just released #109
Comments
As it needs an 80 GB GPU, it seems unlikely to fit on a 24 GB GPU unless it is quantized to 4 bits.
That's incorrect: the main video model itself is around 24 GB and can be reduced to 12 GB with 8-bit quantization at the cost of minimal degradation. The text encoder (T5 XXL) is 40 GB if you use the 32-bit version with the decoder, which is not needed; if you keep only the encoder in 16-bit format (no degradation at all), it takes only 10 GB.
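A quick back-of-the-envelope check of those figures (parameter counts are approximate, and as noted later in the thread the text encoder is actually Llama-based rather than T5, but the bytes-per-parameter arithmetic works the same way):

```python
# Rough memory estimates: parameters x bytes-per-parameter.
# Parameter counts are approximate, taken from commonly cited sizes.
GB = 1024**3

video_params = 13e9     # HunyuanVideo transformer, ~13B parameters
t5_enc_params = 4.9e9   # T5-XXL encoder only, ~4.9B parameters
t5_full_params = 11e9   # T5-XXL encoder + decoder, ~11B parameters

print(f"video model, fp16:    {video_params * 2 / GB:4.1f} GB")    # ~24 GB
print(f"video model, int8:    {video_params * 1 / GB:4.1f} GB")    # ~12 GB
print(f"T5-XXL full, fp32:    {t5_full_params * 4 / GB:4.1f} GB")  # ~41 GB
print(f"T5-XXL encoder, fp16: {t5_enc_params * 2 / GB:4.1f} GB")   # ~9 GB
```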
@deepbeepmeep Thanks, but I don't see that you made any significant changes to this base repo? By the way, T5 can be loaded as FP8, and there is an even better scaled version; did you try them?
+1
Key changes would be in gradio_server.py; the rest is getting rid of references to the original model. Beyond that, I'm a little confused, but I think the point is to download the quantized models manually and put them in the ckpt folder. Which makes me wonder: can you still use xDiT with this?
No change is needed in the base model, because the magic is in my offload library mmgp, called from gradio_server.py. The same library can be used to offload Flux, CogView, Mochi models, and so on; I will provide more sample apps soon. Be aware that right now you need 64 GB of RAM. There is little gain in pre-quantizing the models, as quantization is done on the fly and is pretty fast.
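The basic usage pattern is very small. A minimal sketch (`build_hunyuan_pipeline` is a hypothetical stand-in for however the pipeline is actually assembled in gradio_server.py; the available options are described in the mmgp README):

```python
from mmgp import offload

# Assemble the pipeline first (hypothetical helper), then hand it to mmgp,
# which keeps the models in host RAM and streams them to the GPU on demand,
# quantizing the video transformer on the fly.
pipe = build_hunyuan_pipeline()

offload.all(pipe)  # enable offloading with the default options
```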
@deepbeepmeep Can you give a link to the mmgp library? It sounds nice.
@deepbeepmeep I saw you just updated the code for low RAM; can you tell me whether it will work with 32 GB of RAM and a 3090?
@Dhilu16 The RAM is used to store the models while they are not in the GPU, so I'm afraid 32 GB won't be sufficient. You can try anyway. One solution would be to quantize the text encoder as well to reduce RAM consumption. If I have some time this week, I will add an option for that.
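To make the budget concrete, a rough sketch using the sizes quoted earlier in this thread (approximate, and ignoring what the OS and Python runtime themselves need):

```python
# Host-RAM budget for offloading: every model lives in RAM when not on the GPU.
video_model_gb = 24   # ~12 GB if quantized to 8 bits on load
text_encoder_gb = 10  # fp16 encoder; smaller once it can be quantized too
misc_gb = 2           # VAE and other small components (rough guess)

needed = video_model_gb + text_encoder_gb + misc_gb
print(f"~{needed} GB of RAM just for the offloaded models")  # ~36 GB: too tight for a 32 GB box
```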
You can install the module with `pip install mmgp`. You will find instructions on how to use it here: https://github.com/deepbeepmeep/mmgp If you are interested, I have also applied the module to Flux Fill (a very good iterative inpainting/outpainting tool).
I got mixed up with Flux; in fact, HunyuanVideo uses a Llama-based text encoder. It could indeed be quantized to reduce its memory footprint. I will add this capability when I have time.
I have just published mmgp 1.2.0, which now accepts an extra parameter (modelsToQuantize) containing a list of additional models to quantize. So here you can try `offload.all(pipe, modelsToQuantize=["text_encoder"])` to quantize both the video model (quantized by default) and the Llama text encoder. Please let me know if it helps.
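In context, that call looks something like the sketch below (`pipe` being the pipeline object assembled in gradio_server.py):

```python
from mmgp import offload

# Quantize the video transformer (done by default) plus the Llama text
# encoder, reducing both VRAM and host-RAM requirements.
offload.all(pipe, modelsToQuantize=["text_encoder"])
```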
@deepbeepmeep Awesome work! I hope you add those features to your Gradio app, as that is what I am planning to test.
It is in the latest version of my fork right now. You may comment/uncomment lines 34-36 to try the different options.
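For readers who haven't opened the fork yet, the options amount to alternative `offload.all(...)` calls; an illustrative (not verbatim) version of what commenting/uncommenting those lines looks like:

```python
# Pick one (illustrative only; the actual lines 34-36 in the fork may differ):
# offload.all(pipe)                                   # quantize the video model only
offload.all(pipe, modelsToQuantize=["text_encoder"])  # also quantize the text encoder
```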
Curious whether there's an easy way to keep it from loading a copy of the model into RAM for each GPU when adapting the gradio_server.py code to sample.py? I'm loading the text encoder from a pre-quantized model, and I'm not even sure 96 GB would be enough RAM.
Installed it on Win11 with an NVIDIA RTX 3080 (10 GB VRAM); it used 42 GB of DDR4 RAM (out of 96 GB). It took about 3 hours to generate 49 frames at 25 steps.
Windows 2022, RTX 3090: 25 minutes for 848 x 480 x 97 frames, 50 steps.
God bless you.
Unfortunately, 10 GB of VRAM is insufficient right now. I am working on an improved version that will use sequence offloading to reduce the VRAM requirements even further, but I am afraid that 10 GB of VRAM won't be enough anyway.
Thanks for the best open source video generator!!!
I have created a fork of this repository and adapted it so that HunyuanVideo can run even on consumer GPUs.
https://github.com/deepbeepmeep/HunyuanVideoGP
It is pretty fast for a consumer GPU: you can generate 97 frames (more than 3 seconds of video) at 848x480 in less than 12 minutes.