Inference with a list of prompts without re-loading the model each time #122

Open
JosephPai opened this issue Dec 12, 2024 · 5 comments

@JosephPai

Hi authors, I would like to run the model on a list of prompts in multi-GPU mode. To avoid re-loading the pre-trained model for each prompt, I modified sample_video.py with a for loop that iterates over the prompts.
However, the code works well for the first prompt but always fails on the second one.
Could you help look into this issue? Thanks.

import os
import time
from pathlib import Path
from loguru import logger
from datetime import datetime
import torch

from hyvideo.utils.file_utils import save_videos_grid
from hyvideo.config import parse_args
from hyvideo.inference import HunyuanVideoSampler


def main():
    args = parse_args()
    print(args)
    models_root_path = Path(args.model_base)
    if not models_root_path.exists():
        raise ValueError(f"`models_root` does not exist: {models_root_path}")
    
    # Create save folder to save the samples
    save_path = args.save_path if args.save_path_suffix == "" else f'{args.save_path}_{args.save_path_suffix}'
    os.makedirs(save_path, exist_ok=True)

    # Load models
    hunyuan_video_sampler = HunyuanVideoSampler.from_pretrained(models_root_path, args=args)
    
    # Get the updated args
    args = hunyuan_video_sampler.args

    for i in range(5):
        # Start sampling
        # TODO: batch inference check
        outputs = hunyuan_video_sampler.predict(
            prompt=args.prompt + f"_test_{i}",
            height=args.video_size[0],
            width=args.video_size[1],
            video_length=args.video_length,
            seed=args.seed,
            negative_prompt=args.neg_prompt,
            infer_steps=args.infer_steps,
            guidance_scale=args.cfg_scale,
            num_videos_per_prompt=args.num_videos,
            flow_shift=args.flow_shift,
            batch_size=args.batch_size,
            embedded_guidance_scale=args.embedded_cfg_scale
        )
        samples = outputs['samples']

        # Save samples
        if 'LOCAL_RANK' not in os.environ or int(os.environ['LOCAL_RANK']) == 0:
            for idx, sample in enumerate(samples):  # use `idx`, not `i`, to avoid shadowing the prompt-loop variable
                sample = sample.unsqueeze(0)
                time_flag = datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d-%H:%M:%S")
                # Build the file path in a separate variable so `save_path` (the
                # output directory) is not overwritten on the first iteration
                out_path = f"{save_path}/{time_flag}_seed{outputs['seeds'][idx]}_{outputs['prompts'][idx][:100].replace('/','')}.mp4"
                save_videos_grid(sample, out_path, fps=24)
                logger.info(f'Sample saved to: {out_path}')

        torch.cuda.empty_cache()
        torch.distributed.barrier()

if __name__ == "__main__":
    main()

Error message:

HunyuanVideo/hyvideo/inference.py", line 63, in new_forward
[rank1]:     raise ValueError(f"Cannot split video sequence into ulysses_degree x ring_degree ({get_sequence_parallel_world_size()}) parts evenly")
[rank1]: ValueError: Cannot split video sequence into ulysses_degree x ring_degree (8) parts evenly
@tavyra

tavyra commented Dec 12, 2024

Does it work if you just pass prompt as a list, e.g. ["Prompt List", "List of Prompts"]? inference.py seems to have this built in, and your code would be doing the same thing except dumping the CUDA cache and trying to reload the pipeline on every loop. The docstring says:
Args: prompt (str or List[str]): The input text.
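
A minimal sketch of this suggestion, assuming predict() really accepts a List[str] as that docstring indicates (the prompts below are placeholders):

# Hypothetical usage: pass every prompt in a single predict() call,
# relying on the documented `prompt (str or List[str])` signature.
prompts = ["a cat playing piano", "a dog surfing a wave"]  # placeholder prompts
outputs = hunyuan_video_sampler.predict(
    prompt=prompts,                  # list instead of a single string
    height=args.video_size[0],
    width=args.video_size[1],
    video_length=args.video_length,
    seed=args.seed,
    infer_steps=args.infer_steps,
    guidance_scale=args.cfg_scale,
)
samples = outputs['samples']         # samples for all prompts in one pass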

@JosephPai
Author

@tavyra
According to this issue, it seems that this feature is not supported yet. (Sad...)

@feifeibear
Contributor

Passing a list of prompts as input is not currently supported, whether on a single GPU or on multiple GPUs with xDiT.

@guankaisi
Contributor

Hello, I have encountered the same problem as @JosephPai. Do you have a solution for it?

guankaisi added a commit to guankaisi/HunyuanVideo that referenced this issue Dec 16, 2024
In issue Tencent#122, every iteration of the for loop calls parallelize_transformer, which resets the pipeline and causes the problem. Moving parallelize_transformer into __init__ solves the issue without affecting other functions.
@guankaisi
Contributor

I found that this problem is caused by re-running the parallelize_transformer function on every call. I have solved it in #130.
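
A minimal sketch of the fix being described, with hypothetical structure (the real class in hyvideo/inference.py differs in detail; see #130 for the actual change):

# Hypothetical sketch: parallelize the transformer once at construction
# time instead of re-wrapping the pipeline inside every predict() call.
class HunyuanVideoSampler:
    def __init__(self, pipeline, args):
        self.pipeline = pipeline
        self.args = args
        if args.ulysses_degree > 1 or args.ring_degree > 1:
            # Wrap exactly once; re-wrapping on each loop iteration is
            # what made the second predict() call fail above.
            parallelize_transformer(self.pipeline)

    def predict(self, prompt, **kwargs):
        # No parallelize_transformer(...) call here anymore.
        return self.pipeline(prompt=prompt, **kwargs)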

JacobKong added a commit that referenced this issue Dec 18, 2024
Solve the issue #122 by updating inference.py