
[Feature Request] Implement batch inference for multiple prompts in single forward pass

Open FredyRivera-dev opened this issue 1 month ago • 1 comment

Description

I would like to request the implementation of batch inference in LightX2V, allowing multiple prompts to be processed in a single forward pass to improve horizontal scaling and performance.

Motivation

Currently, LightX2V processes one prompt at a time through the generate() method of LightX2VPipeline. To process multiple prompts, the current workaround is to run multiple servers in parallel, as shown in post_multi_servers_tv2.py.

Implementing batch inference would provide:

  • Better performance: Reduced overhead by processing multiple prompts in a single forward pass
  • Horizontal scalability: Easier processing of large volumes of requests
  • Resource optimization: Better utilization of GPU memory and compute

Current Behavior

  • LightX2VPipeline.generate() accepts parameters for a single generation: seed, prompt, negative_prompt, etc. (a usage sketch follows this list)
  • Models like WanModel and HunyuanVideo15Model process with batch_size=1
  • The server handles tasks one by one to manage GPU memory effectively
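
For context, here is a minimal sketch of what multi-prompt processing looks like today with the single-prompt API. Only seed, prompt, and negative_prompt are confirmed generate() parameters; save_result_path is an assumed name used here for illustration.

# Current workaround: one sequential generate() call per prompt
# (or multiple servers, as in post_multi_servers_tv2.py).
prompts = ["prompt1", "prompt2", "prompt3"]
for i, p in enumerate(prompts):
    pipe.generate(
        seed=42 + i,
        prompt=p,
        negative_prompt="",
        save_result_path=f"out{i + 1}.mp4",  # assumed parameter name
    )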

Proposed Solution

  1. Extend pipeline interface: Modify generate() to accept lists of prompts
  2. Batch support in models: Add a batch dimension in the _infer_cond_uncond() methods (a minimal tensor-level sketch follows this list)
  3. Memory management: Adjust memory handling for larger batches
  4. Maintain compatibility: Preserve current API for individual use
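
The core tensor-level change is to pad the per-prompt text embeddings to a common sequence length and stack them along a new batch dimension, so the denoiser runs once per step for the whole batch (an attention/padding mask would also be needed). The sketch below is framework-level only and does not use actual LightX2V internals; all names and shapes are illustrative:

import torch

def batch_prompt_embeds(per_prompt_embeds: list[torch.Tensor]) -> torch.Tensor:
    """Pad per-prompt embeddings of shape [seq_i, dim] to a common length
    and stack them into a single [batch, seq_max, dim] tensor."""
    dim = per_prompt_embeds[0].shape[-1]
    seq_max = max(e.shape[0] for e in per_prompt_embeds)
    padded = []
    for e in per_prompt_embeds:
        pad = torch.zeros(seq_max - e.shape[0], dim, dtype=e.dtype, device=e.device)
        padded.append(torch.cat([e, pad], dim=0))
    return torch.stack(padded, dim=0)

# Dummy example: three prompts with different token counts.
embeds = [torch.randn(s, 4096) for s in (52, 77, 64)]
batched = batch_prompt_embeds(embeds)
print(batched.shape)  # torch.Size([3, 77, 4096])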

Possible Implementations

# Proposed API
pipe.generate_batch(
    prompts=["prompt1", "prompt2", "prompt3"],
    negative_prompts=["neg1", "neg2", "neg3"],
    seeds=[42, 43, 44],
    save_result_paths=["out1.mp4", "out2.mp4", "out3.mp4"]
)
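
One way to introduce this without breaking the current API would be to land generate_batch() first as a thin wrapper that validates the per-prompt lists and loops over generate(), then replace the loop with a true batched forward pass once the models accept a batch dimension. A hedged sketch under those assumptions (generate_batch() and save_result_path are proposed/assumed names, not existing LightX2V APIs):

# Sketch of a backwards-compatible wrapper over the existing single-prompt API.
def generate_batch(pipe, prompts, negative_prompts, seeds, save_result_paths):
    if not (len(prompts) == len(negative_prompts) == len(seeds) == len(save_result_paths)):
        raise ValueError("All per-prompt argument lists must have the same length")
    # Step 1 (compatibility): loop over the existing API so results are unchanged.
    # Step 2 (performance): swap this loop for a single batched forward pass
    # once the models take a batch dimension (see the padding/stacking sketch above).
    for prompt, neg, seed, path in zip(prompts, negative_prompts, seeds, save_result_paths):
        pipe.generate(seed=seed, prompt=prompt, negative_prompt=neg,
                      save_result_path=path)  # assumed parameter name

This keeps single-prompt callers untouched while giving servers a stable batch endpoint to target.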

Expected Impact

  • Significant reduction in processing time for multiple videos
  • Better GPU resource utilization
  • Easier high-volume production deployments

FredyRivera-dev · Dec 28 '25

Note: I'm including links to the batch inference implementation in diffusers for your reference. I hope it helps! :D

Wan2.2: https://github.com/huggingface/diffusers/blob/262ce19b/src/diffusers/pipelines/wan/pipeline_wan.py (The important sections are lines 191 to 194, lines 384 to 392, lines 510 to 554, and lines 592 to 610)

HunyuanVideo 1.5: https://github.com/huggingface/diffusers/blob/262ce19b/src/diffusers/pipelines/hunyuan_video1_5/pipeline_hunyuan_video1_5.py (The important sections are lines 397 to 407, lines 546 to 553, lines 664 to 669, lines 707 to 717, and lines 741 to 803)

FredyRivera-dev · Dec 28 '25