
[Feature Request] Implement batch inference for multiple prompts in single forward pass

Open FredyRivera-dev opened this issue 1 month ago • 1 comment

Description

I would like to request the implementation of batch inference in LightX2V, allowing multiple prompts to be processed in a single forward pass to improve horizontal scaling and performance.

Motivation

Currently, LightX2V processes one prompt at a time through the generate() method of LightX2VPipeline. To process multiple prompts, the current workaround is to run multiple servers in parallel, as shown in post_multi_servers_tv2.py.

Implementing batch inference would provide:

  • Better performance: Reduced overhead by processing multiple prompts in a single forward pass
  • Horizontal scalability: Easier processing of large volumes of requests
  • Resource optimization: Better utilization of GPU memory and compute

Current Behavior

  • LightX2VPipeline.generate() accepts parameters for a single generation: seed, prompt, negative_prompt, etc. (a usage sketch follows this list)
  • Models like WanModel and HunyuanVideo15Model process with batch_size=1
  • The server handles tasks one by one to manage GPU memory effectively
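
For context, here is a minimal sketch of what multi-prompt processing looks like today with the single-prompt API. Only seed, prompt, and negative_prompt are confirmed generate() parameters; save_result_path is an assumed name used here for illustration.

# Current workaround: one sequential generate() call per prompt
# (or multiple servers, as in post_multi_servers_tv2.py).
prompts = ["prompt1", "prompt2", "prompt3"]
for i, p in enumerate(prompts):
    pipe.generate(
        seed=42 + i,
        prompt=p,
        negative_prompt="",
        save_result_path=f"out{i + 1}.mp4",  # assumed parameter name
    )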

Proposed Solution

  1. Extend pipeline interface: Modify generate() to accept lists of prompts
  2. Batch support in models: Add a batch dimension in the _infer_cond_uncond() methods (a minimal tensor-level sketch follows this list)
  3. Memory management: Adjust memory handling for larger batches
  4. Maintain compatibility: Preserve current API for individual use
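
The core tensor-level change is to pad the per-prompt text embeddings to a common sequence length and stack them along a new batch dimension, so the denoiser runs once per step for the whole batch (an attention/padding mask would also be needed). The sketch below is framework-level only and does not use actual LightX2V internals; all names and shapes are illustrative:

import torch

def batch_prompt_embeds(per_prompt_embeds: list[torch.Tensor]) -> torch.Tensor:
    """Pad per-prompt embeddings of shape [seq_i, dim] to a common length
    and stack them into a single [batch, seq_max, dim] tensor."""
    dim = per_prompt_embeds[0].shape[-1]
    seq_max = max(e.shape[0] for e in per_prompt_embeds)
    padded = []
    for e in per_prompt_embeds:
        pad = torch.zeros(seq_max - e.shape[0], dim, dtype=e.dtype, device=e.device)
        padded.append(torch.cat([e, pad], dim=0))
    return torch.stack(padded, dim=0)

# Dummy example: three prompts with different token counts.
embeds = [torch.randn(s, 4096) for s in (52, 77, 64)]
batched = batch_prompt_embeds(embeds)
print(batched.shape)  # torch.Size([3, 77, 4096])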

Possible Implementations

# Proposed API
pipe.generate_batch(
    prompts=["prompt1", "prompt2", "prompt3"],
    negative_prompts=["neg1", "neg2", "neg3"],
    seeds=[42, 43, 44],
    save_result_paths=["out1.mp4", "out2.mp4", "out3.mp4"]
)
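
One way to introduce this without breaking the current API would be to land generate_batch() first as a thin wrapper that validates the per-prompt lists and loops over generate(), then replace the loop with a true batched forward pass once the models accept a batch dimension. A hedged sketch under those assumptions (generate_batch() and save_result_path are proposed/assumed names, not existing LightX2V APIs):

# Sketch of a backwards-compatible wrapper over the existing single-prompt API.
def generate_batch(pipe, prompts, negative_prompts, seeds, save_result_paths):
    if not (len(prompts) == len(negative_prompts) == len(seeds) == len(save_result_paths)):
        raise ValueError("All per-prompt argument lists must have the same length")
    # Step 1 (compatibility): loop over the existing API so results are unchanged.
    # Step 2 (performance): swap this loop for a single batched forward pass
    # once the models take a batch dimension (see the padding/stacking sketch above).
    for prompt, neg, seed, path in zip(prompts, negative_prompts, seeds, save_result_paths):
        pipe.generate(seed=seed, prompt=prompt, negative_prompt=neg,
                      save_result_path=path)  # assumed parameter name

This keeps single-prompt callers untouched while giving servers a stable batch endpoint to target.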

Expected Impact

  • Significant reduction in processing time for multiple videos
  • Better GPU resource utilization
  • Easier high-volume production deployments

FredyRivera-dev · Dec 28 '25

Note: I'm including links to the batch inference implementation in diffusers for your reference. I hope it helps! :D

Wan2.2: https://github.com/huggingface/diffusers/blob/262ce19b/src/diffusers/pipelines/wan/pipeline_wan.py (The important sections are lines 191 to 194, lines 384 to 392, lines 510 to 554, and lines 592 to 610)

HunyuanVideo 1.5: https://github.com/huggingface/diffusers/blob/262ce19b/src/diffusers/pipelines/hunyuan_video1_5/pipeline_hunyuan_video1_5.py (The important sections are lines 397 to 407, lines 546 to 553, lines 664 to 669, lines 707 to 717, and lines 741 to 803)

FredyRivera-dev · Dec 28 '25