
Unable to Retrieve Intermediate Gradients with CogVideoXPipeline

lovelyczli opened this issue 1 year ago • 3 comments

Describe the bug

When generating videos using the CogVideoXPipeline model, we need to access the gradients of intermediate tensors. However, we do not require additional training or parameter updates for the model.

We tried using register_forward_hook to capture the gradients, but this approach failed because the CogVideoXPipeline disables gradient calculations. Specifically, in pipelines/cogvideo/pipeline_cogvideox.py at line 478, gradient tracking is turned off with @torch.no_grad().
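The failure mode can be reproduced on a toy module (illustrative only, not CogVideoX itself): under `torch.no_grad()`, the forward pass builds no autograd graph, so backward hooks never fire and gradients cannot be retrieved.

```python
import torch

# Toy stand-in for an intermediate layer we want gradients from.
model = torch.nn.Linear(4, 2)

captured = []
model.register_full_backward_hook(
    lambda mod, grad_in, grad_out: captured.append(grad_out[0])
)

with torch.no_grad():
    frozen = model(torch.randn(1, 4))
# frozen.requires_grad is False: no graph was built, so calling
# frozen.backward() here would raise an error and the hook never fires.

# With gradient tracking enabled, the same hook captures the gradient.
out = model(torch.randn(1, 4))
out.sum().backward()
# captured now holds one gradient tensor of shape (1, 2).
```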

How can we resolve this issue and retrieve the gradients without modifying the model’s parameters or performing extra training?

Reproduction

Sample Code

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16,
).to("cuda")

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
```

Pipeline Code Reference (pipelines/cogvideo/pipeline_cogvideox.py, line 478)

```python
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Optional[Union[str, List[str]]] = None,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    height: int = 480,
    width: int = 720,
```

Logs

No response

System Info

Diffusers version: 0.30.3

Who can help?

No response

lovelyczli avatar Oct 17 '24 04:10 lovelyczli

The pipelines should not be used for training. They are meant only for inference, so gradient tracking cannot be done unless you modify the code to suit your needs. Instead, you will have to use each modeling component directly and write the training loop yourself. You can see an example of training here

a-r-r-o-w avatar Oct 17 '24 07:10 a-r-r-o-w

@a-r-r-o-w Thank you for your prompt reply and the training code. I noticed that the provided training code requires independent modules, including T5EncoderModel, CogVideoXTransformer3DModel, and AutoencoderKLCogVideoX.

This approach seems somewhat cumbersome, as our requirement does not involve training or updating model parameters; we only need to access the gradients.

Would simply removing the torch.no_grad() decorator from lines 478-485 in the local pipeline_cogvideox.py resolve the issue efficiently?

Thank you very much!

lovelyczli avatar Oct 17 '24 08:10 lovelyczli

Yes, removing the torch.no_grad() decorator would make it possible to access gradients. Note that the models are in .eval() mode by default, so layers like dropout will not take effect.
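One possible alternative to editing the file, sketched on a toy function: `torch.no_grad`'s decorator applies `functools.wraps`, so the undecorated function stays reachable via `__wrapped__`. This relies on an implementation detail of recent PyTorch versions; removing the decorator from a local copy of the pipeline is the sturdier option.

```python
import torch

model = torch.nn.Linear(4, 2)

@torch.no_grad()  # stands in for the decorator on CogVideoXPipeline.__call__
def run(x):
    return model(x)

x = torch.randn(1, 4)
frozen = run(x)               # decorator active: gradient tracking is off
tracked = run.__wrapped__(x)  # bypasses the decorator: tracking is on
tracked.sum().backward()      # gradients now reach model.weight.grad
```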

a-r-r-o-w avatar Oct 17 '24 08:10 a-r-r-o-w

Hi @lovelyczli, I believe this has been answered by the comment above, so I am marking this as closed. Please feel free to re-open if there's anything else we can help with.

a-r-r-o-w avatar Oct 27 '24 10:10 a-r-r-o-w