[CI] add a big GPU marker to run memory-intensive tests separately on CI

Open sayakpaul opened this issue 1 year ago • 6 comments

What does this PR do?

I have applied the new marker to only a handful of tests so far. We may need to update the slices depending on the CI machine and infra. @a-r-r-o-w, perhaps we should consider marking the Cog tests similarly as well?

@DN6 would love to get your thoughts on the design.
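
For readers following along, here is a minimal sketch of what a "big GPU" gate could look like. It is not the code from this PR: the decorator name `require_big_gpu`, the `BIG_GPU_MEMORY` environment variable, and the 40 GB default are all assumptions made for illustration.

```python
import os
import unittest

# Hypothetical threshold in GB; the variable name and default are assumptions.
BIG_GPU_MEMORY_GB = int(os.getenv("BIG_GPU_MEMORY", "40"))


def available_gpu_memory_gb():
    """Return the total memory of GPU 0 in GB, or 0.0 without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)


def require_big_gpu(test_case):
    """Skip the decorated test unless GPU 0 has at least BIG_GPU_MEMORY_GB."""
    total = available_gpu_memory_gb()
    return unittest.skipUnless(
        total >= BIG_GPU_MEMORY_GB,
        f"requires a GPU with at least {BIG_GPU_MEMORY_GB} GB, found {total:.1f} GB",
    )(test_case)
```

A memory-intensive test would then be decorated with `@require_big_gpu`. To run such tests as a separate CI job, the same tests could also carry a pytest marker (name to be decided) and be selected with `pytest -m <marker>`.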

sayakpaul · Oct 16 '24 07:10

should consider marking the Cog tests similarly as well?

With model CPU offload and VAE tiling, memory usage should be < 16 GB, and I think we documented it here. Are we seeing Cog test failures due to memory? I see that they are passing here.
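
As a rough sketch of the two memory savers mentioned above: the helper name `enable_memory_savers` is hypothetical, and `pipe` is assumed to be an already loaded diffusers pipeline whose VAE supports tiling.

```python
def enable_memory_savers(pipe):
    """Turn on model CPU offload and VAE tiling for a diffusers-style pipeline.

    `pipe` is assumed to expose `enable_model_cpu_offload()` and a `vae`
    attribute with `enable_tiling()`, as diffusers pipelines do.
    """
    # Keep sub-models on the CPU and move each to the GPU only while it runs.
    pipe.enable_model_cpu_offload()
    # Process the VAE input in tiles to lower peak memory.
    pipe.vae.enable_tiling()
    return pipe
```

Actual peak memory still depends on the model, resolution, and dtype, so the < 16 GB figure should be checked on the CI machine itself.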

a-r-r-o-w · Oct 16 '24 11:10

Ah okay then. No issues.

sayakpaul · Oct 16 '24 11:10

@DN6 is it okay if I modify the failing tests to account for the machine change?

sayakpaul · Oct 16 '24 13:10

@DN6 can you give this a look? I think the test failures should go away once the CI Bot has access to Flux.

Once approved, I will revert the changes that I have marked as temporary (like this).

sayakpaul · Oct 17 '24 10:10

@DN6 regarding https://github.com/huggingface/diffusers/actions/runs/11398910357/job/31716739483?pr=9691#step:7:67, my hunch is that there's some kind of leakage happening which is causing the worker to crash. When I SSH'd into the runner and manually ran the test, it passed.
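
If the leakage turns out to be GPU memory held over from earlier tests (an assumption; it has not been confirmed here), a cleanup helper along these lines could be called from each test's `setUp` and `tearDown`. The helper name is hypothetical.

```python
import gc


def free_accelerator_memory():
    """Collect garbage and release cached CUDA memory, if torch and CUDA exist.

    Returns True when the CUDA cache was emptied, False otherwise.
    """
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Return cached, unused blocks from the caching allocator to the driver.
        torch.cuda.empty_cache()
        return True
    return False
```

This only frees memory that is no longer referenced. Objects kept alive by module-level state or fixtures would still need to be deleted explicitly.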

sayakpaul · Oct 18 '24 07:10

In a follow-up I will introduce the quantization tests.

sayakpaul · Oct 31 '24 13:10