[CI] add a big GPU marker to run memory-intensive tests separately on CI
What does this PR do?
I have only applied the newly introduced marker to a handful of tests so far. I think we may need to change the expected slices based on the CI machine and infra. @a-r-r-o-w should consider marking the Cog tests similarly as well?
@DN6 would love to get your thoughts on the design.
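For anyone weighing in on the design, here is a minimal sketch of how a "big GPU" gate could work. It is not the code from this PR: `require_big_gpu`, `detect_gpu_memory_gb`, `BIG_GPU_MEMORY_GB`, and the 40 GB cutoff are all hypothetical names and values chosen for illustration, and the actual marker may be wired through pytest instead.

```python
import unittest

# Hypothetical cutoff in GB; the thread does not state the real threshold.
BIG_GPU_MEMORY_GB = 40.0


def detect_gpu_memory_gb():
    """Return total memory of GPU 0 in GB, or 0.0 without torch or CUDA."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def require_big_gpu(min_memory_gb=BIG_GPU_MEMORY_GB, available_gb=None):
    """Skip the decorated test unless the GPU has at least `min_memory_gb` GB.

    `available_gb` lets callers inject a value instead of probing hardware.
    """
    if available_gb is None:
        available_gb = detect_gpu_memory_gb()
    return unittest.skipUnless(
        available_gb >= min_memory_gb,
        f"needs a GPU with at least {min_memory_gb} GB, found {available_gb:.1f} GB",
    )


class HypotheticalFluxTests(unittest.TestCase):
    @require_big_gpu()
    def test_memory_intensive_inference(self):
        # Placeholder body; a real test would load and run the pipeline here.
        self.assertTrue(True)
```

With a gate like this, the memory-intensive tests are skipped on smaller runners and can be collected into a separate CI job that runs only on the large-memory machine.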
> should consider marking the Cog tests similarly as well?
With model CPU offload and VAE tiling, memory usage should be < 16 GB, and I think we documented it here. Are we seeing Cog test failures due to memory? I see that they are passing here.
Ah okay then. No issues.
@DN6 is it okay if I modify the failing tests to account for the machine change?
@DN6 can you give this a look? I think the test failures should go away once the CI Bot has access to Flux.
Once approved I will revert the changes which I have denoted as temporary (like this).
@DN6 regarding https://github.com/huggingface/diffusers/actions/runs/11398910357/job/31716739483?pr=9691#step:7:67, my hunch is that there's some kind of leakage happening which is causing the worker to crash. When I SSH'd into the runner and manually ran the test, it passed.
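If the crash does turn out to be memory leaking from one test into the next, one common mitigation is to release memory explicitly in `setUp`/`tearDown`. The sketch below assumes that is the cause, which is not confirmed; `free_accelerator_memory` and `MemoryHygieneTestCase` are hypothetical names.

```python
import gc
import unittest


def free_accelerator_memory():
    """Collect unreachable Python objects, then release cached CUDA blocks."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # Nothing GPU-related to release without torch.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class MemoryHygieneTestCase(unittest.TestCase):
    """Hypothetical base class that cleans up before and after every test."""

    def setUp(self):
        super().setUp()
        free_accelerator_memory()

    def tearDown(self):
        super().tearDown()
        free_accelerator_memory()
```

This would not explain why the test passes when run manually over SSH, but it does rule out leftovers from earlier tests in the same worker.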
In a follow-up I will introduce the quantization tests.