CI: Refactor test pipelines to cover bare VM and container environments

Open leofang opened this issue 1 month ago • 0 comments

Using #1311 as the playground, as of commit 4011bb8e54b57b6138ce8da809bca606be4b9b21 and the CI logs at https://github.com/NVIDIA/cuda-python/actions/runs/19951181011, I verified that nv-gha-runners no longer makes containers a hard requirement for running jobs on GPU runners. We can now run GPU jobs just fine on the bare, ephemeral VM. This would help us reduce job start time.

The current test blocker is #1307. We recently added xfail markers to tests that we did not think were runnable in the CI. But those tests did run in the bare VM setup, so they went from xfail to xpass, which counts as a failure because we enable strict mode. This can be easily fixed.
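To illustrate why an unexpected pass breaks the run, here is a minimal sketch that models how pytest reports a test depending on its xfail marking and strict mode. The helper `xfail_outcome` is hypothetical and written only for this illustration; it is not part of pytest or of this repository, and it does not describe how #1307 will actually be fixed.

```python
def xfail_outcome(test_passed: bool, marked_xfail: bool, strict: bool) -> str:
    """Model how pytest reports one test (hypothetical helper, for illustration)."""
    if not marked_xfail:
        # No marker: the result is reported as-is.
        return "passed" if test_passed else "failed"
    if not test_passed:
        # Expected failure: reported as XFAIL, does not fail the run.
        return "xfailed"
    # The test passed although it was expected to fail (XPASS).
    # With strict mode on, pytest turns this into a failure.
    return "failed" if strict else "xpassed"


# Test marked xfail and still failing, as assumed for the container CI:
print(xfail_outcome(test_passed=False, marked_xfail=True, strict=True))
# The same test now passing on the bare VM, with strict mode on:
print(xfail_outcome(test_passed=True, marked_xfail=True, strict=True))
# The same passing test if strict mode were off:
print(xfail_outcome(test_passed=True, marked_xfail=True, strict=False))
```

The second case is the one described above: the marker is now wrong for the bare VM environment, so strict mode flags it.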

In the internal discussion we concluded that we don't need to test against a set of containers. But it would still be nice to test both container and containerless (i.e. bare VM) environments. We currently have two test workflows:

  • test-wheel-linux.yml: Needed because Linux runners required a container (this is no longer the case)
  • test-wheel-windows.yml: Needed because Windows runners do not require any container

I suggest we rename and repurpose the two workflows as follows:

  • test-wheel-container.yml: This runs all existing Linux tests
  • test-wheel-containerless.yml: This runs all existing Linux + Windows tests
    • Piggybacking on this refactoring, we can probably get rid of the PowerShell usage in the workflow, since our CI already relies heavily on Bash and Git Bash.

leofang, Dec 14 '25 21:12