
[RMP] Migrate CI infrastructure from Jenkins to GitHub Actions

Open jperez999 opened this issue 3 years ago • 0 comments

Problem:

Merlin's current CI infrastructure for PR-level testing runs on a P100 DGX Station. That machine runs a Docker container hosting a Jenkins instance, which is responsible for the PR-level unit test runs for all the repos. The issue with the current setup is that the unit tests for all repos and PRs run in the same container, so any job that changes the environment can affect the others. This has caused problems in the past when test runs happened in parallel. It would not be an issue with GitHub Actions, which spawns a new container (based on an image) for every test run, isolating each run and allowing for more stable testing.

Goal:

  • Allow running the unit tests of multiple repos (per-project testing), e.g. for Merlin Systems, run both the Merlin Systems unit tests and the Merlin unit tests.
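
As a rough illustration of the per-project idea, the sketch below resolves which test suites a PR against a given repo would trigger. The mapping, the repo names `systems` and `merlin`, and the function `suites_to_run` are hypothetical and only mirror the example in the goal; the real wiring would live in each repo's workflow.

```python
# Hypothetical mapping: repo -> other repos whose unit tests should also run.
# Only the "systems" -> "merlin" entry reflects the example given in the goal.
TEST_DEPENDENCIES = {
    "merlin": [],
    "systems": ["merlin"],
}


def suites_to_run(repo, deps=TEST_DEPENDENCIES):
    """Return the repo's own suite followed by the suites it depends on."""
    ordered = []
    stack = [repo]
    while stack:
        current = stack.pop()
        if current in ordered:
            continue  # already scheduled, avoid duplicates and cycles
        ordered.append(current)
        # Reverse so that dependencies are visited in the order they are listed.
        stack.extend(reversed(deps.get(current, [])))
    return ordered


print(suites_to_run("systems"))  # ['systems', 'merlin']
```

A repo with no entry in the mapping simply runs its own suite, which keeps the default behavior the same as single-repo testing.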

Constraints:

  • Blossom is our internal testing solution, but it currently falls short because it does not have multiple GPUs and runs must be activated using push notifications.
  • Security-wise, the Jenkins instance and GitHub Actions will have the same security strategy (an authorized group of users is whitelisted; others require permission to run PRs), and both versions use the same resources (DGX Station).
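
The security strategy described above can be sketched as a simple gate. This is only an illustration of the rule, not how Jenkins or GitHub Actions implement it; the user names and the function `may_run_ci` are hypothetical.

```python
# Hypothetical allowlist; real membership would come from the authorized group.
AUTHORIZED_USERS = {"maintainer-a", "maintainer-b"}


def may_run_ci(pr_author, approved_by=None, authorized=AUTHORIZED_USERS):
    """Authorized authors run CI directly; others need an authorized approver."""
    if pr_author in authorized:
        return True
    # Outside contributors run only once an authorized user grants permission.
    return approved_by is not None and approved_by in authorized
```

For example, `may_run_ci("outside-contributor")` is `False` until an authorized user is passed as `approved_by`.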

Starting Point:

  • [ ] Install the GitHub Actions runner on the DGX Station
  • [ ] Load necessary images on the DGX Station
  • [ ] Hook up the GitHub Actions runner to the NVIDIA Merlin GitHub project
  • [ ] Set up workflows that use the GitHub Actions runner on the repos
  • [ ] Verify that the workflows run the correct set of tests

jperez999, Jun 22 '22 00:06