
Only run OSU test for now in test step.

Open casparvl opened this issue 1 year ago • 10 comments

As the test suite grows, the test step will take longer and longer. This is particularly annoying if you want to make small tweaks during a build.

In the (near) future, we should define a mapping that determines which tests to run for which software installations. E.g. if your tarball contains a new TensorFlow module, you probably want to run the TensorFlow test (-n TensorFlow). If your tarball contains OpenMPI, you probably want to run OSU and maybe one MPI-based application (e.g. -n OSU -n GROMACS.*foss).
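A minimal sketch of what such a mapping could look like, assuming the tarball contents are available as a list of module names. The names `TEST_MAP`, `DEFAULT_TESTS`, `select_tests` and `reframe_name_args` are hypothetical, as are the module versions in the usage note; only the `-n` filters come from the description above, and falling back to OSU is an assumption based on the title of this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping: module-name pattern -> test name filters (passed via -n).
TEST_MAP = {
    "TensorFlow/*": ["TensorFlow"],
    "OpenMPI/*": ["OSU", "GROMACS.*foss"],
}

# Assumption: fall back to the OSU tests when nothing in the tarball matches.
DEFAULT_TESTS = ["OSU"]


def select_tests(modules):
    """Return the de-duplicated name filters for the given module names."""
    selected = []
    for module in modules:
        for pattern, tests in TEST_MAP.items():
            if fnmatchcase(module, pattern):
                for test in tests:
                    if test not in selected:
                        selected.append(test)
    return selected or list(DEFAULT_TESTS)


def reframe_name_args(tests):
    """Turn name filters into a flat list of '-n <filter>' arguments."""
    args = []
    for test in tests:
        args += ["-n", test]
    return args
```

For example, `reframe_name_args(select_tests(["OpenMPI/4.1.5-GCC-12.3.0"]))` yields `["-n", "OSU", "-n", "GROMACS.*foss"]`, matching the OpenMPI case described above.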

casparvl avatar May 14 '24 09:05 casparvl

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software

eessi-bot[bot] avatar May 14 '24 09:05 eessi-bot[bot]

Instance eessi-bot-mc-azure is configured to build:

  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-software

eessi-bot[bot] avatar May 14 '24 09:05 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512

casparvl avatar May 14 '24 09:05 casparvl

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/skylake_avx512 resulted in:

    • submitted job 10734, for details & status see https://github.com/EESSI/software-layer/pull/571#issuecomment-2109766224
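
The expansion from the short command to the "expanded format" shown above can be sketched as follows. This is an illustration only, not the bot's actual implementation: `parse_bot_command`, `expand` and `ABBREVIATIONS` are hypothetical names, and the only abbreviations assumed are the two visible in this thread (`repo` and `arch`).

```python
# Abbreviations visible in the "expanded format" line; any others are unknown here.
ABBREVIATIONS = {"repo": "repository", "arch": "architecture"}


def parse_bot_command(comment):
    """Split 'bot: <action> key:value ...' into (action, {expanded_key: value})."""
    prefix = "bot:"
    if not comment.startswith(prefix):
        raise ValueError("not a bot command")
    words = comment[len(prefix):].split()
    if not words:
        raise ValueError("bot command has no action")
    action, filters = words[0], words[1:]
    parsed = {}
    for item in filters:
        # Split on the first colon only, so values may contain further colons.
        key, _, value = item.partition(":")
        parsed[ABBREVIATIONS.get(key, key)] = value
    return action, parsed


def expand(action, filters):
    """Render the command with expanded keys, in the order they were given."""
    return " ".join([action] + [f"{key}:{value}" for key, value in filters.items()])
```

Applied to the command in this thread, `expand(*parse_bot_command("bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/skylake_avx512"))` reproduces the expanded form reported by the bot.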

eessi-bot[bot] avatar May 14 '24 09:05 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • account casparvl has NO permission to send commands to the bot

eessi-bot[bot] avatar May 14 '24 09:05 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture x86_64-intel-skylake_avx512 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_571/10734

date job status comment
May 14 09:49:55 UTC 2024 submitted job id 10734 awaits release by job manager
May 14 09:50:38 UTC 2024 released job awaits launch by Slurm scheduler
May 14 09:55:40 UTC 2024 running job 10734 is running
May 14 09:56:41 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-10734.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
May 14 09:56:41 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 8/8 test case(s) from 8 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-10734.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
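
As an illustration of how a summary line like the one above could be checked, here is a rough sketch that extracts the counts from it. The pattern is written against the single summary line shown in this thread and is an assumption, not the pattern the bot uses; `SUMMARY_RE` and `parse_summary` are hypothetical names.

```python
import re

# Written against the summary line shown in this thread; other layouts are untested.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>PASSED|FAILED)\s*\]\s+"
    r"Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), (?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_summary(line):
    """Return the status and counts from a summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    result = {key: int(value) for key, value in match.groupdict().items() if key != "status"}
    result["status"] = match.group("status")
    return result


summary = parse_summary(
    "[ PASSED ] Ran 8/8 test case(s) from 8 check(s) (0 failure(s), 0 skipped, 0 aborted)"
)
```

With the line from this job, `summary["status"]` is `"PASSED"` and `summary["ran"]` equals `summary["total"]`.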

eessi-bot[bot] avatar May 14 '24 09:05 eessi-bot[bot]

Test step from the SLURM log:

[==========] Running 8 check(s)
[==========] Started on Tue May 14 09:55:32 2024

[----------] start processing checks
[ RUN      ] EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:default+default
[ RUN      ] EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:default+default
[       OK ] (1/8) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:default+default
P: latency: 5.6 us (r:0, l:None, u:None)
[       OK ] (2/8) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:default+default
P: latency: 3.54 us (r:0, l:None, u:None)
[       OK ] (3/8) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:default+default
P: latency: 8.67 us (r:0, l:None, u:None)
[       OK ] (4/8) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:default+default
P: latency: 8.22 us (r:0, l:None, u:None)
[       OK ] (5/8) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:default+default
P: latency: 0.45 us (r:0, l:None, u:None)
[       OK ] (6/8) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:default+default
P: latency: 0.43 us (r:0, l:None, u:None)
[       OK ] (7/8) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:default+default
P: bandwidth: 10778.78 MB/s (r:0, l:None, u:None)
[       OK ] (8/8) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:default+default
P: bandwidth: 10710.03 MB/s (r:0, l:None, u:None)
[----------] all spawned checks have finished

[  PASSED  ] Ran 8/8 test case(s) from 8 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Tue May 14 09:56:01 2024

That's exactly what I intended it to look like. It's also much faster now (30s), which is good for a default test step.
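
For reference, the runtime can be read off the "Started on" and "Finished on" lines of the log above. The snippet below is a small sketch of that calculation; `elapsed_seconds` and `TIME_FORMAT` are hypothetical names, and it assumes the timestamps keep the format seen in this log and that the default (English) locale is in use.

```python
from datetime import datetime

# Timestamp layout as it appears in the log, e.g. "Tue May 14 09:55:32 2024".
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(log_text):
    """Return the seconds between the 'Started on' and 'Finished on' log lines."""
    started = finished = None
    for line in log_text.splitlines():
        if "] Started on " in line:
            started = datetime.strptime(line.split("Started on ", 1)[1].strip(), TIME_FORMAT)
        elif "] Finished on " in line:
            finished = datetime.strptime(line.split("Finished on ", 1)[1].strip(), TIME_FORMAT)
    if started is None or finished is None:
        raise ValueError("log is missing a 'Started on' or 'Finished on' line")
    return (finished - started).total_seconds()


LOG = """\
[==========] Started on Tue May 14 09:55:32 2024
[==========] Finished on Tue May 14 09:56:01 2024
"""
print(elapsed_seconds(LOG))  # 29.0
```

The two timestamps in this job are 29 seconds apart, in line with the roughly 30s mentioned above.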

Maybe in the long run we should be able to tell the bot to test, or not to test. That way, if we are still debugging builds, we don't have to run the test step (yet).

casparvl avatar May 14 '24 10:05 casparvl

Something is going wrong: I see CUDA and PSM2 listed as missing installations.

ocaisa avatar May 14 '24 10:05 ocaisa

PSM2 is coming from https://github.com/easybuilders/easybuild-easyconfigs/pull/20501 which has a build dependency on CUDA (https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/p/PSM2/PSM2-12.0.1-GCCcore-12.2.0.eb#L26)

ocaisa avatar May 14 '24 11:05 ocaisa

@casparvl This now requires a sync with the default branch for CI to pass

ocaisa avatar May 16 '24 07:05 ocaisa

I'm not convinced we should go ahead and merge this, seems like something we'll easily forget to revert later, and the time needed to run the test suite currently isn't limiting at all imho...

boegel avatar Jun 24 '24 20:06 boegel

No longer needed, we now have test selection possibilities in https://github.com/EESSI/software-layer/pull/673

casparvl avatar Aug 19 '24 15:08 casparvl