
{2023.06}[2023a,a64fx] apps originally built with EB 4.9.2

Open trz42 opened this issue 8 months ago • 5 comments

Includes all apps in this batch. Build time on NVIDIA Grace was ~16 hours, so we might need to split this up and limit build parallelism for some packages.

trz42 avatar May 27 '25 08:05 trz42

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

NOTE: the bot code wasn't updated on Deucalion, which is why it created this comment.

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

trz42 avatar May 27 '25 08:05 trz42

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.05/pr_1091/437784

date job status comment
May 27 08:16:06 UTC 2025 submitted job id 437784 awaits release by job manager
May 27 08:17:00 UTC 2025 released job awaits launch by Slurm scheduler
May 27 08:18:04 UTC 2025 running job 437784 is running
May 27 08:31:43 UTC 2025 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-437784.out
:white_check_mark: no message matching FATAL:
:x: found message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
May 27 08:31:43 UTC 2025 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/9) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] (2/9) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] (3/9) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] (4/9) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ OK ] (5/9) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
P: perf: 580.081 timesteps/s (r:0, l:None, u:None)
[ OK ] (6/9) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:aarch64_a64fx+default
P: latency: 1.68 us (r:0, l:None, u:None)
[ OK ] (7/9) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_a64fx+default
P: latency: 1.72 us (r:0, l:None, u:None)
[ OK ] (8/9) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8794.13 MB/s (r:0, l:None, u:None)
[ OK ] (9/9) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8682.74 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/9 test case(s) from 9 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-437784.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
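
The checks listed under "Details" above amount to searching the job output for a handful of patterns. Below is a minimal sketch of that idea in Python. The patterns are copied from the lists above, but the function names and the way the checks are combined are assumptions made for illustration, not the bot's actual implementation.

```python
import re

# Patterns copied from the "Details" lists in this thread. How the bot really
# evaluates them is an assumption of this sketch.
MUST_BE_ABSENT = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_BE_PRESENT = [r"No missing installations", r"\.tar\.gz created!"]


def check_build_output(text: str) -> dict:
    """Return {pattern: passed} for every check (hypothetical helper)."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[pattern] = re.search(pattern, text) is None
    for pattern in MUST_BE_PRESENT:
        results[pattern] = re.search(pattern, text) is not None
    return results


def build_succeeded(text: str) -> bool:
    """A build counts as successful only if every single check passed."""
    return all(check_build_output(text).values())


# Made-up output resembling the first run: an ERROR: line is present and
# "No missing installations" is absent, so the verdict is failure.
sample = "ERROR: something went wrong\n/tmp/example.tar.gz created!\n"
print(build_succeeded(sample))  # False
```

With this reading, the first run fails even though a tarball message was found, because two of the individual checks did not pass.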

Try again...

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

trz42 avatar May 27 '25 12:05 trz42

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.05/pr_1091/437986

  • The job has been restarted three times. We shouldn't let the queueing system do that automatically, as it might overwrite job output and limit our ability to debug issues.
date job status comment
May 27 12:28:41 UTC 2025 submitted job id 437986 awaits release by job manager
May 27 12:28:59 UTC 2025 released job awaits launch by Slurm scheduler
May 27 12:30:01 UTC 2025 running job 437986 is running
May 27 19:29:04 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job437986.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
May 27 19:29:04 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job437986.test does not exist in job directory, or parsing it failed.

Rerun with additional argument --no-requeue to prevent automatic restarts...

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

trz42 avatar May 27 '25 19:05 trz42

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.05/pr_1091/438489

date job status comment
May 27 19:32:14 UTC 2025 submitted job id 438489 awaits release by job manager
May 27 19:33:08 UTC 2025 released job awaits launch by Slurm scheduler
May 27 19:34:11 UTC 2025 running job 438489 is running
May 27 21:21:33 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job438489.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
May 27 21:21:33 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job438489.test does not exist in job directory, or parsing it failed.

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

boegel avatar Jun 15 '25 14:06 boegel

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.06/pr_1091/461123

date job status comment
Jun 15 14:46:09 UTC 2025 submitted job id 461123 awaits release by job manager
Jun 15 14:46:34 UTC 2025 released job awaits launch by Slurm scheduler
Jun 15 14:47:37 UTC 2025 running job 461123 is running
Jun 15 16:40:18 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job461123.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jun 15 16:40:18 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job461123.test does not exist in job directory, or parsing it failed.

Looks like job 461123 failed prematurely for no obvious reason. The last completed installation was MetalWalls/21.06.1-foss-2023a, so maybe the job got killed while trying to install QuantumESPRESSO-7.3.1-foss-2023a.eb?

boegel avatar Jun 17 '25 09:06 boegel

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

boegel avatar Jun 17 '25 09:06 boegel

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.06/pr_1091/463781

date job status comment
Jun 17 09:08:56 UTC 2025 submitted job id 463781 awaits release by job manager
Jun 17 09:09:27 UTC 2025 released job awaits launch by Slurm scheduler
Jun 17 09:13:23 UTC 2025 running job 463781 is running
Jun 17 11:00:39 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job463781.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jun 17 11:00:39 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job463781.test does not exist in job directory, or parsing it failed.

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

trz42 avatar Jun 17 '25 14:06 trz42

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.06/pr_1091/464151

date job status comment
Jun 17 14:02:58 UTC 2025 submitted job id 464151 awaits release by job manager
Jun 17 14:03:35 UTC 2025 released job awaits launch by Slurm scheduler
Jun 17 14:04:38 UTC 2025 running job 464151 is running
Jun 17 18:23:02 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job464151.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jun 17 18:23:02 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job464151.test does not exist in job directory, or parsing it failed.

I think we need to change tactics a bit when building for A64FX, since by default there's less than 1 GB of memory per core on the Deucalion A64FX partition:

  • https://github.com/EESSI/software-layer-scripts/pull/17
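
To put a number on the "less than 1 GB per core" remark, here is a small Python sketch of the arithmetic. The 30720 MiB per node is the figure ReFrame reports in the test output earlier in this thread. The 48 cores per node is an assumption based on the A64FX's 48 compute cores, and the 2048 MiB per build process is a purely hypothetical value. The helper names are made up for illustration and are not taken from the linked pull request.

```python
def mem_per_core_mib(node_mem_mib: int, cores: int) -> float:
    """Memory available per core if all cores of a node are used."""
    return node_mem_mib / cores


def max_parallel_jobs(node_mem_mib: int, cores: int, mem_per_job_mib: int) -> int:
    """Cap build parallelism so that all parallel processes fit in memory."""
    return max(1, min(cores, node_mem_mib // mem_per_job_mib))


NODE_MEM_MIB = 30720     # per-node memory according to the ReFrame configuration
CORES_PER_NODE = 48      # assumption: the 48 compute cores of an A64FX
MEM_PER_JOB_MIB = 2048   # hypothetical memory need of one compile process

print(mem_per_core_mib(NODE_MEM_MIB, CORES_PER_NODE))                    # 640.0
print(max_parallel_jobs(NODE_MEM_MIB, CORES_PER_NODE, MEM_PER_JOB_MIB))  # 15
```

Under these assumptions each core gets 640 MiB, so a build that uses every core can run out of memory well before it runs out of CPU.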

boegel avatar Jun 17 '25 18:06 boegel

Let's try this again now that https://github.com/EESSI/software-layer-scripts/pull/17 is merged...

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

boegel avatar Jun 20 '25 10:06 boegel

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

boegel avatar Jun 20 '25 12:06 boegel

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.06/pr_1091/468821

date job status comment
Jun 20 12:29:53 UTC 2025 submitted job id 468821 awaits release by job manager
Jun 20 12:30:06 UTC 2025 released job awaits launch by Slurm scheduler
Jun 20 12:31:09 UTC 2025 running job 468821 is running
Jun 20 14:31:09 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job468821.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jun 20 14:31:09 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job468821.test does not exist in job directory, or parsing it failed.

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

ocaisa avatar Jul 31 '25 12:07 ocaisa

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.07/pr_1091/505437

date job status comment
Jul 31 12:15:16 UTC 2025 submitted job id 505437 awaits release by job manager
Jul 31 12:15:40 UTC 2025 released job awaits launch by Slurm scheduler
Jul 31 12:16:52 UTC 2025 running job 505437 is running
Jul 31 14:16:00 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job505437.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jul 31 14:16:00 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job505437.test does not exist in job directory, or parsing it failed.

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

hvelab avatar Aug 13 '25 14:08 hvelab

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.08/pr_1091/513102

date job status comment
Aug 13 14:17:48 UTC 2025 submitted job id 513102 awaits release by job manager
Aug 13 14:18:40 UTC 2025 released job awaits launch by Slurm scheduler
Aug 13 14:19:44 UTC 2025 running job 513102 is running
Aug 13 16:20:12 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job513102.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Aug 13 16:20:12 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job513102.test does not exist in job directory, or parsing it failed.
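
When comparing the UNKNOWN runs it can help to compute how long each job actually ran. The Python sketch below parses the "running" and "finished" rows of the status tables above. The helper name and the parsing approach are assumptions for illustration, while the timestamps are copied from this thread.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the bot's status tables, e.g. "Jun 20 12:31:09 UTC 2025".
FMT = "%b %d %H:%M:%S UTC %Y"


def run_duration(running_row: str, finished_row: str) -> timedelta:
    """Time between the 'running' and 'finished' rows (hypothetical helper)."""
    def stamp(row: str) -> datetime:
        # The first five whitespace-separated tokens form the timestamp.
        return datetime.strptime(" ".join(row.split()[:5]), FMT)
    return stamp(finished_row) - stamp(running_row)


rows = {
    468821: ("Jun 20 12:31:09 UTC 2025 running job 468821 is running",
             "Jun 20 14:31:09 UTC 2025 finished"),
    505437: ("Jul 31 12:16:52 UTC 2025 running job 505437 is running",
             "Jul 31 14:16:00 UTC 2025 finished"),
    513102: ("Aug 13 14:19:44 UTC 2025 running job 513102 is running",
             "Aug 13 16:20:12 UTC 2025 finished"),
}

for job_id, (running, finished) in rows.items():
    print(job_id, run_duration(running, finished))
# 468821 2:00:00
# 505437 1:59:08
# 513102 2:00:28
```

These three runs all ended roughly two hours after they started. Whether that points to a time limit or to something else is not stated in this thread, so it is only a lead worth checking.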

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx

bedroge avatar Oct 07 '25 12:10 bedroge

New job on instance eessi-bot-deucalion for repository eessi.io-2023.06-software

  • Building on: a64fx
  • Building for: aarch64/a64fx
  • Job dir: /home/eessibot/new-bot/jobs/2025.10/pr_1091/579180

date job status comment
Oct 07 12:08:07 UTC 2025 submitted job id 579180 awaits release by job manager
Oct 07 12:08:33 UTC 2025 released job awaits launch by Slurm scheduler
Oct 07 12:09:37 UTC 2025 running job 579180 is running
Oct 07 14:00:54 UTC 2025 finished
:shrug: UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job579180.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Oct 07 14:00:54 UTC 2025 test result
:shrug: UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job579180.test does not exist in job directory, or parsing it failed.

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx

bedroge avatar Oct 07 '25 18:10 bedroge

New job on instance eessi-bot-deucalion for repository eessi.io-2023.06-software

  • Building on: a64fx
  • Building for: aarch64/a64fx
  • Job dir: /home/eessibot/new-bot/jobs/2025.10/pr_1091/579534

date job status comment
Oct 07 18:02:16 UTC 2025 submitted job id 579534 awaits release by job manager
Oct 07 18:02:46 UTC 2025 released job awaits launch by Slurm scheduler
Oct 07 18:03:50 UTC 2025 running job 579534 is running
Oct 07 19:57:21 UTC 2025 finished
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-579534.out
:white_check_mark: no message matching FATAL:
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-17598666520.tar.gz
size: 54 MiB (56973001 bytes)
entries: 2333
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
BCFtools/1.18-GCC-12.3.0.lua
CapnProto/1.0.1-GCCcore-12.3.0.lua
DendroPy/4.6.1-GCCcore-12.3.0.lua
HMMER/3.4-gompi-2023a.lua
HTSlib/1.18-GCC-12.3.0.lua
IQ-TREE/2.3.5-gompi-2023a.lua
KronaTools/2.8.1-GCCcore-12.3.0.lua
LSD2/2.4.1-GCCcore-12.3.0.lua
MAFFT/7.520-GCC-12.3.0-with-extensions.lua
Meson/1.3.1-GCCcore-12.3.0.lua
MetalWalls/21.06.1-foss-2023a.lua
f90wrap/0.2.13-foss-2023a.lua
fastp/0.23.4-GCC-12.3.0.lua
meson-python/0.15.0-GCCcore-12.3.0.lua
ncbi-vdb/3.0.10-gompi-2023a.lua
software under 2023.06/software/linux/aarch64/a64fx/software
BCFtools/1.18-GCC-12.3.0
CapnProto/1.0.1-GCCcore-12.3.0
DendroPy/4.6.1-GCCcore-12.3.0
HMMER/3.4-gompi-2023a
HTSlib/1.18-GCC-12.3.0
IQ-TREE/2.3.5-gompi-2023a
KronaTools/2.8.1-GCCcore-12.3.0
LSD2/2.4.1-GCCcore-12.3.0
MAFFT/7.520-GCC-12.3.0-with-extensions
Meson/1.3.1-GCCcore-12.3.0
MetalWalls/21.06.1-foss-2023a
f90wrap/0.2.13-foss-2023a
fastp/0.23.4-GCC-12.3.0
meson-python/0.15.0-GCCcore-12.3.0
ncbi-vdb/3.0.10-gompi-2023a
reprod directories under 2023.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/a64fx
no other files in tarball
Oct 07 19:57:21 UTC 2025 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 4/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ OK ] ( 5/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:aarch64_a64fx+default
P: perf: 583.284 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
P: perf: 585.425 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:aarch64_a64fx+default
P: latency: 1.67 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_a64fx+default
P: latency: 1.7 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8122.26 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8494.92 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/10 test case(s) from 10 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-579534.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
Oct 07 20:38:41 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-aarch64-a64fx-17598666520.tar.gz to S3 bucket succeeded

Staging PR merged, tarball ingested.

bedroge avatar Oct 07 '25 20:10 bedroge