
Notes on kickstarting the RISC-V software layer

Open bedroge opened this issue 1 year ago • 197 comments

With a compatibility layer (https://github.com/EESSI/compatibility-layer/pull/204) and software build container (https://github.com/EESSI/filesystem-layer/pull/132 and https://github.com/orgs/EESSI/packages/container/package/build-node) in place, we are ready to start working on a RISC-V software layer. In this issue we can keep track of the work being done and take notes on the issues that we encounter.

bedroge, Apr 23 '24

The repository that we use is /cvmfs/riscv.eessi.io, added in https://github.com/EESSI/filesystem-layer/pull/181. The structure is the same as in /cvmfs/software.eessi.io.

For now, we focus on generic builds first (support for generic RISC-V builds was added to EasyBuild in https://github.com/easybuilders/easybuild-framework/pull/4489). Flags for optimized builds are still lacking; see https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L82.

bedroge, Apr 23 '24

In order to get EasyBuild installed, I've used the following:

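# Build a writable sandbox of the RISC-V build container
singularity build --sandbox /nvme/build-container docker://ghcr.io/eessi/build-node:debian-sid
# Start the container with write access to the RISC-V repository
EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io ./eessi_container.sh -c /nvme/build-container --access rw
# Inside the container: start a prefix shell in the compat layer
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/startprefix
# Get the software-layer scripts and the diff of PR #537, as the bot would
git clone https://github.com/EESSI/software-layer
cd software-layer
wget https://github.com/EESSI/software-layer/pull/537.diff
# Point the install script at the RISC-V repo, version and generic CPU target
export EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io EESSI_VERSION_OVERRIDE=20240402 EESSI_SOFTWARE_SUBDIR_OVERRIDE=riscv64/generic
./EESSI-install-software.sh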

We explicitly override some variables to reflect the new repo/version/CPU target, and then it sort of mimics what the bot would do by taking the diff file from https://github.com/EESSI/software-layer/pull/537 and running the install script. This worked perfectly fine. :tada:
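To make the effect of the three overrides concrete, here is a small sketch that maps them onto the installation prefix. The helper names are hypothetical, and the directory layout is inferred from paths that show up elsewhere in this thread (e.g. /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software), not taken from the EESSI scripts themselves.

```python
import os
import posixpath


def eessi_software_path(repo: str, version: str, subdir: str) -> str:
    # Layout inferred from paths in this thread (hypothetical helper):
    # <repo>/versions/<version>/software/linux/<subdir>/software
    return posixpath.join(repo, "versions", version, "software", "linux", subdir, "software")


def path_from_overrides(env=os.environ) -> str:
    # Read the same three overrides that are exported before running EESSI-install-software.sh
    return eessi_software_path(
        env["EESSI_CVMFS_REPO_OVERRIDE"],
        env["EESSI_VERSION_OVERRIDE"],
        env["EESSI_SOFTWARE_SUBDIR_OVERRIDE"],
    )
```

With the values exported above, this yields /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software, the same prefix that appears in the Extrae build log.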

bedroge, Apr 23 '24

Now that EasyBuild is available in the repo, one can easily start building additional software interactively:

# Launch the container
EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io ./eessi_container.sh -c docker://ghcr.io/eessi/build-node:debian-sid --access rw

# Start a prefix shell in the container:
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/startprefix

# EESSI init
export EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io EESSI_VERSION_OVERRIDE=20240402 EESSI_SOFTWARE_SUBDIR_OVERRIDE=riscv64/generic
source /cvmfs/riscv.eessi.io/versions/20240402/init/bash

# Set up EB and start a build
git clone https://github.com/EESSI/software-layer
cd software-layer
export WORKDIR=/tmp/eb
source configure_easybuild
module load EasyBuild
eb --optarch=GENERIC -r foss-2023b.eb

bedroge, Apr 23 '24

As a first attempt, I tried building GCC 13.2.0, but that failed due to the hook that sets up a wrapper for ld. The hook uses config.guess to determine the system type, which returns riscv64-unknown-linux-gnu, and then looks for riscv64-unknown-linux-gnu-ld* in $EPREFIX/usr/bin. However, Gentoo was built with CHOST = riscv64-pc-linux-gnu, so the binaries use that in their filenames instead.

I've opened a PR at the Gentoo repo to change the CHOST: https://github.com/gentoo/gentoo/pull/36353.

Meanwhile I worked around the issue by hardcoding it in the hook to:

cmd_prefix = 'riscv64-pc-linux-gnu-'

Furthermore, ld.gold has to be removed from the loop on the next line (for cmd in ('ld', 'ld.gold', 'ld.bfd'):), since we don't have ld.gold in our RISC-V compat layer.

With these small changes I could successfully build GCC 13.2.0 (not ingested yet).
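Rather than hard-coding cmd_prefix, the hook could derive it from the ld binaries that are actually present. A minimal sketch, assuming the binaries follow the <arch>-<vendor>-linux-gnu-ld naming seen above; detect_cmd_prefix is a hypothetical helper, not part of the EESSI hooks:

```python
import glob
import os


def detect_cmd_prefix(bin_dir: str, arch: str = "riscv64") -> str:
    """Return e.g. 'riscv64-pc-linux-gnu-' based on the ld binaries in bin_dir.

    Hypothetical alternative to trusting config.guess (which reports the
    'unknown' vendor): look at what is really installed.
    """
    # Exact '-ld' suffix, so variants like '...-ld.bfd' are not matched
    pattern = os.path.join(bin_dir, f"{arch}-*-linux-gnu-ld")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no {arch}-*-linux-gnu-ld found in {bin_dir}")
    # Drop the trailing 'ld' and keep the final dash of the prefix
    return os.path.basename(matches[0])[: -len("ld")]
```

Called on $EPREFIX/usr/bin of the current compat layer, this should return 'riscv64-pc-linux-gnu-', and it would keep working if the CHOST is changed as proposed in the Gentoo PR.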

bedroge, Apr 23 '24

FFTW fails due to:

checking for sinq in -lquadmath... no
configure: error: quad precision requires libquadmath for quad-precision trigonometric routines

It looks like our GCC doesn't include libquadmath; I suppose it doesn't work on RISC-V (?). This Fedora page has a message "enable support for riscv64", so maybe we need GCC 14. For now, we could try building FFTW without it.

edit: I was checking the FFTW easyblock, and I found that this is already disabled for Arm and PowerPC, so we should make a PR to do the same for RISC-V: https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/f/fftw.py#L143

edit2: PR created: https://github.com/easybuilders/easybuild-easyblocks/pull/3314
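The idea behind the easyblock change can be sketched as an architecture check that drops quad precision where libquadmath isn't available. This is a hypothetical simplification: the real logic in fftw.py may be structured differently, and the set of architectures below only reflects the ones mentioned here (Arm, PowerPC and now RISC-V).

```python
# Hypothetical simplification of the arch check in the FFTW easyblock:
# quad precision needs libquadmath, which these targets lack in our setup.
QUAD_UNSUPPORTED_ARCHS = {"aarch64", "ppc64le", "riscv64"}


def fftw_precision_flags(arch: str, want_quad: bool = True) -> list:
    """Return extra configure flags for the requested precisions on `arch`."""
    flags = []
    if want_quad and arch not in QUAD_UNSUPPORTED_ARCHS:
        flags.append("--enable-quad-precision")
    return flags


if __name__ == "__main__":
    import platform

    print(fftw_precision_flags(platform.machine()))
```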

bedroge, Apr 24 '24

When trying to build foss 2023b, I ran into the next issue with UCX, which has an outdated config.guess:

checking build system type... ./config.guess: unable to guess system type

This script, last modified 2013-06-10, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from

  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
and
  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD

If the version you run (./config.guess) is already up to date, please
send the following data and any information you think might be
pertinent to <config-patches@gnu.org> in order to provide the needed
information to handle your system.

config.guess timestamp = 2013-06-10

uname -m = riscv64
uname -r = 5.15.0-starfive
uname -s = Linux
uname -v = #1 SMP Fri Nov 24 07:22:28 UTC 2023

/usr/bin/uname -p = unknown
/bin/uname -X     = 

hostinfo               = 
/bin/universe          = 
/usr/bin/arch -k       = 
/bin/arch              = riscv64
/usr/bin/oslevel       = 
/usr/convex/getsysinfo = 

UNAME_MACHINE = riscv64
UNAME_RELEASE = 5.15.0-starfive
UNAME_SYSTEM  = Linux
UNAME_VERSION = #1 SMP Fri Nov 24 07:22:28 UTC 2023
configure: error: cannot guess build type; you must specify one

So we need to patch this by providing a newer version of config.guess before the configure step.
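Whether a bundled config.guess is too old can be checked from its timestamp line. A small sketch, assuming the script contains a line like timestamp='2013-06-10' (the output above reports exactly that timestamp for UCX's copy); the cutoff is left as a parameter rather than guessing the first release that recognises riscv64.

```python
import re
from datetime import date

# config.guess records its version as e.g.  timestamp='2013-06-10'
_TIMESTAMP_RE = re.compile(r"^timestamp='(\d{4})-(\d{2})-(\d{2})'", re.MULTILINE)


def config_guess_date(script_text: str) -> date:
    """Extract the timestamp of a config.guess script."""
    match = _TIMESTAMP_RE.search(script_text)
    if match is None:
        raise ValueError("no timestamp='YYYY-MM-DD' line found")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def config_guess_is_outdated(path: str, cutoff: date) -> bool:
    """True if the config.guess at `path` predates `cutoff`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return config_guess_date(fh.read()) < cutoff
```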

edit: I worked around the issue by using a hook that copies EB's config.guess to the UCX build dir:

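        # inside a pre-configure hook, where self is the UCX EasyBlock
        config_guess_path = self.obtain_config_guess()  # path to an up-to-date config.guess
        copy_file(config_guess_path, self.start_dir)  # copy_file from easybuild.tools.filetools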

This allows the configure step to complete, but the build fails almost immediately due to:

/tmp/eb/easybuild/build/UCX/1.15.0/GCCcore-13.2.0/ucx-1.15.0/src/ucm/bistro/bistro.h:24:4: error: #error "Unsupported architecture"
   24 | #  error "Unsupported architecture"
      |    ^~~~~

edit2: looks like RISC-V support was added in UCX 1.16.0 (which was released 10 days ago).

bedroge, Apr 25 '24

The config.guess issue would normally be solved by EB itself, but that doesn't happen for UCX because its easyconfig uses a wrapper script around ./configure. This PR changes that, which should solve the issue: https://github.com/easybuilders/easybuild-easyconfigs/pull/20428.

I also have a patch that backports RISC-V support into UCX 1.15.0: https://github.com/easybuilders/easybuild-easyconfigs/pull/20429.

bedroge, Apr 26 '24

Next issue: the foss 2023b toolchain has UCC 1.2.0, but RISC-V support was only added in 1.3.0: https://github.com/openucx/ucc/pull/829. The diff is quite small, so it should be easy to backport this to 1.2.0.

Edit: solved in PR https://github.com/easybuilders/easybuild-easyconfigs/pull/20432.

bedroge, Apr 28 '24

BLIS 0.9.0 fails in the configure step:

configure: automatic configuration requested.
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: /tmp/eb-oaju2ohj/cc7gwuui.o: in function `main':
config_detect.c:(.text+0x2aa): undefined reference to `bli_cpuid_query_id'
collect2: error: ld returned 1 exit status
./configure: line 1212: ./auto-detect.x: No such file or directory
configure: hardware detection driver returned ''.
configure: checking configuration against contents of 'config_registry'.
configure: 'auto-detected configuration '' is NOT registered!
configure: 
configure: *** Cannot continue with unregistered configuration ''. ***
configure: 

There are some BLIS PRs related to adding RISC-V functionality, so I'll have a look at those.

bedroge, Apr 28 '24

Backported RISC-V support to BLIS 0.9.0: https://github.com/easybuilders/easybuild-easyconfigs/pull/20468.

OpenBLAS also built without any issues, so we're getting really close to having a full foss/2023b toolchain.

bedroge, May 03 '24

FlexiBLAS and ScaLAPACK also installed without issues, so we now have foss/2023b!

bedroge, May 03 '24

R 4.3.3 is now available as well. It required some (small) changes in the easyblocks/easyconfigs of Mesa, LLVM, and Java. I'll open PRs for those and list them here.

RISC-V support for Java: https://github.com/easybuilders/easybuild-easyblocks/pull/3323 https://github.com/easybuilders/easybuild-easyconfigs/pull/20495

RISC-V support for Mesa: https://github.com/easybuilders/easybuild-easyblocks/pull/3324

RISC-V support for LLVM: https://github.com/easybuilders/easybuild-easyblocks/pull/3325

To replace the dependency on Java 11 with Java 21, I used the following hook:

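from easybuild.framework.easyconfig.constants import EASYCONFIG_CONSTANTS
from easybuild.tools.systemtools import RISCV, get_cpu_family

SYSTEM = EASYCONFIG_CONSTANTS['SYSTEM'][0]


def parse_hook_use_newer_java(ec, *args, **kwargs):
    """Replace the Java 11 dependency of R 4.3.3 by Java 21 on RISC-V."""
    if ec.name == 'R' and ec.version in ['4.3.3'] and get_cpu_family() == RISCV:
        deps = ec['dependencies']
        java_name, java_version = ('Java', '11')
        for idx, dep in enumerate(deps):
            if dep[0] == java_name and dep[1] == java_version:
                # Swap in Java 21 (system toolchain) at the same position
                deps[idx] = ('Java', '21', '', SYSTEM)
                break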

bedroge, May 07 '24

dlb (https://pm.bsc.es/dlb) built without issues. Attached is the corresponding tar file. eessi-20240402-software-linux-riscv64-generic-1715088854.tar.gz

julianmorillo, May 07 '24

While trying to install GROMACS, I ran into issues with its dependency SciPy-bundle: some numpy tests fail:

FAILED core/tests/test_numeric.py::TestBoolCmp::test_float - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-4] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-2] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-1] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[1] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fp_noncontiguous[f] - AssertionError: 
===== 6 failed, 33239 passed, 943 skipped, 1303 deselected, 31 xfailed, 3 xpassed, 58 warnings in 1640.83s (0:27:20) =====
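Instead of ignoring all test failures, one could tolerate only a known number of them by parsing pytest's final summary line. A sketch with hypothetical helper names; the summary format is taken from the output above:

```python
import re

# Matches "<count> <outcome>" pairs in pytest's final summary line;
# 'warnings'/'errors' are normalised to 'warning'/'error'.
_PAIR_RE = re.compile(
    r"(\d+) (failed|passed|skipped|deselected|xfailed|xpassed|warning|error)s?\b"
)


def parse_pytest_summary(line: str) -> dict:
    """Turn a pytest summary line into {outcome: count}."""
    return {outcome: int(count) for count, outcome in _PAIR_RE.findall(line)}


def within_failure_budget(line: str, max_failures: int) -> bool:
    """True if failed + errored tests do not exceed max_failures."""
    counts = parse_pytest_summary(line)
    return counts.get("failed", 0) + counts.get("error", 0) <= max_failures
```

For the numpy run above, within_failure_budget(line, 6) accepts the six known failures, while any additional failure would still be caught.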

I found https://github.com/numpy/numpy/pull/25246 which disables most of these on RISC-V, so for now I've ignored the test failures. Now GROMACS itself is failing in the test step as well:

99% tests passed, 1 tests failed out of 91

Label Time Summary:
GTest              = 759.58 sec*proc (87 tests)
IntegrationTest    = 285.44 sec*proc (30 tests)
MpiTest            = 420.83 sec*proc (23 tests)
QuickGpuTest       =  83.55 sec*proc (20 tests)
SlowGpuTest        = 493.55 sec*proc (14 tests)
SlowTest           = 392.31 sec*proc (13 tests)
UnitTest           =  81.82 sec*proc (44 tests)

Total Test time (real) = 760.26 sec

The following tests FAILED:
          2 - GmxapiMpiTests (Failed)

Full output of the failing test:

starting mdrun 'Water and methane'
4 steps,      0.0 ps (continuing from step 2,      0.0 ps).
[starfive:369549:0:369549] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[starfive:369548:0:369558] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 369558) ====
 0  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(ucs_handle_error+0x1fc) [0x3f9edc8044]
 1  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x2111e) [0x3f9edc811e]
 2  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x21280) [0x3f9edc8280]
 3  linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3fac463800]
 4  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc) [0x3fab7c1b74]
 5  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc) [0x3fab7ba8cc]
 6  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(+0x19d38) [0x3fab105d38]
 7  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x6b0f4) [0x3faafe20f4]
 8  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0xb6da0) [0x3fab02dda0]
=================================
[starfive:369548] *** Process received signal ***
[starfive:369548] Signal: Segmentation fault (11)
[starfive:369548] Signal code:  (-6)
[starfive:369548] Failing at address: 0x3e80005a38c
[starfive:369548] [ 0] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3fac463800]
[starfive:369548] [ 1] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc)[0x3fab7c1b74]
[starfive:369548] [ 2] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc)[0x3fab7ba8cc]
[starfive:369548] [ 3] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(+0x19d38)[0x3fab105d38]
[starfive:369548] [ 4] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x6b0f4)[0x3faafe20f4]
[starfive:369548] [ 5] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0xb6da0)[0x3fab02dda0]
[starfive:369548] *** End of error message ***
==== backtrace (tid: 369549) ====
 0  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(ucs_handle_error+0x1fc) [0x3f7cb9c044]
 1  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x2111e) [0x3f7cb9c11e]
 2  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x21280) [0x3f7cb9c280]
 3  linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3f86236800]
 4  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc) [0x3f85594b74]
 5  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc) [0x3f8558d8cc]
 6  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(GOMP_parallel+0x38) [0x3f84ed19c4]
 7  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZNK18nonbonded_verlet_t23dispatchNonbondedKernelEN3gmx19InteractionLocalityERK19interaction_const_tRKNS0_12StepWorkloadEiNS0_8ArrayRefIKNS0_11BasicVectorIdEEEENS8_IdEESD_P6t_nrnb+0xd4) [0x3f8558e146]
 8  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x7e3556) [0x3f85ac7556]
 9  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z8do_forceP8_IO_FILEPK9t_commrecPK14gmx_multisim_tRK10t_inputrecRKN3gmx18MDModulesNotifiersEPNSA_3AwhEP10gmx_enfrotPNSA_10ImdSessionEP6pull_tlP6t_nrnbP13gmx_wallcyclePK14gmx_localtop_tPA3_KdNSA_19ArrayRefWithPaddingINSA_11BasicVectorIdEEEENSA_8ArrayRefISY_EEPK9history_tPNSA_16ForceBuffersViewEPA3_dPK9t_mdatomsP14gmx_enerdata_tNS10_IST_EEP10t_forcerecRKNSA_21MdrunScheduleWorkloadEPNSA_19VirtualSitesHandlerEPddP9gmx_edsamP24CpuPpLongRangeNonbondedsRK22DDBalanceRegionHandler+0xdf0) [0x3f85ac9970]
10  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx15LegacySimulator5do_mdEv+0x39da) [0x3f85bc1870]
11  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx8Mdrunner8mdrunnerEv+0x6e60) [0x3f85bebcb6]
12  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi11SessionImpl3runEv+0x18) [0x3f8621870e]
13  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi7Session3runEv+0xe) [0x3f86218854]
14  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x2ebd2]
15  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x30) [0x3f852b17da]
16  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing4Test3RunEv+0xc2) [0x3f852a26fa]
17  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8TestInfo3RunEv+0x11c) [0x3f852a2824]
18  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing9TestSuite3RunEv+0xbc) [0x3f852a28ea]
19  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x1fa) [0x3f852ab23e]
20  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8UnitTest3RunEv+0x52) [0x3f852a2a46]
21  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x26dbe]
22  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x27688) [0x3f84d71688]
23  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(__libc_start_main+0x74) [0x3f84d71730]
24  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x26fd8]
=================================
[starfive:369549] *** Process received signal ***
[starfive:369549] Signal: Segmentation fault (11)
[starfive:369549] Signal code:  (-6)
[starfive:369549] Failing at address: 0x3e80005a38d
[starfive:369549] [ 0] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3f86236800]
[starfive:369549] [ 1] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc)[0x3f85594b74]
[starfive:369549] [ 2] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc)[0x3f8558d8cc]
[starfive:369549] [ 3] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(GOMP_parallel+0x38)[0x3f84ed19c4]
[starfive:369549] [ 4] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZNK18nonbonded_verlet_t23dispatchNonbondedKernelEN3gmx19InteractionLocalityERK19interaction_const_tRKNS0_12StepWorkloadEiNS0_8ArrayRefIKNS0_11BasicVectorIdEEEENS8_IdEESD_P6t_nrnb+0xd4)[0x3f8558e146]
[starfive:369549] [ 5] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x7e3556)[0x3f85ac7556]
[starfive:369549] [ 6] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z8do_forceP8_IO_FILEPK9t_commrecPK14gmx_multisim_tRK10t_inputrecRKN3gmx18MDModulesNotifiersEPNSA_3AwhEP10gmx_enfrotPNSA_10ImdSessionEP6pull_tlP6t_nrnbP13gmx_wallcyclePK14gmx_localtop_tPA3_KdNSA_19ArrayRefWithPaddingINSA_11BasicVectorIdEEEENSA_8ArrayRefISY_EEPK9history_tPNSA_16ForceBuffersViewEPA3_dPK9t_mdatomsP14gmx_enerdata_tNS10_IST_EEP10t_forcerecRKNSA_21MdrunScheduleWorkloadEPNSA_19VirtualSitesHandlerEPddP9gmx_edsamP24CpuPpLongRangeNonbondedsRK22DDBalanceRegionHandler+0xdf0)[0x3f85ac9970]
[starfive:369549] [ 7] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx15LegacySimulator5do_mdEv+0x39da)[0x3f85bc1870]
[starfive:369549] [ 8] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx8Mdrunner8mdrunnerEv+0x6e60)[0x3f85bebcb6]
[starfive:369549] [ 9] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi11SessionImpl3runEv+0x18)[0x3f8621870e]
[starfive:369549] [10] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi7Session3runEv+0xe)[0x3f86218854]
[starfive:369549] [11] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x2ebd2]
[starfive:369549] [12] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x30)[0x3f852b17da]
[starfive:369549] [13] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing4Test3RunEv+0xc2)[0x3f852a26fa]
[starfive:369549] [14] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8TestInfo3RunEv+0x11c)[0x3f852a2824]
[starfive:369549] [15] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing9TestSuite3RunEv+0xbc)[0x3f852a28ea]
[starfive:369549] [16] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x1fa)[0x3f852ab23e]
[starfive:369549] [17] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8UnitTest3RunEv+0x52)[0x3f852a2a46]
[starfive:369549] [18] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x26dbe]
[starfive:369549] [19] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x27688)[0x3f84d71688]
[starfive:369549] [20] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(__libc_start_main+0x74)[0x3f84d71730]
[starfive:369549] [21] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x26fd8]
[starfive:369549] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node starfive exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

bedroge, May 14 '24

@bedroge Could that be simply due to insufficient memory on your SiFive Unmatched board?

boegel, May 16 '24

> @bedroge Could that be simply due to insufficient memory on your ~~SiFive Unmatched~~ Starfive VisionFive 2 board?

I don't know; I didn't see any Killed/OOM messages.

I tried again, this time using the slightly modified easyconfig from https://github.com/easybuilders/easybuild-easyconfigs/pull/20522, and then it failed in the second iteration:

Reading file /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/api/gmxapi/cpp/tests/Testing/Temporary/GmxApiTest_RunnerChainedMD.tpr, VERSION 2024.1-EasyBuild_4.9.1 (single precision)

-------------------------------------------------------
Program:     gmxapi-mpi-test, version 2024.1-EasyBuild_4.9.1
Source file: src/gromacs/utility/keyvaluetreeserializer.cpp (line 302)
Function:    gmx::{anonymous}::ValueSerializer::deserialize(gmx::ISerializer*)::<lambda()>
MPI rank:    0 (out of 2)

Assertion failed:
Condition: iter != s_deserializers.end()
Unknown type tag for deserializization

I don't have a clue what that's about, so I just did another attempt, and then the installation completed successfully (all tests of all four iterations passed) 🎉 🤷‍♂️

bedroge, May 19 '24

GMP easyconfigs have precise: True in toolchainopts, but that doesn't work on RISC-V: in this case the EB framework sets -mno-recip (see https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L66C22-L66C31), which isn't supported on RISC-V. It isn't supported on Arm either, so there it's overridden with other flags (https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L77), but those aren't available on RISC-V either. There doesn't seem to be a good alternative, but @julianmorillo is going to check with a compiler expert. Meanwhile, I tried building without precise: True, and that worked fine; the test step also completed without issues.

Feedback from Julian:

> Already talked with the compiler guy: it looks like the flag we need is -fno-reciprocal-math (https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Optimize-Options.html#index-freciprocal-math). I only have two concerns with it. First, it is generic, so I'm not sure why they are not using it for Intel or for ARM instead of the specific ones (or even why the specific ones exist). Second, -fno-reciprocal-math is the default behaviour, so there's no need to put it explicitly (unless they are also using -Ofast or -ffast-math?).

edit: fixed in https://github.com/easybuilders/easybuild-framework/pull/4576.
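Conceptually, precise needs a per-architecture flag table rather than one x86-specific flag. A hypothetical sketch of such a table: the x86_64 entry is the -mno-recip flag mentioned above, the riscv64 entry uses the -fno-reciprocal-math flag Julian suggested, and aarch64 is left out (its flags are in gcc.py#L77). What PR 4576 actually implements may differ.

```python
# Hypothetical per-architecture table for the 'precise' toolchain option.
# x86_64: flag mentioned above; riscv64: flag suggested by Julian (which is
# GCC's default behaviour anyway). aarch64 is omitted, see gcc.py#L77.
PRECISE_FLAGS = {
    "x86_64": ["-mno-recip"],
    "riscv64": ["-fno-reciprocal-math"],
}


def precise_flags(arch: str) -> list:
    """Return the compiler flags implementing 'precise' on `arch`."""
    try:
        return list(PRECISE_FLAGS[arch])
    except KeyError:
        raise ValueError(f"no 'precise' flags defined for {arch}") from None
```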

bedroge, May 21 '24

With x264 I'm running into an outdated config.guess issue once again. Here the problem is that its configure script is apparently handcrafted, so it doesn't contain the string that EasyBuild uses to determine whether it was generated with Autoconf (see https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/generic/configuremake.py#L57). If that string is missing, EB will not replace config.guess with a newer one (see https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/generic/configuremake.py#L303). So we probably have to do that manually in the easyconfig or with a hook.

edit: the same hook that I used before works fine and allows the installation to complete:

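from easybuild.tools.filetools import copy_file
from easybuild.tools.systemtools import RISCV64, get_cpu_architecture


def pre_configure_hook_x264(self, *args, **kwargs):
    """Replace x264's outdated config.guess on RISC-V (its configure script is handcrafted)."""
    if self.name == 'x264' and self.version in ['20231019'] and get_cpu_architecture() == RISCV64:
        # obtain_config_guess() (ConfigureMake) returns the path to an up-to-date config.guess
        config_guess_path = self.obtain_config_guess()
        copy_file(config_guess_path, self.start_dir)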

edit2: Properly fixed in https://github.com/easybuilders/easybuild-easyconfigs/pull/20968.
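The check that decides whether EasyBuild replaces config.guess boils down to looking for an Autoconf marker in the configure script. A minimal sketch, assuming the marker is the string "Generated by GNU Autoconf" (verify against the configuremake.py#L57 link above); a handcrafted script like x264's will not contain it:

```python
from pathlib import Path

# Assumed marker; check configuremake.py for the exact string EasyBuild uses.
AUTOCONF_MARKER = "Generated by GNU Autoconf"


def is_autoconf_generated(configure_script: str) -> bool:
    """True if configure_script exists and carries the Autoconf marker."""
    path = Path(configure_script)
    if not path.is_file():
        return False
    return AUTOCONF_MARKER in path.read_text(errors="replace")
```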

bedroge, May 21 '24

And almost the same happens with LAME: it looks like the configure_cmd_prefix (here: https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LAME/LAME-3.100-GCCcore-13.2.0.eb#L29) breaks the os.path.exists(configure_command) check in the easyblock, which makes it fail to recognize that this actually is an Autoconf-generated configure script. Or is it because it runs autoreconf in preconfigopts? Either way, config.guess still doesn't get updated, but the same hook works for this one as well.

edit: fixed in https://github.com/easybuilders/easybuild-easyconfigs/pull/20970.

bedroge, May 21 '24

libdwarf-0.9.2 installed. This is the corresponding tar file to be ingested: eessi-20240402-software-linux-riscv64-generic-1716472182.tar.gz

julianmorillo, May 23 '24

With x265 I ran into the following issue:

/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: encoder/CMakeFiles/encoder.dir/analysis.cpp.o: relocation R_RISCV_HI20 against `_ZN4x26510g_log2SizeE' can not be used when making a shared object; recompile with -fPIC
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: encoder/CMakeFiles/encoder.dir/search.cpp.o: relocation R_RISCV_HI20 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: encoder/CMakeFiles/encoder.dir/bitcost.cpp.o: relocation R_RISCV_HI20 against `a local symbol' can not be used when making a shared object; recompile with -fPIC

<SNIP, more of those....>

/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: common/CMakeFiles/common.dir/deblock.cpp.o: relocation R_RISCV_HI20 against `_ZN4x26515g_zscanToRasterE' can not be used when making a shared object; recompile with -fPIC
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: common/CMakeFiles/common.dir/scaler.cpp.o: relocation R_RISCV_HI20 against `_ZTVN4x26512ScalerFilterE' can not be used when making a shared object; recompile with -fPIC
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: unresolvable R_RISCV_CALL_PLT relocation against symbol `log@@GLIBC_2.27'
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: unresolvable R_RISCV_CALL_PLT relocation against symbol `__cxa_atexit@@GLIBC_2.27'
/tmp/eb-xqbykceq/tmp9gre4wcp/rpath_wrappers/ld_wrapper/ld: line 69: 234648 Segmentation fault      /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld "${CMD_ARGS[@]}"
collect2: error: ld returned 139 exit status

This can be solved by adding -DENABLE_PIC=ON to the configopts (found that in the Gentoo ebuild file: https://github.com/gentoo/gentoo/blob/master/media-libs/x265/x265-3.5-r2.ebuild).

edit: fixed in https://github.com/easybuilders/easybuild-easyconfigs/pull/20971.
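When adding a flag like this from a hook instead of in the easyconfig, it helps to make the change idempotent and to override any existing value of the same option. A hypothetical helper, assuming configopts is a plain space-separated string without quoted arguments:

```python
def set_cmake_option(configopts: str, option: str) -> str:
    """Set a -D<NAME>=<VALUE> option in configopts, replacing any earlier value.

    Assumes configopts is a plain space-separated string without quoted
    arguments containing spaces; typed forms like -DNAME:BOOL=ON are not matched.
    """
    name = option.split("=", 1)[0]  # e.g. '-DENABLE_PIC'
    kept = [tok for tok in configopts.split() if tok.split("=", 1)[0] != name]
    kept.append(option)
    return " ".join(kept)
```

For example, set_cmake_option("-DENABLE_PIC=OFF", "-DENABLE_PIC=ON") gives "-DENABLE_PIC=ON", and applying it twice changes nothing.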

bedroge, May 28 '24

Boost-1.83.0-GCC-13.2.0 has already been installed (this is the last Extrae dependency). The corresponding TAR file can be downloaded here: https://b2drop.bsc.es/index.php/s/Q3rMCXGX4r4SePQ

julianmorillo, May 29 '24

FFmpeg failed because of:

AR      libavcodec/libavcodec.a
HOSTLD  doc/print_options
LD      libavutil/libavutil.so.58
GENTEXI doc/avoptions_format.texi
GENTEXI doc/avoptions_codec.texi
HTML    doc/ffmpeg.html
makeinfo: error parsing ./doc/t2h.pm: Undefined subroutine &Texinfo::Config::set_from_init_file called at ./doc/t2h.pm line 24.
make: *** [doc/Makefile:70: doc/ffmpeg.html] Error 1
make: *** Waiting for unfinished jobs....

It looks like it needs texinfo for building the html pages, but this is not listed as dependency in the easyconfig. We do have texinfo in the compat layer, but version 7.1, and apparently that version has issues: https://groups.google.com/g/linux.debian.bugs.dist/c/1f_eeuQd_2U The compat layers of x86_64 and aarch64 have texinfo 7.0.3, which explains why we haven't seen the same issue there.

This should be fixed upstream either by adding texinfo as a dependency or, preferably, by adding --disable-htmlpages to the configopts (I've tested the latter, and it allowed the installation to complete).
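A rough sketch of the --disable-htmlpages approach as an easyconfig fragment. The pre-existing option shown here is a placeholder (an assumption, not copied from the real FFmpeg easyconfig). The point is only that the flag is appended to whatever configopts already contains.

```python
# Hypothetical excerpt of an FFmpeg easyconfig.
# ASSUMPTION: '--enable-shared' stands in for the easyconfig's existing options.
configopts = '--enable-shared'

# Skip generating the HTML docs: that step runs makeinfo with doc/t2h.pm,
# which fails with the texinfo 7.1 shipped in the RISC-V compat layer.
configopts += ' --disable-htmlpages'
```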

edit: done in https://github.com/easybuilders/easybuild-easyconfigs/pull/20686.

bedroge avatar May 30 '24 07:05 bedroge

Installation of Extrae is giving me this error:

checking for binutils... notfound
configure: libbfd library directory: /usr/lib/riscv64-linux-gnu
configure: Warning! Cannot find the libiberty library in the given binutils home. Please, make sure that the binutils packages is correctly installed. If you have installed the binutils package by hand from their source code, make sure that libiberty is installed. Some releases of the binutils package do not install the libibery even invoking make install. The library should be within the libiberty directory within the binutils source tree.
checking for bfd.h... no
configure: error: You have asked to gather call-site information through --with-unwind which must be translated using binutils, but either libbfd or libiberty are not found. Please make sure that the binutils-dev package is installed and specify where to find these libraries through --with-binutils. The latest source can be downloaded from http://www.gnu.org/software/binutils
 (at easybuild/tools/run.py:682 in parse_cmd_output)
== 2024-05-29 16:41:10,208 build_log.py:267 INFO ... (took 4 mins 22 secs)
== 2024-05-29 16:41:10,231 config.py:699 DEBUG software install path as specified by 'installpath' and 'subdir_software': /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software
== 2024-05-29 16:41:10,232 filetools.py:2013 INFO Removing lock /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/.locks/_cvmfs_riscv.eessi.io_versions_20240402_software_linux_riscv64_generic_software_Extrae_4.1.5-gompi-2023b.lock...
== 2024-05-29 16:41:10,235 filetools.py:383 INFO Path /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/.locks/_cvmfs_riscv.eessi.io_versions_20240402_software_linux_riscv64_generic_software_Extrae_4.1.5-gompi-2023b.lock successfully removed.
== 2024-05-29 16:41:10,236 filetools.py:2017 INFO Lock removed: /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/.locks/_cvmfs_riscv.eessi.io_versions_20240402_software_linux_riscv64_generic_software_Extrae_4.1.5-gompi-2023b.lock
== 2024-05-29 16:41:10,237 easyblock.py:4291 WARNING build failed (first 300 chars): cmd " ./configure --prefix=/cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/Extrae/4.1.5-gompi-2023b  --build=riscv64-unknown-linux-gnu  --host=riscv64-unknown-linux-gnu  --with-mpi=/cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/OpenMPI
== 2024-05-29 16:41:10,239 easyblock.py:328 INFO Closing log for application name Extrae version 4.1.5

I'm now trying to install binutils (although I thought the compat layer already provided it).

julianmorillo avatar May 30 '24 08:05 julianmorillo

@julianmorillo See https://github.com/EESSI/software-layer/pull/554#issuecomment-2099376096

You probably need the full hook in https://github.com/EESSI/software-layer/pull/554/commits/41149ac060b7580f2b15d3e04908ffabe207e046
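The hook from that commit is not reproduced here. As a rough illustration only, assume the fix is to point Extrae's configure script at the compat layer's binutils through --with-binutils, the option the configure error above asks for (note that configure fell back to /usr/lib/riscv64-linux-gnu). An EasyBuild hook doing that could look like the sketch below. The directory layout in the glob pattern, the use of $EPREFIX, and the helper name binutils_configopt are all assumptions. This is not the hook from PR 554.

```python
import glob
import os


def binutils_configopt(eprefix):
    """Return '--with-binutils=<dir>' for the single binutils dir found under eprefix.

    Hypothetical helper. ASSUMPTION: the compat layer keeps binutils libraries under
    <eprefix>/usr/lib*/binutils/<triplet>/<version>; adjust if the layout differs.
    """
    pattern = os.path.join(eprefix, 'usr', 'lib*', 'binutils', '*-linux-gnu', '2.*')
    matches = glob.glob(pattern)
    if len(matches) != 1:
        # Refuse to guess when zero or several binutils versions are present.
        raise RuntimeError(f"expected exactly one match for {pattern}, found {matches}")
    return '--with-binutils=' + matches[0]


def pre_configure_hook(self, *args, **kwargs):
    """EasyBuild pre-configure hook (illustrative sketch)."""
    if self.name == 'Extrae':
        # ASSUMPTION: $EPREFIX points at the compat layer prefix (as inside startprefix).
        eprefix = os.environ['EPREFIX']
        self.cfg.update('configopts', binutils_configopt(eprefix))
```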

ocaisa avatar May 30 '24 09:05 ocaisa

Thanks, @ocaisa!!
Yes, both the hook and a patch are needed. I have submitted a PR with such a patch: https://github.com/easybuilders/easybuild-easyconfigs/pull/20690

julianmorillo avatar May 30 '24 15:05 julianmorillo

@bedroge, could we add this hook https://github.com/EESSI/software-layer/commit/41149ac060b7580f2b15d3e04908ffabe207e046 to the riscv.eessi.io software-layer?

julianmorillo avatar May 30 '24 15:05 julianmorillo

@bedroge , could we add this hook 41149ac to the riscv.eessi.io software-layer?

The hooks file is stored on GitHub (EasyBuild picks it up when doing the actual builds), so we just need the PR from @boegel to be merged to make it available. In the meantime, feel free to use it locally for your Extrae builds.

bedroge avatar May 31 '24 07:05 bedroge

I have just opened a PR with the following changes to the Extrae easyblock: https://github.com/easybuilders/easybuild-easyblocks/pull/3339

  • Removes the configure options --enable-xml and --with-dwarf, which are no longer available as of version 4.1.0
  • Adds the --with-xml option, as suggested by @bedroge
  • Adds the --enable-posix-clock option for RISC-V 64 (needed for the build, as no lower-level clock seems to be available)
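The option changes listed above can be sketched as a small helper. The function name extrae_configopts is hypothetical, and so is the assumption that --with-xml takes the libxml2 installation prefix. The real easyblock may organize this differently. The architecture check uses platform.machine(), which reports riscv64 on RISC-V Linux.

```python
import platform


def extrae_configopts(libxml2_root, arch=None):
    """Assemble configure options for Extrae >= 4.1.0 (hypothetical sketch).

    ASSUMPTION: --with-xml expects the libxml2 installation prefix.
    """
    arch = arch or platform.machine()
    # --enable-xml and --with-dwarf are intentionally not added: Extrae 4.1.0+ rejects them.
    opts = ['--with-xml=%s' % libxml2_root]
    if arch == 'riscv64':
        # No lower-level clock seems to be available on RISC-V, so use the POSIX clock.
        opts.append('--enable-posix-clock')
    return ' '.join(opts)
```

For example, extrae_configopts('/path/to/libxml2', arch='riscv64') yields both options, while on x86_64 only --with-xml is added.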

julianmorillo avatar May 31 '24 14:05 julianmorillo

The first failing Extrae tests on RISC-V are:

make[4]: Leaving directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/launcher'
make[3]: Leaving directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/launcher'
Making check in tracer
make[3]: Entering directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer'
Making check in OTHER
make[4]: Entering directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
make  auto-init-fini define_event_type_gen_pcf define_event_type_gen_pcf_f
make[5]: Entering directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
  CC       auto_init_fini-auto-init-fini.o
  CCLD     auto-init-fini
  CC       define_event_type_gen_pcf-define_event_type_gen_pcf.o
  CCLD     define_event_type_gen_pcf
  FC       ../../../../include/define_event_type_gen_pcf_f-extrae_module.o
  FC       define_event_type_gen_pcf_f-define_event_type_gen_pcf.o
  FCLD     define_event_type_gen_pcf_f
make[5]: Leaving directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
make  check-TESTS
make[5]: Entering directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
make[6]: Entering directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
FAIL: auto-init-fini.sh
FAIL: define_event_type_gen_pcf.sh
FAIL: define_event_type_gen_pcf_f.sh
============================================================================
Testsuite summary for Extrae 4.1.6
============================================================================
# TOTAL: 3
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  3
# XPASS: 0
# ERROR: 0
============================================================================
See tests/functional/tracer/OTHER/test-suite.log
Please report to [email protected]
============================================================================
make[6]: *** [Makefile:1232: test-suite.log] Error 1
make[6]: Leaving directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
make[5]: *** [Makefile:1340: check-TESTS] Error 2
make[5]: Leaving directory '/tmp/eb/easybuild/build/Extrae/4.1.6/gompi-2023b/extrae-4.1.6/tests/functional/tracer/OTHER'
make[4]: *** [Makefile:1427: check-am] Error 2
make[4]: Target 'check' not remade because of errors.

julianmorillo avatar May 31 '24 15:05 julianmorillo