Weird behaviour while loading EESSI with multiple modulefiles

Open remyd1 opened this issue 10 months ago • 11 comments

Hi,

I see very strange behaviour when trying to load EESSI (source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash):

stack traceback:
	[C]: in function 'next'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: in function 'l_readCacheFile'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:561: in function 'build'
	...mpat/linux/x86_64/usr/share/Lmod/libexec/ModuleA.lua:685: in function 'singleton'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:183: in function 'l_lazyEval'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:262: in function 'sn'
	...6/compat/linux/x86_64/usr/share/Lmod/libexec/Hub.lua:312: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1055: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1031: in function 'load_usr'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:550: in function 'l_usrLoad'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:578: in function 'Load_Usr'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:711: in function 'Reset'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:867: in function 'cmd'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:514: in function 'main'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:585: in main chunk
	[C]: ?

Basically, we have a lot of modulefiles elsewhere, which are overwritten by sourcing this file (this should be in another issue).

module --version

Modules based on Lua: Version 8.7.23  2023-03-29 17:19 -05:00
    by Robert McLay [email protected]

We are using OpenHPC on Rocky Linux. Contents of /etc/os-release:

NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"

Any idea would be helpful.

remyd1 avatar Apr 08 '25 11:04 remyd1

Before you run the source command, which modules are loaded? Can you share the output of module list?

boegel avatar Apr 08 '25 12:04 boegel

Hi @boegel

[dernatr@io-login-02 ~]$ module list
No modules loaded
[dernatr@io-login-02 ~]$ type module
module is a function
module () 
{ 
    if [ -z "${LMOD_SH_DBG_ON+x}" ]; then
        case "$-" in 
            *v*x*)
                __lmod_sh_dbg='vx'
            ;;
            *v*)
                __lmod_sh_dbg='v'
            ;;
            *x*)
                __lmod_sh_dbg='x'
            ;;
        esac;
    fi;
    if [ -n "${__lmod_sh_dbg:-}" ]; then
        set +$__lmod_sh_dbg;
        echo "Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for Lmod's output" 1>&2;
    fi;
    eval "$($LMOD_CMD shell "$@")" && eval "$(${LMOD_SETTARG_CMD:-:} -s sh)";
    __lmod_my_status=$?;
    if [ -n "${__lmod_sh_dbg:-}" ]; then
        echo "Shell debugging restarted" 1>&2;
        set -$__lmod_sh_dbg;
    fi;
    unset __lmod_sh_dbg;
    return $__lmod_my_status
}

Thank you!

remyd1 avatar Apr 08 '25 12:04 remyd1

If it can be helpful:

[root@io-login-02 ~]# rpm -qa |grep -E "lua|lmod"
lua-libs-5.4.4-4.el9.x86_64
lua-srpm-macros-1-6.el9.noarch
lua-posix-35.0-8.el9.x86_64
lua-5.4.4-4.el9.x86_64
lua-filesystem-1.8.0-5.el9.x86_64
lmod-ohpc-8.7.53-320.ohpc.3.1.x86_64
mod_lua-2.4.57-11.el9_4.1.x86_64

remyd1 avatar Apr 08 '25 12:04 remyd1

Can you also share the output of ml --config? It seems like there's a problem with reading an Lmod spider cache (though it's unclear which one).

In the meantime, you can use the other way of setting up the EESSI environment, by sourcing the non-Lmod init script instead:

source /cvmfs/software.eessi.io/versions/2023.06/init/bash

boegel avatar Apr 08 '25 12:04 boegel

Thanks for your answer @boegel !

Indeed, if I clear the cache, it seems to work fine (when sourcing the non-Lmod file).

Output of ml --config:

Modules based on Lua: Version 8.7.53 2024-10-12 19:57 -05:00
    by Robert McLay [email protected]

Description                                                   Value
-----------                                                   -----
Allow root to use Lmod (LMOD_ALLOW_ROOT_USE)                  yes
Allow TCL modulefiles (LMOD_ALLOW_TCL_FILES)                  yes
Auto swapping (LMOD_AUTO_SWAP)                                no
Avail Style (LMOD_AVAIL_STYLE)                                <system>
Case Independent Sorting (LMOD_CASE_INDEPENDENT_SORTING)      no
Colorize Lmod (LMOD_COLORIZE)                                 no
Configuration dir (LMOD_CONFIG_DIR)                           /etc/lmod
Disable Same Name AutoSwap (LMOD_DISABLE_SAME_NAME_AUTOSWAP)  no
Display Extension w/ avail (LMOD_AVAIL_EXTENSIONS)            yes
Use ~/.config dir only (LMOD_USE_DOT_CONFIG_ONLY)             no
Downstream Module Conflicts (LMOD_DOWNSTREAM_CONFLICTS)       no
Allow duplicate paths (LMOD_DUPLICATE_PATHS)                  no
Dynamic Spider Cache (LMOD_DYNAMIC_SPIDER_CACHE)              yes
Require Exact Match/no defaults (LMOD_EXACT_MATCH)            no
Export the module command (LMOD_EXPORT_MODULE)                yes
Allow extended default (LMOD_EXTENDED_DEFAULT)                yes
Use attached TCL over system call (LMOD_FAST_TCL_INTERP)      yes
Is fast TCL interp available (LMOD_USING_FAST_TCL_INTERP)     yes
File ignore patterns (LMOD_FILE_IGNORE_PATTERNS)              {"%.version[-._].*", "%.modulerc[-._].*"}
Use italic instead of dim (LMOD_HIDDEN_ITALIC)                no
KSH Support (LMOD_KSH_SUPPORT)                                no
Language used for err/msg/warn (LMOD_LANG)                    en
Site message file (LMOD_SITE_MSG_FILE)                        <empty>
LD_LIBRARY_PATH at config time (LMOD_LD_LIBRARY_PATH)         <empty>
LD_PRELOAD at config time (LMOD_LD_PRELOAD)                   <empty>
LuaFileSystem version                                         1.8.0
Lmod version                                                  8.7.53
Lmod branch (LMOD_BRANCH)                                     main
lmod_config.lua location (LMOD_CONFIG_LOCATION)               no
Lua Version                                                   Lua 5.4
LUA_CPATH                                                     /usr/lib64/lua/5.4/?.so;/usr/lib64/lua/5.4/loadall.so;
LUA_PATH                                                      /usr/share/lua/5.4/?.lua;/usr/share/lua/5.4/?/init.lua;/usr/lib64/lua/5.4/?.lua;/usr/lib64/lua/5.4/?/init.lua
System lua-term (LMOD_HAVE_LUA_TERM)                          no
Active lua-term                                               true
Modules Auto Handling (MODULES_AUTO_HANDLING)                 no
MODULERC (LMOD_MODULERC)                                      /opt/ohpc/admin/lmod/etc/rc -> <empty>
avail: Include modulepath dir (LMOD_MPATH_AVAIL)              no
MODULEPATH_INIT (LMOD_MODULEPATH_INIT)                        /opt/ohpc/admin/lmod/lmod/init/.modulespath -> <empty>
MODULEPATH_ROOT (MODULEPATH_ROOT)                             /opt/ohpc/admin/modulefiles
NAG File (LMOD_ADMIN_FILE)                                    /opt/ohpc/admin/lmod/etc/admin.list
number of cache dirs                                          0
OS Name                                                       Rocky Linux 9.4 (Blue Onyx)
Pager (LMOD_PAGER)                                            /usr/bin/more
Pager Options (LMOD_PAGER_OPTS)                               -XqMREF
Path to HashSum (LMOD_HASHSUM_PATH)                           /usr/bin/sha1sum
Path to Lua                                                   /usr/bin/lua
Pin Versions in restore (LMOD_PIN_VERSIONS)                   no
Pkg Class name                                                Pkg
Lmod prefix                                                   /opt/ohpc/admin
Site controlled prefix (SITE_CONTROLLED_PREFIX)               no
Prepend order (LMOD_PREPEND_BLOCK)                            normal
LMOD_RC (LMOD_RC)                                             <empty>
Redirect to stdout (LMOD_REDIRECT)                            yes
Supporting Full Settarg Use (LMOD_SETTARG_FULL_SUPPORT)       no
User shell                                                    bash
Site Name (LMOD_SITE_NAME)                                    <empty>
Site Pkg location                                             standard
Ignore Cache (LMOD_IGNORE_CACHE)                              no
Cached loads (LMOD_CACHED_LOADS)                              no
System Default Modules (LMOD_SYSTEM_DEFAULT_MODULES)          <empty>
System Name (LMOD_SYSTEM_NAME)                                <empty>
SYSHOST (cluster name) (LMOD_SYSHOST)                         <empty>
TCL Version                                                   8.6.10
Terse Decorations (LMOD_TERSE_DECORATIONS)                    yes
User cache valid time(sec) (LMOD_ANCIENT_TIME)                86400
Write cache after (sec) (LMOD_SHORT_TIME)                     2
Threshold (sec) (LMOD_THRESHOLD)                              1
Tmod find first rule (LMOD_TMOD_PATH_RULE)                    no
Tmod prepend PATH Rule (LMOD_TMOD_PATH_RULE)                  no
Tracing (LMOD_TRACING)                                        no
uname -a                                                      Linux io-login-02.io.internal 5.14.0-427.42.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 31 14:01:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
User Cache Directory                                          /root/.cache/lmod
Admin file                                                    /opt/ohpc/admin/lmod/etc/admin.list


Changes from Default Configuration
----------------------------------

Name                         Where Set  Default                                              Value
----                         ---------  -------                                              -----
LFS_VERSION                  D          1.6.3                                                1.8.0
LMOD_AUTO_SWAP               C          yes                                                  no
LMOD_COLORIZE                E          yes                                                  no
LMOD_PACKAGE_PATH            D          nil                                                  <empty>
LMOD_PAGER                   C          less                                                 /usr/bin/more
LMOD_REDIRECT                C          no                                                   yes
LMOD_SITEPACKAGE_LOCATION    Other      /opt/ohpc/admin/lmod/8.7.53/libexec/SitePackage.lua  <srctree>
LMOD_SYSTEM_DEFAULT_MODULES  D          __unknown__                                          <empty>
LMOD_TCLSH                   C          tclsh                                                /usr/bin/tclsh
MODULEPATH_ROOT              C                                                               /opt/ohpc/admin/modulefiles
PATH_TO_LUA                  C          lua                                                  /usr/bin/lua


Where Set -> D: default, E: environment, C: configuration
             lmod_cfg: lmod_config.lua SitePkg: SitePackage StdPkg: StandardPackage
             Other: Set somewhere outside of normal locations

Active RC file(s):
------------------
/opt/ohpc/admin/lmod/8.7.53/init/lmodrc.lua


    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Lmod Property Table (LMOD_RC):
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



propT = {
  arch = {
    displayT = {
      gpu = {
        color = "red",
        doc = "built for GPU",
        full_color = false,
        long = "(g)",
        short = "(g)",
      },
      ["gpu:mic"] = {
        color = "red",
        doc = "built natively for MIC and GPU",
        full_color = false,
        long = "(g,m)",
        short = "(gm)",
      },
      ["gpu:mic:offload"] = {
        color = "red",
        doc = "built natively for MIC and GPU and offload to the MIC",
        full_color = false,
        long = "(g,m,o)",
        short = "(@)",
      },
      mic = {
        color = "blue",
        doc = "built for host and native MIC",
        full_color = false,
        long = "(m)",
        short = "(m)",
      },
      ["mic:offload"] = {
        color = "blue",
        doc = "built for host, native MIC and offload to the MIC",
        full_color = false,
        long = "(m,o)",
        short = "(*)",
      },
      offload = {
        color = "blue",
        doc = "built for offload to the MIC only",
        full_color = false,
        long = "(o)",
        short = "(o)",
      },
    },
    validT = {
      gpu = 1,
      mic = 1,
      offload = 1,
    },
  },
  lmod = {
    displayT = {
      sticky = {
        color = "red",
        doc = "Module is Sticky, requires --force to unload or purge",
        long = "(S)",
        short = "(S)",
      },
    },
    validT = {
      sticky = 1,
    },
  },
  state = {
    displayT = {
      experimental = {
        color = "blue",
        doc = "Experimental",
        long = "(E)",
        short = "(E)",
      },
      obsolete = {
        color = "red",
        doc = "Obsolete",
        long = "(O)",
        short = "(O)",
      },
      testing = {
        color = "green",
        doc = "Testing",
        long = "(T)",
        short = "(T)",
      },
    },
    validT = {
      experimental = 1,
      obsolete = 1,
      testing = 1,
    },
  },
  status = {
    displayT = {
      active = {
        color = "yellow",
        doc = "Module is loaded",
        long = "(L)",
        short = "(L)",
      },
    },
    validT = {
      active = 1,
    },
  },
}

If I source the non-Lmod file without clearing the cache, it works, but module av fails with:

/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/lua5.1: ...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: bad argument #1 to 'next' (table expected, got boolean)
stack traceback:
	[C]: in function 'next'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: in function 'l_readCacheFile'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:561: in function 'build'
	...mpat/linux/x86_64/usr/share/Lmod/libexec/ModuleA.lua:685: in function 'singleton'
	...6/compat/linux/x86_64/usr/share/Lmod/libexec/Hub.lua:1134: in function 'avail'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:145: in function 'cmd'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:514: in function 'main'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:585: in main chunk

Would there be a better way, either inside EESSI or outside of it, to check the cache and clear it before loading? And does this mean we cannot mix multiple module configurations?
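
As a rough illustration of the "clear the cache before loading" idea, here is a minimal sketch. `clear_lmod_user_cache` is a hypothetical helper, not part of Lmod or EESSI. It assumes the relevant cache is the per-user one in the "User Cache Directory" reported by `ml --config` (`~/.cache/lmod` here), which this thread does not confirm.

```bash
# Hypothetical helper (not provided by Lmod or EESSI): empty the per-user
# Lmod spider cache so that the next `module` call has to rebuild it.
clear_lmod_user_cache() {
    # Default matches the "User Cache Directory" row of `ml --config`.
    local cache_dir="${1:-$HOME/.cache/lmod}"
    if [ -d "$cache_dir" ]; then
        # Delete only the contents; keep the directory itself in place.
        find "$cache_dir" -mindepth 1 -delete
    fi
}

# Possible use before initialising EESSI (path as quoted in this thread):
#   clear_lmod_user_cache
#   source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash
```

Lmod also has an `--ignore_cache` option (and the `LMOD_IGNORE_CACHE` setting listed in the config output), which may be a less invasive way to test whether a cache is involved at all.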


EDIT: Loading EESSI works after clearing the cache, but the modules themselves cannot be loaded afterwards...

module load PyTorch/2.1.2-foss-2023a

/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/lua5.1: ...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: bad argument #1 to 'next' (table expected, got boolean)
stack traceback:
	[C]: in function 'next'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: in function 'l_readCacheFile'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:561: in function 'build'
	...mpat/linux/x86_64/usr/share/Lmod/libexec/ModuleA.lua:685: in function 'singleton'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:183: in function 'l_lazyEval'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:262: in function 'sn'
	...6/compat/linux/x86_64/usr/share/Lmod/libexec/Hub.lua:312: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1055: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1031: in function 'load_usr'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:550: in function 'l_usrLoad'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:578: in function 'cmd'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:514: in function 'main'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:585: in main chunk
	[C]: ?

remyd1 avatar Apr 08 '25 13:04 remyd1

That cache issue is really weird. This time, I am able to load the module (tested with the gnuplot and PyTorch modules). I think I just have to make sure my environment is clean before using it...
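
One way to judge whether an environment is "clean" in this sense is to look at what Lmod-related state the shell already carries before EESSI is sourced. The sketch below is only an illustration: `lmod_env_snapshot` is a made-up name, and the variable prefixes are an assumption based on the names that appear in this thread (`LMOD_CMD`, `MODULEPATH`, `_ModuleTable001_`, `LUA_PATH`, `LUA_CPATH`).

```bash
# Hypothetical helper: list environment variables that influence Lmod.
# Returns non-zero (from grep) when none of them are set.
lmod_env_snapshot() {
    # LMOD_*        : includes LMOD_CMD, which the `module` function calls
    # MODULEPATH*   : directories searched for modulefiles
    # _ModuleTable* : Lmod's encoded record of the current module state
    # LUA_PATH / LUA_CPATH : search paths used by the Lua interpreter
    env | grep -E '^(LMOD_|MODULEPATH|_ModuleTable|LUA_C?PATH=)' | sort
}

# Possible use: compare the state before and after sourcing the init script.
#   lmod_env_snapshot > before.txt
#   source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash
#   lmod_env_snapshot > after.txt
#   diff before.txt after.txt
```

An empty snapshot before sourcing would suggest that no host Lmod settings are being carried into the EESSI session.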

remyd1 avatar Apr 08 '25 13:04 remyd1

@remyd1 Can you try running Lmod in debug mode when the crash happens, capture the output, and share the file (it'll be quite big)? Something like:

module -DDD load example 2>&1 | tee lmod-debug.out
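
When reading such a debug log, it can help to know that Lmod keeps its module table base64-encoded in the environment, split over `_ModuleTable001_`, `_ModuleTable002_`, and so on, with the number of parts in `_ModuleTable_Sz_`. A minimal sketch for decoding it follows; the function name `decode_module_table` is made up for illustration, and `module --mt` should show comparable information.

```bash
# Hypothetical helper: reassemble and decode Lmod's module table from the
# exported _ModuleTableNNN_ variables of the current shell.
decode_module_table() {
    local size encoded i name
    size="${_ModuleTable_Sz_:-0}"   # number of parts, 0 if Lmod set none
    encoded=""
    i=1
    while [ "$i" -le "$size" ]; do
        # Parts are numbered with three digits: _ModuleTable001_, ...
        name="$(printf '_ModuleTable%03d_' "$i")"
        # printenv only sees exported variables, which is how Lmod sets them.
        encoded="${encoded}$(printenv "$name" || true)"
        i=$((i + 1))
    done
    # Nothing to decode: report failure instead of printing an empty table.
    [ -n "$encoded" ] || return 1
    printf '%s' "$encoded" | base64 -d
}

# Possible use: decode_module_table | less
```

The decoded text is a Lua table; its `mpathA` entry lists the module path directories Lmod currently tracks.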

boegel avatar Apr 08 '25 15:04 boegel

Hi @boegel,

Sorry for the delay, I was on vacation.

Here is the corresponding output for PyTorch:

Description                                                   Value
-----------                                                   -----
Allow root to use Lmod (LMOD_ALLOW_ROOT_USE)                  yes
Allow TCL modulefiles (LMOD_ALLOW_TCL_FILES)                  yes
Auto swapping (LMOD_AUTO_SWAP)                                yes
Avail Style (LMOD_AVAIL_STYLE)                                <system>
Case Independent Sorting (LMOD_CASE_INDEPENDENT_SORTING)      yes
Colorize Lmod (LMOD_COLORIZE)                                 no
Configuration dir (LMOD_CONFIG_DIR)                           /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod
Disable Same Name AutoSwap (LMOD_DISABLE_SAME_NAME_AUTOSWAP)  no
Display Extension w/ avail (LMOD_AVAIL_EXTENSIONS)            yes
Use ~/.config dir only (LMOD_USE_DOT_CONFIG_ONLY)             no
Allow duplicate paths (LMOD_DUPLICATE_PATHS)                  no
Dynamic Spider Cache (LMOD_DYNAMIC_SPIDER_CACHE)              yes
Require Exact Match/no defaults (LMOD_EXACT_MATCH)            no
Export the module command (LMOD_EXPORT_MODULE)                yes
Allow extended default (LMOD_EXTENDED_DEFAULT)                yes
Use attached TCL over system call (LMOD_FAST_TCL_INTERP)      yes
Is fast TCL interp available (LMOD_USING_FAST_TCL_INTERP)     yes
Use italic instead of dim (LMOD_HIDDEN_ITALIC)                no
KSH Support (LMOD_KSH_SUPPORT)                                yes
Language used for err/msg/warn (LMOD_LANG)                    en
Site message file (LMOD_SITE_MSG_FILE)                        <empty>
LD_LIBRARY_PATH at config time (LMOD_LD_LIBRARY_PATH)         <empty>
LD_PRELOAD at config time (LMOD_LD_PRELOAD)                   <empty>
LuaFileSystem version                                         1.8.0
Lmod version                                                  8.7.23
lmod_config.lua location (LMOD_CONFIG_LOCATION)               no
Lua Version                                                   Lua 5.1
LUA_CPATH                                                     /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?.so;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/loadall.so
LUA_PATH                                                      /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/lua/5.1/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/lua/5.1/?/init.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?/init.lua
System lua-term (LMOD_HAVE_LUA_TERM)                          yes
Active lua-term                                               true
MODULERCFILE (LMOD_MODULERCFILE)                              /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/../etc/rc -> <empty>
avail: Include modulepath dir (LMOD_MPATH_AVAIL)              no
MODULEPATH_INIT (LMOD_MODULEPATH_INIT)                        /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/init/.modulespath -> <empty>
MODULEPATH_ROOT (MODULEPATH_ROOT)                             /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/etc/modulefiles
number of cache dirs                                          2
OS Name                                                       Gentoo Linux
Pager (LMOD_PAGER)                                            /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/less
Pager Options (LMOD_PAGER_OPTS)                               -XqMREF
Path to HashSum (LMOD_HASHSUM_PATH)                           /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/sha1sum
Path to Lua                                                   /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/lua5.1
Pin Versions in restore (LMOD_PIN_VERSIONS)                   no
Pkg Class name                                                Pkg
Lmod prefix                                                   /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod
Site controlled prefix (SITE_CONTROLLED_PREFIX)               yes
Prepend order (LMOD_PREPEND_BLOCK)                            normal
LMOD_RC (LMOD_RC)                                             <empty>
Redirect to stdout (LMOD_REDIRECT)                            no
Supporting Full Settarg Use (LMOD_SETTARG_FULL_SUPPORT)       no
User shell                                                    bash
Site Name (LMOD_SITE_NAME)                                    Gentoo
Site Pkg location                                             /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/SitePackage.lua
Ignore Cache (LMOD_IGNORE_CACHE)                              no
Cached loads (LMOD_CACHED_LOADS)                              yes
System Default Modules (LMOD_SYSTEM_DEFAULT_MODULES)          <empty>
System Name (LMOD_SYSTEM_NAME)                                <empty>
SYSHOST (cluster name) (LMOD_SYSHOST)                         Gentoo
TCL Version                                                   8.6.13
User cache valid time(sec) (LMOD_ANCIENT_TIME)                86400
Write cache after (sec) (LMOD_SHORT_TIME)                     2
Threshold (sec) (LMOD_THRESHOLD)                              1
Tmod find first rule (LMOD_TMOD_PATH_RULE)                    no
Tmod prepend PATH Rule (LMOD_TMOD_PATH_RULE)                  no
Tracing (LMOD_TRACING)                                        no
uname -a                                                      Linux io-login-01.io.internal 5.14.0-427.42.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 31 14:01:51 UTC 2024 x86_64 AMD EPYC-Genoa Processor AuthenticAMD GNU/Linux
User Cache Directory                                          /home/dernatr/.cache/lmod
Admin file                                                    /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/../etc/admin.list


Changes from Default Configuration
----------------------------------

Name                           Where Set  Default                                                                                               Value
----                           ---------  -------                                                                                               -----
LFS_VERSION                    D          1.6.3                                                                                                 1.8.0
LMOD_CACHED_LOADS              D          no                                                                                                    yes
LMOD_CASE_INDEPENDENT_SORTING  C          no                                                                                                    yes
LMOD_COLORIZE                  E          yes                                                                                                   no
LMOD_CONFIG_DIR                E          /etc/lmod                                                                                             /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod
LMOD_HAVE_LUA_TERM             C          no                                                                                                    yes
LMOD_KSH_SUPPORT               C          no                                                                                                    yes
LMOD_PACKAGE_PATH              D          nil                                                                                                   /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod
LMOD_PAGER                     C          less                                                                                                  /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/less
LMOD_SITEPACKAGE_LOCATION      Other      /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/SitePackage.lua  /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/SitePackage.lua
LMOD_SITE_NAME                 C          false                                                                                                 Gentoo
LMOD_SYSHOST                   C          false                                                                                                 Gentoo
LMOD_SYSTEM_DEFAULT_MODULES    D          __unknown__                                                                                           <empty>
LMOD_TCLSH                     C          tclsh                                                                                                 /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/tclsh
MODULEPATH_ROOT                C                                                                                                                /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/etc/modulefiles
PATH_TO_LUA                    C          lua                                                                                                   /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/lua5.1
SITE_CONTROLLED_PREFIX         C          no                                                                                                    yes


Where Set -> D: default, E: environment, C: configuration
             lmod_cfg: lmod_config.lua SitePkg: SitePackage StdPkg: StandardPackage
             Other: Set somewhere outside of normal locations

Active RC file(s):
------------------
/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/../init/lmodrc.lua
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/lmodrc.lua


Cache Directory                                                                            Time Stamp File
---------------                                                                            ---------------
/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/etc/lmod_cache/spider_cache  /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/etc/lmod_cache/system.txt
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/cache       /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/cache/timestamp


    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Lmod Property Table:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



propT = {
  arch = {
    displayT = {
      gpu = {
        color = "red",
        doc = "built for GPU",
        full_color = false,
        long = "(g)",
        short = "(g)",
      },
      ["gpu:mic"] = {
        color = "red",
        doc = "built natively for MIC and GPU",
        full_color = false,
        long = "(g,m)",
        short = "(gm)",
      },
      ["gpu:mic:offload"] = {
        color = "red",
        doc = "built natively for MIC and GPU and offload to the MIC",
        full_color = false,
        long = "(g,m,o)",
        short = "(@)",
      },
      mic = {
        color = "blue",
        doc = "built for host and native MIC",
        full_color = false,
        long = "(m)",
        short = "(m)",
      },
      ["mic:offload"] = {
        color = "blue",
        doc = "built for host, native MIC and offload to the MIC",
        full_color = false,
        long = "(m,o)",
        short = "(*)",
      },
      offload = {
        color = "blue",
        doc = "built for offload to the MIC only",
        full_color = false,
        long = "(o)",
        short = "(o)",
      },
    },
    validT = {
      gpu = 1,
      mic = 1,
      offload = 1,
    },
  },
  lmod = {
    displayT = {
      sticky = {
        color = "red",
        doc = "Module is Sticky, requires --force to unload or purge",
        long = "(S)",
        short = "(S)",
      },
    },
    validT = {
      sticky = 1,
    },
  },
  state = {
    displayT = {
      experimental = {
        color = "blue",
        doc = "Experimental",
        long = "(E)",
        short = "(E)",
      },
      obsolete = {
        color = "red",
        doc = "Obsolete",
        long = "(O)",
        short = "(O)",
      },
      testing = {
        color = "green",
        doc = "Testing",
        long = "(T)",
        short = "(T)",
      },
    },
    validT = {
      experimental = 1,
      obsolete = 1,
      testing = 1,
    },
  },
  status = {
    displayT = {
      active = {
        color = "yellow",
        doc = "Module is loaded",
        long = "(L)",
        short = "(L)",
      },
    },
    validT = {
      active = 1,
    },
  },
}


lmod(-DDD load PyTorch/2.1.2-foss-2023a){
  Date: Fri May  2 09:50:44 2025
  Hostname: io-login-01.io.internal
  System: Linux 5.14.0-427.42.1.el9_4.x86_64
  Version: #1 SMP PREEMPT_DYNAMIC Thu Oct 31 14:01:51 UTC 2024
  Lua Version: 5.1
  Lmod Version: 8.7.23  2023-03-29 17:19 -05:00
  package.path: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/?.lua;/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/?/init.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/../tools/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/../tools/?/init.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/../shells/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/?/init.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/lua/5.1/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/lua/5.1/?/init.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?.lua;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?/init.lua
  package.cpath: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/../lib/?.so;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/share/Lmod/libexec/../lib/?.so;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/?.so;/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/lua/5.1/loadall.so
  lmodPath: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod
  LOADEDMODULES: nil
  shellNm: bash, Shell:name(): bash
  Calling Hub:singleton(checkMPATH) w checkMPATH: true
  Hub:singleton(safe: true){
    s_hub: table: 0x55efe062a3c0, safe: true
  } Hub:singleton
  cmd name: load
  Load_Usr(PyTorch/2.1.2-foss-2023a){
    FrameStk:l_new(){
      MT:singleton(){
        getMT: Sz: 2
        getMT: nm:_ModuleTable001_, v: X01vZHVsZVRhYmxlXyA9IHsKTVR2ZXJzaW9uID0gMywKY19yZWJ1aWxkVGltZSA9IGZhbHNlLApjX3Nob3J0VGltZSA9IGZhbHNlLApkZXB0aFQgPSB7fSwKZmFtaWx5ID0ge30sCm1UID0ge30sCm1wYXRoQSA9IHsKIi9jdm1mcy9zb2Z0d2FyZS5lZXNzaS5pby9ob3N0X2luamVjdGlvbnMvMjAyMy4wNi9zb2Z0d2FyZS9saW51eC94ODZfNjQvYW1kL3plbjQvbW9kdWxlcy9hbGwiLCAiL2N2bWZzL3NvZnR3YXJlLmVlc3NpLmlvL3ZlcnNpb25zLzIwMjMuMDYvc29mdHdhcmUvbGludXgveDg2XzY0L2FtZC96ZW40L21vZHVsZXMvYWxsIiwgIi9ldGMvc2NsL21vZHVsZWZpbGVzIiwgIi9vcHQvb2hwYy9wdWIvbW9kdWxlZmlsZXMiCiwgIi90cmluaXR5L3NoYXJlZC9tb2R1bGVm
        getMT: nm:_ModuleTable002_, v: aWxlcy9tb2R1bGVncm91cHMiLCAiL3RyaW5pdHkvc2hhcmVkL21vZHVsZWZpbGVzL0NWLXN0YW5kYXJkIiwgIi90cmluaXR5L3NoYXJlZC9tb2R1bGVmaWxlcy9sb2NhbCIsCn0sCnN5c3RlbUJhc2VNUEFUSCA9ICIvb3B0L29ocGMvcHViL21vZHVsZWZpbGVzIiwKfQo=
        MT l_new(s,restoreFn:nil){
          currentMPATH: /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen4/modules/all:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all:/etc/scl/modulefiles:/opt/ohpc/pub/modulefiles:/trinity/shared/modulefiles/modulegroups:/trinity/shared/modulefiles/CV-standard:/trinity/shared/modulefiles/local
        } MT l_new
        s_mt = {
          MTversion = 3,
          c_rebuildTime = false,
          c_shortTime = false,
          depthT = {},
          family = {},
          mT = {},
          mpathA = {
            "/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen4/modules/all", "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all", "/etc/scl/modulefiles"
            , "/opt/ohpc/pub/modulefiles", "/trinity/shared/modulefiles/modulegroups", "/trinity/shared/modulefiles/CV-standard", "/trinity/shared/modulefiles/local",
          },
          systemBaseMPATH = "/opt/ohpc/pub/modulefiles",
        }
      } MT:singleton
    } FrameStk:l_new
    l_usrLoad(argA, check_must_load: true){
      Setting mcp to MC_Load
      MainControl:load_usr(mA={PyTorch/2.1.2-foss-2023a}){
        l_registerUserLoads(mA){
          userName: PyTorch/2.1.2-foss-2023a
        } l_registerUserLoads
        MainControl:load(mA={PyTorch/2.1.2-foss-2023a}){
          Hub:singleton(safe: nil){
            s_hub: table: 0x55efe062a3c0, safe: true
          } Hub:singleton
          Hub:load(mA={PyTorch/2.1.2-foss-2023a}){
            Hub:load i: 1, userName: PyTorch/2.1.2-foss-2023a
            Cache:singleton(){
              Cache:l_new(){
                ReadLmodRC:singleton(){
                } ReadLmodRC:singleton
                #scDescriptT: 2
                Adding: dir: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/cache, timestamp: 1746164815
              } Cache:l_new
              s_cache.buildCache: nil
              spiderDirT[/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all]: false
              spiderDirT[/etc/scl/modulefiles]: false
              spiderDirT[/opt/ohpc/pub/modulefiles]: false
              spiderDirT[/trinity/shared/modulefiles/modulegroups]: false
              spiderDirT[/trinity/shared/modulefiles/local]: false
            } Cache:singleton
            Cache:build(fast=nil){
              self.buildCache: true
              buildFresh: false
              Cache l_readCacheFile(mpathA, spiderTFnA){
                #spiderTFnA: 1
                cacheFile found: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/.lmod/cache/spiderT.luac_5.1
                valid: true, timeDiff: 1
              } Cache l_readCacheFile
              Cache l_readCacheFile(mpathA, spiderTFnA){
                #spiderTFnA: 1
                Did not find:    /home/dernatr/.cache/lmod/spiderT.x86_64_Linux.luac_5001
                cacheFile found: /home/dernatr/.cache/lmod/spiderT.x86_64_Linux.lua
                valid: true, timeDiff: 80899.428221941
/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/lua5.1: ...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: bad argument #1 to 'next' (table expected, got boolean)
stack traceback:
	[C]: in function 'next'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:341: in function 'l_readCacheFile'
	...compat/linux/x86_64/usr/share/Lmod/libexec/Cache.lua:561: in function 'build'
	...mpat/linux/x86_64/usr/share/Lmod/libexec/ModuleA.lua:685: in function 'singleton'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:183: in function 'l_lazyEval'
	...compat/linux/x86_64/usr/share/Lmod/libexec/MName.lua:262: in function 'sn'
	...6/compat/linux/x86_64/usr/share/Lmod/libexec/Hub.lua:312: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1055: in function 'load'
	.../linux/x86_64/usr/share/Lmod/libexec/MainControl.lua:1031: in function 'load_usr'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:550: in function 'l_usrLoad'
	...pat/linux/x86_64/usr/share/Lmod/libexec/cmdfuncs.lua:578: in function 'cmd'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:514: in function 'main'
	...3.06/compat/linux/x86_64/usr/share/Lmod/libexec/lmod:585: in main chunk
	[C]: ?
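
In the trace above, the crash in `Cache.lua:341` comes right after a second cache file is reported as found and valid. For longer debug logs, a small helper can pull out the last cache file Lmod read before the error. This is only a sketch: `last_cache_before_error` is a hypothetical name, and it assumes the log contains `cacheFile found:` lines and a `bad argument` error line formatted as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last "cacheFile found:" path that appears
# before the first "bad argument" line in a saved Lmod debug log.
last_cache_before_error() {
    awk '
        /bad argument/     { exit }   # stop scanning at the Lua error
        /cacheFile found:/ {
            # strip everything up to and including the label
            sub(/^.*cacheFile found:[ \t]*/, "")
            last = $0
        }
        END { if (last != "") print last }
    ' "$1"
}
```

With the debug output saved to a file (for example a hypothetical `lmod-debug.log`), `last_cache_before_error lmod-debug.log` would print the path of the cache file read just before the failure. It only points at a candidate; it does not prove that file is the cause.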

remyd1 avatar May 02 '25 09:05 remyd1

BTW, ignoring the cache works around the problem:

source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash
LMOD_IGNORE_CACHE=yes module {spider,available,load,...}
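
To avoid prefixing every call by hand, the workaround above could be wrapped in a small shell function. This is a minimal sketch for bash; `module_nocache` is a hypothetical name, not something provided by Lmod or EESSI, and it assumes `module` has already been defined by sourcing the init script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run any `module` subcommand with the Lmod cache ignored.
# In bash's default mode the variable applies to this single call only.
module_nocache() {
    LMOD_IGNORE_CACHE=yes module "$@"
}
```

For example, `module_nocache load PyTorch/2.1.2-foss-2023a` would be equivalent to typing the `LMOD_IGNORE_CACHE=yes` prefix explicitly. It works around the crash without addressing whatever makes the cache unreadable.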

remyd1 avatar May 02 '25 11:05 remyd1

Another possibly useful detail: this is not a vanilla OpenHPC distribution but a StackHPC Slurm appliance based on Rocky Linux with the OpenHPC repositories.

remyd1 avatar May 02 '25 11:05 remyd1

@remyd1 We ran into a similar issue on our HPC-UGent systems, where the underlying cause was missing cache files for cascadelake and icelake. Could that be related here?

See https://gitlab.com/eessi/support/-/issues/167 for more details.
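
One quick way to check for a missing cache is to test whether the software directory for a given CPU target contains any `spiderT.*` file. The sketch below is an assumption-laden example: `has_spider_cache` is a hypothetical helper, and the `.lmod/cache/spiderT.*` layout is taken from the zen4 debug log earlier in this thread; whether other CPU targets follow the same layout is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given software directory contains at
# least one Lmod spider cache file under .lmod/cache/.
has_spider_cache() {
    local target_dir="$1" f
    for f in "$target_dir"/.lmod/cache/spiderT.*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Using the path from the log, `has_spider_cache /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4 && echo present || echo missing` would report whether a cache is shipped for that target.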

boegel avatar Jun 10 '25 10:06 boegel