Failed to read 8 bytes from input stream at first SCF iteration
Describe the bug
When running ABACUS with OMP_NUM_THREADS=12 nohup mpirun -n 2 --map-by socket --bind-to none abacus | tee output.log & , the program crashes at the first SCF iteration when using the HSE functional. I built with -DDEBUG_INFO=ON to provide more detail for debugging.
Expected behavior
No response
To Reproduce
Before using the toolchain, I modified the scripts install_openmpi.sh and install_elpa.sh to enable support for CUDA-aware MPI and cuSOLVERMp, and to disable compilation of the GPU version of ELPA. The configure step for Open MPI:
./configure CFLAGS="${CFLAGS}" \
--prefix=${pkg_install_dir} \
--libdir="${pkg_install_dir}/lib" \
--with-zlib=${ZLIB} \
--with-libevent=internal \
--with-cuda=${CUDA_PATH} \
--with-ucx=${UCX} \
--with-ucc=${UCC} \
${EXTRA_CONFIGURE_FLAGS} \
> configure.log 2>&1 || tail -n ${LOG_LINES} configure.log
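Before debugging further, it may be worth confirming that the resulting Open MPI build really reports CUDA support. The sketch below parses the output of ompi_info --parsable --all, which on CUDA-aware builds contains a line ending in mpi_built_with_cuda_support:value:true. The helper name cuda_support_from_ompi_info is hypothetical and only illustrates the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ompi_info --parsable --all` output on stdin and
# prints the value of mpi_built_with_cuda_support ("true"/"false"),
# or "unknown" if the parameter is not reported at all.
cuda_support_from_ompi_info() {
    awk -F: '
        /mpi_built_with_cuda_support:value/ { print $NF; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Typical use (requires the freshly built ompi_info on PATH):
#   ompi_info --parsable --all | cuda_support_from_ompi_info
```

If this prints false or unknown, the --with-cuda option probably did not take effect, and configure.log is the place to look.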
The configure loop for ELPA:
for TARGET in "cpu" ; do
[ "$TARGET" = "nvidia" ] && [ "$ENABLE_CUDA" != "__TRUE__" ] && continue
# disable cpu if cuda is enabled
# [ "$TARGET" != "nvidia" ] && [ "$ENABLE_CUDA" = "__TRUE__" ] && continue
echo "Installing from scratch into ${pkg_install_dir}/${TARGET}"
mkdir -p "build_${TARGET}"
cd "build_${TARGET}"
if [ "${with_amd}" != "__DONTUSE__" ] && [ "${WITH_FLANG}" = "yes" ] ; then
echo "AMD fortran compiler detected, enable special option operation"
The toolchain_gnu.sh script:
./install_abacus_toolchain.sh \
--with-gcc=install \
--with-intel=no \
--with-openblas=install \
--with-openmpi=install \
--with-cmake=install \
--with-scalapack=install \
--with-libxc=install \
--with-fftw=install \
--with-elpa=install \
--with-cereal=install \
--with-rapidjson=install \
--with-libtorch=install \
--with-libnpy=install \
--with-libri=install \
--with-libcomm=install \
--with-4th-openmpi=no \
--enable-cuda \
--gpu-ver=86 \
| tee compile.log
The build_abacus_gnu.sh script:
cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
-DCMAKE_CXX_COMPILER=g++ \
-DMPI_CXX_COMPILER=mpicxx \
-DLAPACK_DIR=$LAPACK \
-DSCALAPACK_DIR=$SCALAPACK \
-DUSE_ELPA=ON \
-DELPA_DIR=$ELPA \
-DCEREAL_INCLUDE_DIR=$CEREAL \
-DFFTW3_DIR=$FFTW3 \
-DLibxc_DIR=$LIBXC \
-DENABLE_LCAO=ON \
-DENABLE_LIBXC=ON \
-DUSE_OPENMP=ON \
-DENABLE_RAPIDJSON=ON \
-DRapidJSON_DIR=$RAPIDJSON \
-DUSE_CUDA=ON \
-DUSE_CUDA_MPI=ON \
-DENABLE_DEEPKS=ON \
-DTorch_DIR=$LIBTORCH \
-Dlibnpy_INCLUDE_DIR=$LIBNPY \
-DENABLE_LIBRI=ON \
-DLIBRI_DIR=$LIBRI \
-DLIBCOMM_DIR=$LIBCOMM \
-DENABLE_CUSOLVERMP=ON \
-DCAL_CUSOLVERMP_PATH=$CUDA_PATH/lib64 \
-DDEBUG_INFO=ON
Environment
No response
Additional Context
Task list for Issue attackers (only for developers)
- [ ] Verify the issue is not a duplicate.
- [ ] Describe the bug.
- [ ] Steps to reproduce.
- [ ] Expected behavior.
- [ ] Error message.
- [ ] Environment details.
- [ ] Additional context.
- [ ] Assign a priority level (low, medium, high, urgent).
- [ ] Assign the issue to a team member.
- [ ] Label the issue with relevant tags.
- [ ] Identify possible related issues.
- [ ] Create a unit test or automated test to reproduce the bug (if applicable).
- [ ] Fix the bug.
- [ ] Test the fix.
- [ ] Update documentation (if necessary).
- [ ] Close the issue and inform the reporter (if applicable).
Thank you for reporting the issue. Someone will look into it.
INPUT sets restart_load=True, but no matching restart files are provided here.
Try setting restart_load=False.
@PeizeLin It works. However, the behavior is confusing: I sometimes reuse an INPUT file, and if there is no restart density, could ABACUS initialize the density automatically and ignore the restart setting, or at least give an explicit warning?
We will try to implement some warnings, thanks for your feedback.
With #6194, an explicit warning is printed if there is no restart information. The density is then initialized automatically and ABACUS runs as usual.
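For builds that predate that change, a small pre-flight check in the launch script could catch a reused INPUT file before mpirun starts. The following is only a sketch: the function names are hypothetical, the set of values treated as "true" is an assumption, and the restart directory is passed in by the caller because its location depends on the calculation setup.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: warn when INPUT enables restart_load
# but the given restart directory is missing or empty.

restart_load_enabled() {
    # Prints "yes" if the last restart_load entry in the file is truthy, else "no".
    # Assumes "key value" lines with "#" starting a comment.
    local input_file="$1"
    awk '
        { sub(/#.*/, "") }                                  # strip comments
        tolower($1) == "restart_load" { v = tolower($2) }   # remember last value
        END {
            if (v == "1" || v == "true" || v == "t") print "yes"
            else print "no"
        }
    ' "$input_file"
}

check_restart() {
    # Returns 1 (and warns on stderr) if restart_load is on but no files exist.
    local input_file="$1" restart_dir="$2"
    if [ "$(restart_load_enabled "$input_file")" = "yes" ]; then
        if [ ! -d "$restart_dir" ] || [ -z "$(ls -A "$restart_dir" 2>/dev/null)" ]; then
            echo "WARNING: restart_load is enabled in $input_file," \
                 "but $restart_dir contains no restart files" >&2
            return 1
        fi
    fi
    return 0
}
```

A launch script could then call check_restart INPUT path/to/restart || exit 1 ahead of the mpirun line, with path/to/restart replaced by wherever the restart data is expected.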