Valgrind reports some still reachable bytes in MPI_File_open
Reproduced on Ubuntu 18 with default GCC 7.4.0
[Build MPICH 4.0.2 with -g flag]
wget https://www.mpich.org/static/downloads/4.0.2/mpich-4.0.2.tar.gz
tar zxf mpich-4.0.2.tar.gz
cd mpich-4.0.2
CFLAGS="-g" ./configure --prefix=/path/to/mpich/installation
make -j4
make install
[Run a simple test with Valgrind]
export PATH=/path/to/mpich/installation/bin:$PATH
mpicc -g test_mpiio.c
mpiexec -n 2 valgrind --leak-check=full --show-leak-kinds=all ./a.out
[Sample output]
...
==24456== 32,768 bytes in 1 blocks are still reachable in loss record 7 of 8
==24456== at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24456== by 0x52990E3: MPIR_Handle_indirect_init (mpir_handlemem.h:188)
==24456== by 0x52990E3: MPIR_Handle_obj_alloc_unsafe (mpir_handlemem.h:318)
==24456== by 0x52990E3: MPIR_Info_handle_obj_alloc (mpir_handlemem.c:191)
==24456== by 0x521849F: MPIR_Info_alloc (infoutil.c:57)
==24456== by 0x521824F: MPIR_Info_set_impl (info_impl.c:206)
==24456== by 0x4F4FD55: internal_Info_set (info_set.c:69)
==24456== by 0x4F4FD55: PMPI_Info_set (info_set.c:142)
==24456== by 0x728BAB8: ADIOI_GEN_SetInfo (ad_hints.c:81)
==24456== by 0x7296FE0: ADIO_Open (ad_open.c:123)
==24456== by 0x7271FCB: PMPI_File_open (open.c:143)
==24456== by 0x1088DC: main (test_mpiio.c:8)
==24456==
==24456== 65,536 bytes in 1 blocks are still reachable in loss record 8 of 8
==24456== at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24456== by 0x529918E: MPIR_Handle_indirect_init (mpir_handlemem.h:169)
==24456== by 0x529918E: MPIR_Handle_obj_alloc_unsafe (mpir_handlemem.h:318)
==24456== by 0x529918E: MPIR_Info_handle_obj_alloc (mpir_handlemem.c:191)
==24456== by 0x521849F: MPIR_Info_alloc (infoutil.c:57)
==24456== by 0x521824F: MPIR_Info_set_impl (info_impl.c:206)
==24456== by 0x4F4FD55: internal_Info_set (info_set.c:69)
==24456== by 0x4F4FD55: PMPI_Info_set (info_set.c:142)
==24456== by 0x728BAB8: ADIOI_GEN_SetInfo (ad_hints.c:81)
==24456== by 0x7296FE0: ADIO_Open (ad_open.c:123)
==24456== by 0x7271FCB: PMPI_File_open (open.c:143)
==24456== by 0x1088DC: main (test_mpiio.c:8)
==24456==
==24456== LEAK SUMMARY:
==24456== definitely lost: 0 bytes in 0 blocks
==24456== indirectly lost: 0 bytes in 0 blocks
==24456== possibly lost: 0 bytes in 0 blocks
==24456== still reachable: 98,414 bytes in 8 blocks
==24456== suppressed: 0 bytes in 0 blocks
[Test program (test_mpiio.c)]
#include <mpi.h>
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "test_file", MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
@hzhou Is this a known issue?
Not really. Could you try the main branch of mpich? https://github.com/pmodels/mpich/blob/main/doc/wiki/source_code/Github.md
It seems that the leaks from MPI_File_open have been fixed in the latest main branch. Below are the remaining possible leaks reported by Valgrind.
==24466== 2 bytes in 1 blocks are still reachable in loss record 1 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x5216083: MPIR_Info_push (infoutil.c:90)
==24466== by 0x5215B5E: MPIR_Info_set_impl (info_impl.c:153)
==24466== by 0x5324E17: setup_single_nic (ofi_nic.c:164)
==24466== by 0x53253C8: MPIDI_OFI_init_multi_nic (ofi_nic.c:128)
==24466== by 0x5303947: MPIDI_OFI_init_local (ofi_init.c:568)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 2 bytes in 1 blocks are still reachable in loss record 2 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x5216083: MPIR_Info_push (infoutil.c:90)
==24466== by 0x5215B5E: MPIR_Info_set_impl (info_impl.c:153)
==24466== by 0x5324E50: setup_single_nic (ofi_nic.c:166)
==24466== by 0x53253C8: MPIDI_OFI_init_multi_nic (ofi_nic.c:128)
==24466== by 0x5303947: MPIDI_OFI_init_local (ofi_init.c:568)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 5 bytes in 1 blocks are indirectly lost in loss record 3 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x5362506: hwloc__add_info (topology.c:464)
==24466== by 0x53901CF: hwloc__xml_import_cpukind (topology-xml.c:1815)
==24466== by 0x5390EC3: hwloc_look_xml (topology-xml.c:2123)
==24466== by 0x53691D3: hwloc_discover (topology.c:3356)
==24466== by 0x536A8E2: hwloc_topology_load (topology.c:4033)
==24466== by 0x52A9F59: MPII_hwtopo_init (mpir_hwtopo.c:216)
==24466== by 0x5216C9B: MPII_Init_thread (mpir_init.c:169)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 9 bytes in 1 blocks are still reachable in loss record 4 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x5216077: MPIR_Info_push (infoutil.c:89)
==24466== by 0x5215B5E: MPIR_Info_set_impl (info_impl.c:153)
==24466== by 0x5324E17: setup_single_nic (ofi_nic.c:164)
==24466== by 0x53253C8: MPIDI_OFI_init_multi_nic (ofi_nic.c:128)
==24466== by 0x5303947: MPIDI_OFI_init_local (ofi_init.c:568)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 15 bytes in 1 blocks are still reachable in loss record 5 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x5216077: MPIR_Info_push (infoutil.c:89)
==24466== by 0x5215B5E: MPIR_Info_set_impl (info_impl.c:153)
==24466== by 0x5324E50: setup_single_nic (ofi_nic.c:166)
==24466== by 0x53253C8: MPIDI_OFI_init_multi_nic (ofi_nic.c:128)
==24466== by 0x5303947: MPIDI_OFI_init_local (ofi_init.c:568)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 16 bytes in 1 blocks are indirectly lost in loss record 6 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x7B339B9: strdup (strdup.c:42)
==24466== by 0x53624CC: hwloc__add_info (topology.c:461)
==24466== by 0x53901CF: hwloc__xml_import_cpukind (topology-xml.c:1815)
==24466== by 0x5390EC3: hwloc_look_xml (topology-xml.c:2123)
==24466== by 0x53691D3: hwloc_discover (topology.c:3356)
==24466== by 0x536A8E2: hwloc_topology_load (topology.c:4033)
==24466== by 0x52A9F59: MPII_hwtopo_init (mpir_hwtopo.c:216)
==24466== by 0x5216C9B: MPII_Init_thread (mpir_init.c:169)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 32 bytes in 1 blocks are still reachable in loss record 7 of 10
==24466== at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x864D7E4: _dlerror_run (dlerror.c:140)
==24466== by 0x864D050: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==24466== by 0x71F92F9: ofi_load_dl_prov (fabric.c:692)
==24466== by 0x71F92F9: fi_ini (fabric.c:841)
==24466== by 0x71FA1CA: fi_getinfo (fabric.c:1094)
==24466== by 0x53259F2: find_provider (init_provider.c:115)
==24466== by 0x53259F2: MPIDI_OFI_find_provider (init_provider.c:71)
==24466== by 0x5303935: MPIDI_OFI_init_local (ofi_init.c:564)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 61 bytes in 1 blocks are still reachable in loss record 8 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x4017880: _dl_exception_create (dl-exception.c:77)
==24466== by 0x7BFD250: _dl_signal_error (dl-error-skeleton.c:117)
==24466== by 0x4009812: _dl_map_object (dl-load.c:2384)
==24466== by 0x4014EE3: dl_open_worker (dl-open.c:235)
==24466== by 0x7BFD2DE: _dl_catch_exception (dl-error-skeleton.c:196)
==24466== by 0x40147C9: _dl_open (dl-open.c:605)
==24466== by 0x864CF95: dlopen_doit (dlopen.c:66)
==24466== by 0x7BFD2DE: _dl_catch_exception (dl-error-skeleton.c:196)
==24466== by 0x7BFD36E: _dl_catch_error (dl-error-skeleton.c:215)
==24466== by 0x864D734: _dlerror_run (dlerror.c:162)
==24466== by 0x864D050: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==24466==
==24466== 149 (128 direct, 21 indirect) bytes in 1 blocks are definitely lost in loss record 9 of 10
==24466== at 0x4C2FA3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x4C31D84: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x536248C: hwloc__add_info (topology.c:455)
==24466== by 0x53901CF: hwloc__xml_import_cpukind (topology-xml.c:1815)
==24466== by 0x5390EC3: hwloc_look_xml (topology-xml.c:2123)
==24466== by 0x53691D3: hwloc_discover (topology.c:3356)
==24466== by 0x536A8E2: hwloc_topology_load (topology.c:4033)
==24466== by 0x52A9F59: MPII_hwtopo_init (mpir_hwtopo.c:216)
==24466== by 0x5216C9B: MPII_Init_thread (mpir_init.c:169)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== 160 bytes in 1 blocks are still reachable in loss record 10 of 10
==24466== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24466== by 0x52160A9: MPIR_Info_push (infoutil.c:78)
==24466== by 0x5215B5E: MPIR_Info_set_impl (info_impl.c:153)
==24466== by 0x5324E17: setup_single_nic (ofi_nic.c:164)
==24466== by 0x53253C8: MPIDI_OFI_init_multi_nic (ofi_nic.c:128)
==24466== by 0x5303947: MPIDI_OFI_init_local (ofi_init.c:568)
==24466== by 0x52AD058: MPID_Init (ch4_init.c:508)
==24466== by 0x521709B: MPII_Init_thread (mpir_init.c:230)
==24466== by 0x5217864: MPIR_Init_impl (mpir_init.c:102)
==24466== by 0x4F66814: internal_Init (init.c:53)
==24466== by 0x4F66814: PMPI_Init (init.c:105)
==24466== by 0x1088BA: main (test_mpiio.c:5)
==24466==
==24466== LEAK SUMMARY:
==24466== definitely lost: 128 bytes in 1 blocks
==24466== indirectly lost: 21 bytes in 2 blocks
==24466== possibly lost: 0 bytes in 0 blocks
==24466== still reachable: 281 bytes in 7 blocks
The hwloc leak is tracked here: https://github.com/open-mpi/hwloc/pull/547
I don't see the libfabric _dl_open leak, maybe because I am building with the embedded libfabric, but I do see some leaks from prov/opx, tracked here: https://github.com/ofiwg/libfabric/issues/8091
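Until the upstream fixes land, one possible way to keep the hwloc and dlopen records out of the report is a Valgrind suppression file, so that anything coming from MPICH itself stands out. The following is only a sketch: the file name mpich_deps.supp and the two suppression names are made up for illustration, and the frame names are copied from the traces above, so they may need adjusting for other hwloc or glibc versions.

```shell
# Write a Valgrind suppression file (the file name is a hypothetical choice).
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > mpich_deps.supp <<'EOF'
# hwloc__add_info records (definitely and indirectly lost) seen in the report
{
   hwloc_add_info_leak
   Memcheck:Leak
   match-leak-kinds: definite,indirect
   ...
   fun:hwloc__add_info
}
# dlerror state allocated under dlopen (still reachable) seen in the report
{
   glibc_dlerror_reachable
   Memcheck:Leak
   match-leak-kinds: reachable
   ...
   fun:_dlerror_run
}
EOF
```

The file would then be passed by adding `--suppressions=mpich_deps.supp` to the valgrind options in the mpiexec command from the original report. The `...` line is Valgrind's frame wildcard, so each entry should match any stack that passes through the named function regardless of which allocator sits on top. The MPIR_Info_push records are deliberately not suppressed, since those originate in MPICH.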