Memory leaks in converse's cldb

Open stwhite91 opened this issue 9 years ago • 3 comments

Original issue: https://charm.cs.illinois.edu/redmine/issues/1202


Running Valgrind on the netlrts-linux-x86_64 examples reveals two memory leaks in initialization routines: one at src/conv-ldb/cldb.c line 360, the other at src/conv-core/cputopology.C line 183.

==13005== 32 bytes in 1 blocks are possibly lost in loss record 60 of 215
==13005==    at 0x4C27AAA: malloc (vg_replace_malloc.c:291)
==13005==    by 0x53FC64: CmiAlloc (convcore.c:3035)
==13005==    by 0x545102: CldModuleGeneralInit (cldb.c:360)
==13005==    by 0x5413A0: ConverseCommonInit (convcore.c:3791)
==13005==    by 0x53D822: ConverseInit (machine-common-core.c:1261)
==13005==    by 0x4914C6: main (main.C:18)
==13005== 
==13005== 32 bytes in 1 blocks are possibly lost in loss record 61 of 215
==13005==    at 0x4C28222: operator new[](unsigned long) (vg_replace_malloc.c:384)
==13005==    by 0x54FB7F: cpuTopoRecvHandler(void*) (cputopology.C:183)
==13005==    by 0x53F3BC: CsdSchedulePoll (convcore.c:1783)
==13005==    by 0x54E3E4: LrtsInitCpuTopo (cputopology.C:582)
==13005==    by 0x4945E2: _initCharm(int, char**) (init.C:1393)
==13005==    by 0x53D92D: ConverseInit (machine-common-core.c:1294)
==13005==    by 0x4914C6: main (main.C:18)

stwhite91 avatar Sep 08 '16 13:09 stwhite91

Original date: 2017-02-06 19:02:11


Fix for cputopology mem leak: ~~https://charm.cs.illinois.edu/gerrit/#/c/2202/~~ https://github.com/UIUC-PPL/charm/commit/3a01bf56a793fd10a9bed631dfae7ad24b731c50

The memory leak in cldb is just an allocation made at init that should be explicitly freed at exit. Fixing it would require adding an explicit cleanup or finalize function that Converse calls at exit.
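
For illustration, the init/finalize pairing described above might look roughly like the following. This is a minimal sketch, not the actual cldb code: `demo_module_state`, `demo_module_init`, `demo_module_finalize`, and `demo_module_is_live` are hypothetical names, and plain `calloc`/`free` stand in for whatever allocator the real init routine uses.

```c
#include <stdlib.h>

/* Hypothetical module state; stands in for whatever the real
 * init routine allocates once and keeps for the process lifetime. */
typedef struct {
    int *table;   /* buffer allocated during init */
    size_t len;   /* number of entries in table */
} demo_module_state;

static demo_module_state g_state = { NULL, 0 };

/* Allocate the long-lived buffer. Returns 0 on success, -1 on failure.
 * Calling it again while initialized is a no-op. */
int demo_module_init(size_t len) {
    if (g_state.table != NULL) return 0;
    g_state.table = calloc(len, sizeof *g_state.table);
    if (g_state.table == NULL) return -1;
    g_state.len = len;
    return 0;
}

/* Release what init allocated. Safe to call more than once,
 * because free(NULL) is a no-op and the pointer is reset. */
void demo_module_finalize(void) {
    free(g_state.table);
    g_state.table = NULL;
    g_state.len = 0;
}

/* Report whether the module currently owns a buffer. */
int demo_module_is_live(void) {
    return g_state.table != NULL;
}
```

In this sketch the runtime's shutdown path would call `demo_module_finalize()` once per process, after the last use of the buffer. Making finalize idempotent keeps it safe if more than one exit path reaches it.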

juanjgalvez avatar Apr 25 '19 00:04 juanjgalvez

Original date: 2017-06-27 02:48:52


I see memory leaks from the new topology code merged in the last couple of weeks. I'm not sure if they are really new, or if they just have new names...

stwhite91 avatar Apr 25 '19 00:04 stwhite91

Original date: 2017-06-28 21:03:49


Here's the new output:

==25396== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==25396==    at 0x5756183: __sendto_nocancel (syscall-template.S:81)
==25396==    by 0x608A2A: TransmitImplicitDgram1 (machine-eth.c:200)
==25396==    by 0x608DA0: TransmitDatagram (machine-eth.c:285)
==25396==    by 0x609CC8: CommunicationServerNet (machine-eth.c:734)
==25396==    by 0x60A110: LrtsAdvanceCommunication (machine.c:1707)
==25396==    by 0x6055A2: AdvanceCommunication (machine-common-core.c:1317)
==25396==    by 0x605820: CmiGetNonLocal (machine-common-core.c:1487)
==25396==    by 0x60C4E7: CsdNextMessage (convcore.c:1781)
==25396==    by 0x60C835: CsdSchedulePoll (convcore.c:1972)
==25396==    by 0x622743: LrtsInitCpuTopo (cputopology.C:593)
==25396==    by 0x62291C: CmiInitCPUTopology (cputopology.C:679)
==25396==    by 0x52AE4B: _initCharm(int, char**) (init.C:1364)
==25396==  Address 0x5ae40c5 is 21 bytes inside a block of size 76 alloc'd
==25396==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25396==    by 0x601DD1: malloc_nomigrate (libmemory-default.c:724)
==25396==    by 0x60E96A: CmiAlloc (convcore.c:2939)
==25396==    by 0x622694: LrtsInitCpuTopo (cputopology.C:580)
==25396==    by 0x62291C: CmiInitCPUTopology (cputopology.C:679)
==25396==    by 0x52AE4B: _initCharm(int, char**) (init.C:1364)
==25396==    by 0x60555C: ConverseRunPE (machine-common-core.c:1296)
==25396==    by 0x60547A: ConverseInit (machine-common-core.c:1198)
==25396==    by 0x528B47: main (main.C:18)

==25395== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==25395==    at 0x5756183: __sendto_nocancel (syscall-template.S:81)
==25395==    by 0x6088F6: TransmitImplicitDgram (machine-eth.c:174)
==25395==    by 0x608C7F: TransmitDatagram (machine-eth.c:265)
==25395==    by 0x609CC8: CommunicationServerNet (machine-eth.c:734)
==25395==    by 0x60A110: LrtsAdvanceCommunication (machine.c:1707)
==25395==    by 0x6055A2: AdvanceCommunication (machine-common-core.c:1317)
==25395==    by 0x605820: CmiGetNonLocal (machine-common-core.c:1487)
==25395==    by 0x60C4E7: CsdNextMessage (convcore.c:1781)
==25395==    by 0x60C835: CsdSchedulePoll (convcore.c:1972)
==25395==    by 0x622743: LrtsInitCpuTopo (cputopology.C:593)
==25395==    by 0x62291C: CmiInitCPUTopology (cputopology.C:679)
==25395==    by 0x52AE4B: _initCharm(int, char**) (init.C:1364)
==25395==  Address 0x5af8d05 is 21 bytes inside a block of size 64 alloc'd
==25395==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25395==    by 0x601DD1: malloc_nomigrate (libmemory-default.c:724)
==25395==    by 0x60E96A: CmiAlloc (convcore.c:2939)
==25395==    by 0x60593F: CopyMsg (machine-common-core.c:1579)
==25395==    by 0x603A94: SendSpanningChildren (machine-broadcast.c:117)
==25395==    by 0x603B12: SendSpanningChildrenProc (machine-broadcast.c:176)
==25395==    by 0x603BAD: CmiSyncBroadcastFn1 (machine-broadcast.c:219)
==25395==    by 0x603C87: CmiFreeBroadcastAllFn (machine-broadcast.c:290)
==25395==    by 0x621EAF: cpuTopoHandler(void*) (cputopology.C:288)
==25395==    by 0x60D6F9: CmiSendReduce (convcore.c:2446)
==25395==    by 0x60E196: CmiHandleReductionMessage (convcore.c:2627)
==25395==    by 0x60C3D9: CmiHandleMessage (convcore.c:1672)
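
Note that the two reports above are Memcheck uninitialised-value warnings rather than leak records: some bytes handed to `sendto` (offset 21 inside blocks obtained through `CmiAlloc`) were never written after allocation. One common source of such warnings is header or padding bytes that are left unset before a message is sent; whether that is the cause here is an assumption, not something confirmed in this thread. A minimal sketch of defining every byte up front, using a hypothetical `demo_msg` layout that is not the real Converse header:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical message layout; not the actual Converse header. */
typedef struct {
    unsigned char header[24]; /* bytes the transport would put on the wire */
    int payload;
} demo_msg;

/* Allocate a message and define every byte before any field is set,
 * so a later send never reads uninitialised memory from this block. */
demo_msg *demo_msg_new(int payload) {
    demo_msg *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    memset(m, 0, sizeof *m);  /* covers unused header bytes and padding */
    m->payload = payload;
    return m;
}
```

Zeroing the whole block costs a little at allocation time; the alternative is to initialise each header field explicitly at the send site.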

stwhite91 avatar Apr 25 '19 00:04 stwhite91