Critical - crash in multithreaded environment, when using nedrealloc (yes, again)

Open Gerilgfx opened this issue 12 years ago • 40 comments

The crash appears when nedrealloc is called on multiple threads, repeatedly reallocating small (or null) memory areas to larger buffers. The crash mostly occurs before the test reaches its first percent. If the algorithm gets past that point, the program mostly survives. To reproduce the crash, it helps to have other processes running too, for example an HD YouTube video playing in the foreground.

Crash type: memory corruption

Version affected: newest (older versions not yet tested)

compiler flag:

g++ nedmalloctester3.c -o nedmalloctester -O3 -s -lpthread -m64

compiler version:

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.7/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.7 --enable-ssp --disable-libssp --disable-libitm --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --program-suffix=-4.7 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux)

system version:

uname -r -a
Linux a1 3.4.6-2.10-desktop #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux

output:

g++ nedmalloctester3.c -o nedmalloctester -O3 -s -lpthread

./nedmalloctester
test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
Aborted

./nedmalloctester
test 3 begins...
0 percent finished
^C

./nedmalloctester
test 3 begins...
0 percent finished
^C

./nedmalloctester
test 3 begins...
0 percent finished
^C

./nedmalloctester
test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
Aborted

testcase:

// g++ nedmalloctester3.c -o nedmalloctester3 -O3 -s -pthread

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define USE_LOCKS 1
#define USE_DL_PREFIX 1
#define NDEBUG
#define NO_NED_NAMESPACE

#include "nedmalloc/nedmalloc_2013_apr/ori/nedmalloc.h"
#include "nedmalloc/nedmalloc_2013_apr/ori/nedmalloc.c"

#define malloc_vpool nedmalloc
#define free_vpool nedfree
#define realloc_vpool nedrealloc

/*#define malloc_vpool malloc
#define free_vpool free
#define realloc_vpool realloc*/

#define TESTMEMMAX 1024*1024*2

void ** test=NULL;

int div_w=8; // block size to be sure that we touching pointers allocated from different thread ID-s

void malt(int thread){
    for(int iteracio=1;iteracio<80;iteracio+=4){
        for(int i=0;i<TESTMEMMAX;i++){
            if(((i/div_w)%10)!=thread) continue; // 10 thread
            // printf("%d\n", i);
            test[i]=realloc_vpool(test[i], iteracio);
            memset(test[i], 1, iteracio);
        }
    }
}

void *malt2(void * threadid){malt(1);}
void *malt3(void * threadid){malt(2);}
void *malt4(void * threadid){malt(3);}
void *malt5(void * threadid){malt(4);}
void *malt6(void * threadid){malt(5);}
void *malt7(void * threadid){malt(6);}
void *malt8(void * threadid){malt(7);}
void *malt9(void * threadid){malt(8);}
void *malt10(void * threadid){malt(9);}

void MallocStabTest3(){ printf("test 3 begins...\n");

test=(void**)malloc_vpool(128+(TESTMEMMAX*sizeof(void*)));
for(int i=0;i<(TESTMEMMAX);i++) test[i]=NULL;

for(int Z=0;Z<100;Z++){
    div_w=2+(rand()%40);  // random block size to be sure that we touching pointers allocated from different thread ID-s

    pthread_t TMP2=0;
    pthread_t TMP3=0;
    pthread_t TMP4=0;
    pthread_t TMP5=0;
    pthread_t TMP6=0;
    pthread_t TMP7=0;
    pthread_t TMP8=0;
    pthread_t TMP9=0;
    pthread_t TMP10=0;

    pthread_create(&TMP2, NULL, malt2, NULL);
    pthread_create(&TMP3, NULL, malt3, NULL);
    pthread_create(&TMP4, NULL, malt4, NULL);
    pthread_create(&TMP5, NULL, malt5, NULL);
    pthread_create(&TMP6, NULL, malt6, NULL);
    pthread_create(&TMP7, NULL, malt7, NULL);
    pthread_create(&TMP8, NULL, malt8, NULL);
    pthread_create(&TMP9, NULL, malt9, NULL);
    pthread_create(&TMP10, NULL, malt10, NULL);

    malt(0);

    pthread_join(TMP2, NULL);
    pthread_join(TMP3, NULL);
    pthread_join(TMP4, NULL);
    pthread_join(TMP5, NULL);
    pthread_join(TMP6, NULL);
    pthread_join(TMP7, NULL);
    pthread_join(TMP8, NULL);
    pthread_join(TMP9, NULL);
    pthread_join(TMP10, NULL);

    printf("%d percent finished\n", Z);
}

for(int i=0;i<(TESTMEMMAX);i++) if(test[i]) free_vpool(test[i]);
free_vpool(test);
printf("success.\n");

}

int main(){ MallocStabTest3(); }
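The filter `((i/div_w)%10)!=thread` in `malt` decides which thread works on which slot of `test`. A minimal sketch of that mapping follows; the helper name `slot_owner` is hypothetical and does not appear in the test case.

```c
/* Hypothetical helper mirroring the test's filter ((i/div_w)%10)!=thread:
   returns the number (0..9) of the thread that works on slot i
   for a given block width div_w. */
static int slot_owner(int i, int div_w) {
    return (i / div_w) % 10;
}
```

Because `div_w` is re-randomized on every pass of the outer `Z` loop, the same slot can map to a different thread from one pass to the next: slot 8 maps to thread 1 when `div_w` is 8, but to thread 4 when `div_w` is 2. That is how a block last reallocated by one thread ends up being reallocated by another.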

Gerilgfx avatar Aug 22 '13 16:08 Gerilgfx

previous versions from 2013 crash too

Gerilgfx avatar Aug 22 '13 17:08 Gerilgfx

The last time I put together some form of release was for v1.10 beta 3 in 2012. I agree that nedmalloc definitely needs a regularly executed stress test suite, and in fact I have recently purchased a server for a Jenkins CI which you can see at https://ci.nedprod.com/.

As it happens, I was fired from BlackBerry on Monday, so I suddenly have some free time. I'll look into figuring out some form of automated solution to the many breakages which have slipped into nedmalloc over the years by accident.

Thanks for reporting the bug Geri. You're a trooper.

Niall

ned14 avatar Aug 22 '13 18:08 ned14

thank you for creating and supporting this wonderful software. i suggest creating a stress test based on my multithreaded testcases, like the current one and those i posted before. they are simple enough, and it's easy to debug them.

Gerilgfx avatar Aug 22 '13 18:08 Gerilgfx

hi. were you able to reproduce the crash?

Gerilgfx avatar Aug 25 '13 14:08 Gerilgfx

It'll be a few days yet. I'm currently mentoring gsoc and I need to finish two work items to enable the student to proceed as he is waiting on me.

ned14 avatar Aug 25 '13 15:08 ned14

i did some tests: both -m64 and -m32 builds crash, it crashes with and without -O3, and it crashes with and without -s. so i guess it's an algorithmic bug

Gerilgfx avatar Sep 04 '13 18:09 Gerilgfx

It is on my radar. Integrating the new items from http://boostafio.uservoice.com/forums/218980-boost-afio-feature-request before GSoC ends in ten days has proved harder than expected.

ned14 avatar Sep 06 '13 15:09 ned14

i wrapped nedrealloc with nedfree and nedmalloc calls in my code until the fix is done, so no need to hurry
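A wrapper of this kind could look roughly like the sketch below. It is written against the standard `malloc`/`free` so it compiles on its own; the name `realloc_via_malloc_free` and the explicit `old_size` parameter are assumptions made for illustration, not part of nedmalloc or of the reporter's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: emulates realloc with malloc + memcpy + free.
   The caller passes old_size because plain malloc does not expose the
   size of an existing block. */
static void *realloc_via_malloc_free(void *old, size_t old_size, size_t new_size) {
    void *fresh = malloc(new_size);
    if (fresh == NULL) return NULL;  /* as with realloc, the old block stays valid */
    if (old != NULL) {
        /* copy only as many bytes as both blocks can hold */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

When the old contents are not needed, as in the test case where every block is overwritten by `memset` right after the call, the copy can be dropped and a plain free followed by malloc is enough.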

Gerilgfx avatar Sep 07 '13 18:09 Gerilgfx

I should be able to look into this now. Can you put the test case above, which is too mangled to make much sense, into a gist so I can get it demangled? Thanks.

ned14 avatar Oct 10 '13 19:10 ned14

https://gist.github.com/Gerilgfx/6953861 i hope it was this one.

Gerilgfx avatar Oct 12 '13 19:10 Gerilgfx

Bad news: I can't replicate this on my Ubuntu 12.04 x64 machine with a i7-3770K CPU. I tried:

GCC v4.6.4
GCC v4.7.3
GCC v4.8.1

It could be a timing issue where your CPU finds a race mine doesn't. Or it could be a bug in GCC 4.7.1 which has since been fixed. There is some order-sensitive code in the threadcache; a slight reordering from what is specified in the code would introduce exactly this kind of race. In theory a compiler shouldn't do such a reorder, but maybe there was a bug in GCC v4.7.1.
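To illustrate what "order sensitive" means in general terms, here is a generic C11 sketch of a publish/take pair in which the payload must be written before the flag that announces it. This is not nedmalloc's thread cache code; `slot_t`, `slot_publish` and `slot_take` are hypothetical names, and the sketch assumes a single producer and a single consumer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-slot mailbox, NOT nedmalloc's thread cache. */
typedef struct {
    size_t size;       /* payload: must be written before 'ready' is set */
    void *block;       /* payload */
    atomic_int ready;  /* publication flag */
} slot_t;

static void slot_publish(slot_t *s, void *block, size_t size) {
    s->block = block;
    s->size = size;
    /* release: the payload stores above may not be moved below this store */
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

static int slot_take(slot_t *s, void **block, size_t *size) {
    /* acquire: the payload loads below may not be moved above this load */
    if (!atomic_load_explicit(&s->ready, memory_order_acquire)) return 0;
    *block = s->block;
    *size = s->size;
    atomic_store_explicit(&s->ready, 0, memory_order_relaxed);
    return 1;
}
```

If the flag store were allowed to move ahead of the payload stores, a reader could see `ready` set while `block` or `size` still held stale values, which is the general shape of race that a reordering can introduce.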

Niall

ned14 avatar Oct 17 '13 04:10 ned14

Also, try setting THREADCACHEMAX to 0. That will help me determine if it's dlmalloc or the thread cache which is at fault.

ned14 avatar Oct 17 '13 04:10 ned14

with: #define THREADCACHEMAX 0

test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
Aborted

Gerilgfx avatar Oct 17 '13 17:10 Gerilgfx

changing:

test[i]=realloc_vpool(test[i], iteracio);

to:

if(test[i]) free_vpool(test[i]); test[i]=malloc_vpool(iteracio);

works.

i think the bug is precisely in your realloc implementation.

Gerilgfx avatar Oct 17 '13 17:10 Gerilgfx

        if(test[i]) free_vpool(test[i]);
        test[i]=NULL;
        test[i]=realloc_vpool(test[i], iteracio);

this works too.
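In standard C, `realloc` with a null pointer behaves like `malloc`, so this second variant only ever reaches the allocate-fresh path and never the resize-an-existing-block path. A minimal sketch of the same pattern against the standard library allocator (the helper name `regrow_discarding` is hypothetical):

```c
#include <stdlib.h>

/* Hypothetical helper: release the old block first, then let
   realloc(NULL, n) act as a fresh allocation. Old contents are lost. */
static void *regrow_discarding(void *old, size_t new_size) {
    free(old);                       /* free(NULL) is a no-op in standard C */
    return realloc(NULL, new_size);  /* same effect as malloc(new_size) */
}
```

That both workarounds avoid the abort is consistent with the fault lying in the branch that handles a non-null pointer.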

Gerilgfx avatar Oct 17 '13 17:10 Gerilgfx

if(!memsize)
{
    fprintf(stderr, "nedmalloc: nedprealloc() called with a block not created by nedmalloc!\n");
    abort();
}

changed to:

if(!memsize)
{
    fprintf(stderr, "nedmalloc: nedprealloc() called with a block not created by nedmalloc!\n");
    // abort();
}

result:

test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
*** glibc detected *** ./nedmalloctester3: munmap_chunk(): invalid pointer: 0x00007f5a29cf9010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x78b56)[0x7f5a3d97bb56]
./nedmalloctester3[0x408528]
./nedmalloctester3[0x408796]
/lib64/libpthread.so.0(+0x7e0e)[0x7f5a3dcafe0e]
/lib64/libc.so.6(clone+0x6d)[0x7f5a3d9e72bd]
======= Memory map: ========
00400000-0040c000 r-xp 00000000 08:21 125785 nedmalloctester3
0060b000-0060c000 r--p 0000b000 08:21 125785 nedmalloctester3
0060c000-0060d000 rw-p 0000c000 08:21 125785 nedmalloctester3
00a7e000-00a9f000 rw-p 00000000 00:00 0 [heap]
7f5a24000000-7f5a24021000 rw-p 00000000 00:00 0
7f5a24021000-7f5a28000000 ---p 00000000 00:00 0
7f5a29af9000-7f5a37bf9000 rw-p 00000000 00:00 0
7f5a37bf9000-7f5a37bfa000 ---p 00000000 00:00 0
7f5a37bfa000-7f5a383fa000 rw-p 00000000 00:00 0
7f5a383fa000-7f5a383fb000 ---p 00000000 00:00 0
7f5a383fb000-7f5a38ffb000 rw-p 00000000 00:00 0
7f5a38ffb000-7f5a38ffc000 ---p 00000000 00:00 0
7f5a38ffc000-7f5a398fc000 rw-p 00000000 00:00 0
7f5a398fc000-7f5a398fd000 ---p 00000000 00:00 0
7f5a398fd000-7f5a3a0fd000 rw-p 00000000 00:00 0
7f5a3a0fd000-7f5a3a0fe000 ---p 00000000 00:00 0
7f5a3a0fe000-7f5a3a8fe000 rw-p 00000000 00:00 0
7f5a3a8fe000-7f5a3a8ff000 ---p 00000000 00:00 0
7f5a3a8ff000-7f5a3b0ff000 rw-p 00000000 00:00 0
7f5a3b0ff000-7f5a3b100000 ---p 00000000 00:00 0
7f5a3b100000-7f5a3b900000 rw-p 00000000 00:00 0 [stack:9567]
7f5a3b900000-7f5a3b901000 ---p 00000000 00:00 0
7f5a3b901000-7f5a3c101000 rw-p 00000000 00:00 0
7f5a3c101000-7f5a3c102000 ---p 00000000 00:00 0
7f5a3c102000-7f5a3c902000 rw-p 00000000 00:00 0
7f5a3c903000-7f5a3d903000 rw-p 00000000 00:00 0
7f5a3d903000-7f5a3da9e000 r-xp 00000000 08:06 130903 /lib64/libc-2.15.so
7f5a3da9e000-7f5a3dc9e000 ---p 0019b000 08:06 130903 /lib64/libc-2.15.so
7f5a3dc9e000-7f5a3dca2000 r--p 0019b000 08:06 130903 /lib64/libc-2.15.so
7f5a3dca2000-7f5a3dca4000 rw-p 0019f000 08:06 130903 /lib64/libc-2.15.so
7f5a3dca4000-7f5a3dca8000 rw-p 00000000 00:00 0
7f5a3dca8000-7f5a3dcbf000 r-xp 00000000 08:06 130835 /lib64/libpthread-2.15.so
7f5a3dcbf000-7f5a3debe000 ---p 00017000 08:06 130835 /lib64/libpthread-2.15.so
7f5a3debe000-7f5a3debf000 r--p 00016000 08:06 130835 /lib64/libpthread-2.15.so
7f5a3debf000-7f5a3dec0000 rw-p 00017000 08:06 130835 /lib64/libpthread-2.15.so
7f5a3dec0000-7f5a3dec4000 rw-p 00000000 00:00 0
7f5a3dec4000-7f5a3ded9000 r-xp 00000000 08:06 133636 /lib64/libgcc_s.so.1
7f5a3ded9000-7f5a3e0d8000 ---p 00015000 08:06 133636 /lib64/libgcc_s.so.1
7f5a3e0d8000-7f5a3e0d9000 r--p 00014000 08:06 133636 /lib64/libgcc_s.so.1
7f5a3e0d9000-7f5a3e0da000 rw-p 00015000 08:06 133636 /lib64/libgcc_s.so.1
7f5a3e0da000-7f5a3e1cf000 r-xp 00000000 08:06 130868 /lib64/libm-2.15.so
7f5a3e1cf000-7f5a3e3cf000 ---p 000f5000 08:06 130868 /lib64/libm-2.15.so
7f5a3e3cf000-7f5a3e3d0000 r--p 000f5000 08:06 130868 /lib64/libm-2.15.so
7f5a3e3d0000-7f5a3e3d1000 rw-p 000f6000 08:06 130868 /lib64/libm-2.15.so
7f5a3e3d1000-7f5a3e4b9000 r-xp 00000000 08:06 655189 /usr/lib64/libstdc++.so.6.0.17
7f5a3e4b9000-7f5a3e6b9000 ---p 000e8000 08:06 655189 /usr/lib64/libstdc++.so.6.0.17
7f5a3e6b9000-7f5a3e6c1000 r--p 000e8000 08:06 655189 /usr/lib64/libstdc++.so.6.0.17
7f5a3e6c1000-7f5a3e6c3000 rw-p 000f0000 08:06 655189 /usr/lib64/libstdc++.so.6.0.17
7f5a3e6c3000-7f5a3e6d8000 rw-p 00000000 00:00 0
7f5a3e6d8000-7f5a3e6f9000 r-xp 00000000 08:06 140264 /lib64/ld-2.15.so
7f5a3e7a1000-7f5a3e8a6000 rw-p 00000000 00:00 0
7f5a3e8f6000-7f5a3e8f9000 rw-p 00000000 00:00 0
7f5a3e8f9000-7f5a3e8fa000 r--p 00021000 08:06 140264 /lib64/ld-2.15.so
7f5a3e8fa000-7f5a3e8fb000 rw-p 00022000 08:06 140264 /lib64/ld-2.15.so
7f5a3e8fb000-7f5a3e8fc000 rw-p 00000000 00:00 0
7fff27325000-7fff27346000 rw-p 00000000 00:00 0 [stack]
7fff273ff000-7fff27400000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted

Gerilgfx avatar Oct 17 '13 17:10 Gerilgfx

That's very useful - it'll either be dlmalloc or my changes to dlmalloc. I'll try rewalking the code path.

ned14 avatar Oct 17 '13 19:10 ned14

pastelink.me/dl/15d838#sthash.Yt123CLX.dpuf

here is a binary compiled on my computer from this source. (i hope this crap site doesn't replace it with some fake crap)

Gerilgfx avatar Oct 17 '13 21:10 Gerilgfx

I'm off to the GSoC mentors summit in California tomorrow, then onto Seattle returning in about a week. Thanks for the binaries, and I'll look into them when I get back.

ned14 avatar Oct 18 '13 01:10 ned14

Crap, sorry pastelink.me/dl/15d838#sthash.Yt123CLX.dpuf deletes its files after 7 days. I'll be at home for the next week though, definitely can run it and see what happens.

ned14 avatar Oct 28 '13 14:10 ned14

okay, leave me a message and i will reupload

Gerilgfx avatar Oct 29 '13 15:10 Gerilgfx

Message to here, or do you want me to PM you or something?

ned14 avatar Oct 30 '13 19:10 ned14

just leave a message here. it notifies me by email.

Gerilgfx avatar Oct 31 '13 14:10 Gerilgfx

btw if you use skype or IM like that, i can pick you up there too.

Gerilgfx avatar Oct 31 '13 14:10 Gerilgfx

any step forward?

Gerilgfx avatar Jan 04 '14 14:01 Gerilgfx

If you remember (see thread above) I was waiting on some precompiled binaries from you as I was unable to replicate the problem here. I needed to rule out compiler/platform differences.

Note that currently everything I own is in a container being shipped from Canada to Ireland, and so any ability to run anything will be delayed until the container arrives in February. In particular, right now my access to Linux is very restricted, but I may be able to borrow time on someone's server.

ned14 avatar Jan 04 '14 17:01 ned14

i sent it to your mail account back then. it seems you haven't received it. i will recompile them and upload them somewhere.

Gerilgfx avatar Jan 05 '14 17:01 Gerilgfx

http://www.sendspace.com/file/wae5fj (click on ,,Click here to start download from sendspace'')

Gerilgfx avatar Jan 05 '14 18:01 Gerilgfx

I have the file, I'll see if I can arrange access to a Linux box. Thanks Geri.

ned14 avatar Jan 05 '14 18:01 ned14

okay, i am curious to see if it crashes or not.

Gerilgfx avatar Jan 05 '14 19:01 Gerilgfx