Arun Balakrishnan (WT01 - Computing, Storage & Software Products)
2008-Feb-23 12:14 UTC
Memory Leak under FreeBSD 6.0 RELEASE
Hi,

We are currently working on a project in which we are porting a library from GNU/Linux to FreeBSD 6.0-RELEASE, 32-bit and 64-bit. As part of the standard memory leak tests, we noticed that the ported library is leaking memory. After a lot of analysis we found something very strange: just repeatedly loading and unloading our library was itself showing a leak. We are able to reproduce a similar leak using the following steps:

1. SimpleLib.cpp - Simple dummy library
2. LibLoader.cpp - Utility to repeatedly load the library
3. Compile as mentioned
4. Run under Valgrind for multiple iterations (31 in our example, hard-coded for simplicity)

=================SimpleLib.cpp==================
#include <stdio.h>
#include <stdlib.h>

class CLeaker
{
public:
    CLeaker() { };
    virtual ~CLeaker() { };
};

CLeaker obj;

================LibLoader.cpp=====================
#include "stdio.h"
#include "dlfcn.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

int main()
{
    int i = 0;
    int loop = 31;
    while (i<loop)
    {
        i++;
        void *handle = dlopen(argv[1], RTLD_LAZY);
        if ( !handle )
            exit(1);
        dlclose(handle);
    }
    return 0;
}

======================================================================
Compilation:
g++ -shared -Wl,-soname,SimpleLib.so -o SimpleLib.so SimpleLib.cpp -g
g++ -o LibLoader_FreeBSD LibLoader.cpp -g

======================================================================
Execution:
valgrind --trace-pthread=all --show-below-main=yes --show-reachable=yes --leak-check=yes ./LibLoader_FreeBSD ./SimpleLib.so

======================================================================
Output: (irrelevant portions snipped)
==1155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1155== malloc/free: in use at exit: 520 bytes in 1 blocks.
==1155== malloc/free: 1 allocs, 0 frees, 520 bytes allocated.
==1155== For counts of detected errors, rerun with: -v
==1155== searching for pointers to 1 not-freed blocks.
==1155== checked 2140912 bytes.
==1155==
==1155== 520 bytes in 1 blocks are still reachable in loss record 1 of 1
==1155==    at 0x3C032183: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
==1155==    by 0x3C1CB018: (within /lib/libc.so.6)
==1155==    by 0x3C1CB206: __cxa_atexit (in /lib/libc.so.6)
==1155==    by 0x3C1F0898: ???
==1155==
==1155== LEAK SUMMARY:
==1155==    definitely lost: 0 bytes in 0 blocks.
==1155==      possibly lost: 0 bytes in 0 blocks.
==1155==    still reachable: 520 bytes in 1 blocks.
==1155==         suppressed: 0 bytes in 0 blocks.

======================================================================
Queries:

1. As seen in the Valgrind output, there is a 520-byte leak. This happens only after around 31 loops and keeps increasing. By 100 loops, the leak goes up to 1560 bytes. In our situation with our library, the 520-byte leak starts by the third iteration itself, and by around 23 iterations it reaches 5KB. We are really stumped as to what could be the reason for this leak. Where is the malloc called from? Why only after executing 31 times? Executing the same code under GNU/Linux does not show any leak even for over 1000 iterations.

2. While executing this without Valgrind, in another terminal we did a "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that the RSS (resident set size) field value keeps increasing by 4KB every now and then. The same experiment on GNU/Linux shows that RSS remains at the same value. What could be the cause for the ever rising RSS value?

Any help in this regard would be really appreciated. Thanks in advance.

Rgds,
~Arun

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.
WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com
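For anyone trying to reproduce this: the loader as posted uses argv[1], but main() is declared without arguments, so it will not compile as is. A minimal compilable sketch of the same load/unload loop follows. The function name load_unload_cycles and its return convention are illustrative assumptions, not part of the original post.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// Loads and immediately unloads the shared object at `path`, `iterations`
// times, mirroring the loop in the posted LibLoader.cpp.
// Returns the number of completed cycles, or -1 as soon as a dlopen() fails.
// (Hypothetical helper: name and return convention are assumptions.)
int load_unload_cycles(const char *path, int iterations)
{
    assert(path != nullptr);
    for (int i = 0; i < iterations; ++i) {
        void *handle = dlopen(path, RTLD_LAZY);
        if (handle == nullptr) {
            const char *err = dlerror();
            std::fprintf(stderr, "dlopen(%s): %s\n", path,
                         err != nullptr ? err : "unknown error");
            return -1;
        }
        dlclose(handle);
    }
    return iterations;
}

// Typical call from main(int argc, char *argv[]), after checking argc == 2:
//     load_unload_cycles(argv[1], 31);
```

On FreeBSD dlopen() is provided by libc; on older glibc-based systems the link step may additionally need -ldl.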
On Sat, Feb 23, 2008 at 05:27:15PM +0530, Arun Balakrishnan (WT01 - Computing, Storage & Software Products) wrote:
> [...]
> Rgds,
> ~Arun

The Valgrind report points to memory used by atexit_register() to keep information about the functions registered by means of atexit(3) and __cxa_atexit(). See lib/libc/stdlib/atexit.c.

In your (non-compilable) example, __cxa_atexit() is used by the shared object to register the destructor of the global object, to be called when the DSO is unloaded.

The handling of the memory is complicated because the atexit() specification states that:
- functions shall be called in the reverse order of their registration;
- at least 32 functions can be registered with atexit().

The current implementation never frees a struct atexit, in an attempt to conform to the ordering requirement. The static __atexit0, intended to guarantee the success of the first 32 atexit() calls, may not actually guarantee it, because its space can be consumed by interleaved __cxa_atexit() calls instead.

The patch below may help with the libc leak.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 05dad84..8389637 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD: src/lib/libc/stdlib/atexit.c,v 1.8 2007/01/09 00:28:09 imp E
 #include <stdlib.h>
 #include <unistd.h>
 #include <pthread.h>
+#include <sys/queue.h>
 #include "atexit.h"
 #include "un-namespace.h"
 
@@ -56,7 +57,7 @@ static pthread_mutex_t atexit_mutex = PTHREAD_MUTEX_INITIALIZER;
 #define _MUTEX_UNLOCK(x)	if (__isthreaded) _pthread_mutex_unlock(x)
 
 struct atexit {
-	struct atexit *next;			/* next in list */
+	LIST_ENTRY(atexit) link;
 	int ind;				/* next index in this table */
 	struct atexit_fn {
 		int fn_type;			/* ATEXIT_? from above */
@@ -69,7 +70,10 @@ struct atexit {
 	} fns[ATEXIT_SIZE];			/* the table itself */
 };
 
-static struct atexit *__atexit;		/* points to head of LIFO stack */
+/* Head of LIFO stack */
+LIST_HEAD(, atexit) __atexit = LIST_HEAD_INITIALIZER(__atexit);
+static struct atexit __atexit0;		/* one guaranteed table */
+static unsigned long __atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -79,30 +83,33 @@ static struct atexit *__atexit;		/* points to head of LIFO stack */
 static int
 atexit_register(struct atexit_fn *fptr)
 {
-	static struct atexit __atexit0;	/* one guaranteed table */
 	struct atexit *p;
+	unsigned long old__atexit_gen;
 
 	_MUTEX_LOCK(&atexit_mutex);
-	if ((p = __atexit) == NULL)
-		__atexit = p = &__atexit0;
-	else while (p->ind >= ATEXIT_SIZE) {
-		struct atexit *old__atexit;
-		old__atexit = __atexit;
-		_MUTEX_UNLOCK(&atexit_mutex);
-		if ((p = (struct atexit *)malloc(sizeof(*p))) == NULL)
-			return (-1);
-		_MUTEX_LOCK(&atexit_mutex);
-		if (old__atexit != __atexit) {
-			/* Lost race, retry operation */
+	if (LIST_EMPTY(&__atexit)) {
+		p = &__atexit0;
+		LIST_INSERT_HEAD(&__atexit, p, link);
+	} else {
+ retry:
+		p = LIST_FIRST(&__atexit);
+		if (p->ind >= ATEXIT_SIZE) {
+			old__atexit_gen = __atexit_gen;
 			_MUTEX_UNLOCK(&atexit_mutex);
-			free(p);
+			if ((p = (struct atexit *)malloc(sizeof(*p))) == NULL)
+				return (-1);
 			_MUTEX_LOCK(&atexit_mutex);
-			p = __atexit;
-			continue;
+			if (old__atexit_gen != __atexit_gen) {
+				/* Lost race, retry operation */
+				_MUTEX_UNLOCK(&atexit_mutex);
+				free(p);
+				_MUTEX_LOCK(&atexit_mutex);
+				goto retry;
+			}
+			p->ind = 0;
+			LIST_INSERT_HEAD(&__atexit, p, link);
+			__atexit_gen++;
 		}
-		p->ind = 0;
-		p->next = __atexit;
-		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
 	_MUTEX_UNLOCK(&atexit_mutex);
@@ -119,7 +126,7 @@ atexit(void (*func)(void))
 	int error;
 
 	fn.fn_type = ATEXIT_FN_STD;
-	fn.fn_ptr.std_func = func;;
+	fn.fn_ptr.std_func = func;
 	fn.fn_arg = NULL;
 	fn.fn_dso = NULL;
 
@@ -138,7 +145,7 @@ __cxa_atexit(void (*func)(void *), void *arg, void *dso)
 	int error;
 
 	fn.fn_type = ATEXIT_FN_CXA;
-	fn.fn_ptr.cxa_func = func;;
+	fn.fn_ptr.cxa_func = func;
 	fn.fn_arg = arg;
 	fn.fn_dso = dso;
 
@@ -154,32 +161,55 @@ __cxa_atexit(void (*func)(void *), void *arg, void *dso)
 void
 __cxa_finalize(void *dso)
 {
-	struct atexit *p;
-	struct atexit_fn fn;
-	int n;
+	struct atexit *p, *p1, cp;
+	struct atexit_fn *fn;
+	int i, n, inuse;
+	unsigned long orig__atexit_gen;
 
 	_MUTEX_LOCK(&atexit_mutex);
-	for (p = __atexit; p; p = p->next) {
+ restart:
+	inuse = 0;
+	LIST_FOREACH_SAFE(p, &__atexit, link, p1) {
+		cp.ind = 0;
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
 				continue; /* already been called */
-			if (dso != NULL && dso != p->fns[n].fn_dso)
+			if (dso != NULL && dso != p->fns[n].fn_dso) {
+				inuse = 1;
 				continue; /* wrong DSO */
-			fn = p->fns[n];
+			}
+			cp.fns[cp.ind++] = p->fns[n];
 			/*
 			  Mark entry to indicate that this particular handler
 			  has already been called.
 			*/
 			p->fns[n].fn_type = ATEXIT_FN_EMPTY;
-			_MUTEX_UNLOCK(&atexit_mutex);
-
+		}
+		if (!inuse && p != &__atexit0) {
+			LIST_REMOVE(p, link);
+			__atexit_gen++;
+		} else {
+			/*
+			 * The current entry cannot be removed, and so
+			 * any consequent entries.
+			 */
+			inuse = 1;
+			p = NULL;
+		}
+		orig__atexit_gen = __atexit_gen;
+		_MUTEX_UNLOCK(&atexit_mutex);
+		free(p);
+		for (i = 0; i < cp.ind; i++) {
+			fn = &cp.fns[i];
 			/* Call the function of correct type. */
-			if (fn.fn_type == ATEXIT_FN_CXA)
-				fn.fn_ptr.cxa_func(fn.fn_arg);
-			else if (fn.fn_type == ATEXIT_FN_STD)
-				fn.fn_ptr.std_func();
-			_MUTEX_LOCK(&atexit_mutex);
+			if (fn->fn_type == ATEXIT_FN_CXA)
+				fn->fn_ptr.cxa_func(fn->fn_arg);
+			else if (fn->fn_type == ATEXIT_FN_STD)
+				fn->fn_ptr.std_func();
 		}
+		_MUTEX_LOCK(&atexit_mutex);
+		if (orig__atexit_gen != __atexit_gen)
+			goto restart;
 	}
 	_MUTEX_UNLOCK(&atexit_mutex);
 }
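As a sanity check on the numbers in the Valgrind report, here is a rough model of the unpatched allocation behaviour. It assumes ATEXIT_SIZE is 32 and an ILP32 layout for struct atexit (one list pointer, one int, and 32 entries of four 4-byte fields each), which gives 8 + 32 * 16 = 520 bytes per table, the block size Valgrind reported. Since the unpatched code never reuses a slot and never frees a table, each further group of 32 registrations beyond the static table costs one more 520-byte malloc(). The names AtexitFn32, Atexit32 and heap_bytes_after are hypothetical and only model the layout; they are not the libc definitions, and the number of registrations the runtime makes on its own is not stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: ATEXIT_SIZE is 32, matching the "at least 32 functions" rule.
const int kAtexitSize = 32;

// Model of one table entry on a 32-bit build. Pointers are represented
// as 32-bit integers so the size is the same on any host (assumption: ILP32).
struct AtexitFn32 {
    std::int32_t  fn_type;  // ATEXIT_FN_* tag
    std::uint32_t fn_ptr;   // union of std_func / cxa_func pointers
    std::uint32_t fn_arg;   // argument for the CXA callback
    std::uint32_t fn_dso;   // shared object handle
};

// Model of the pre-patch struct atexit: next pointer, index, 32 entries.
struct Atexit32 {
    std::uint32_t next;
    std::int32_t  ind;
    AtexitFn32    fns[kAtexitSize];
};

static_assert(sizeof(AtexitFn32) == 16, "four 4-byte fields per entry");
static_assert(sizeof(Atexit32) == 520, "matches the 520-byte Valgrind block");

// Heap bytes held by the pre-patch scheme after `registrations` calls to
// atexit()/__cxa_atexit(): the first 32 land in the static table, and each
// further group of up to 32 needs one malloc'ed table that is never freed.
std::size_t heap_bytes_after(int registrations)
{
    if (registrations <= kAtexitSize)
        return 0;
    int overflow = registrations - kAtexitSize;
    int tables = (overflow + kAtexitSize - 1) / kAtexitSize;  // round up
    return static_cast<std::size_t>(tables) * sizeof(Atexit32);
}
```

Under this model, anything from 33 to 64 registrations holds 520 bytes and anything from 97 to 128 holds 1560 bytes, which is in line with the figures reported for about 31 and 100 load/unload cycles plus a few registrations made by the runtime.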
My previous mail was this:

---------------------------------------------------------------------

Wow! Thanks a lot for the reply. The patch you provided really gave some insight into the underlying problem.

In our final product, the library will be loaded and used only once per instance. However, the automated test suites for the library do this over 2000 times as part of functional testing and memory leak testing, and there we were getting a huge leak on FreeBSD.

One more question though. (Second one in the list of queries I had posted in my first mail.)

2. While executing this without Valgrind, in another terminal we did a "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that the RSS (resident set size) field value keeps increasing by 4KB every now and then. The same experiment on GNU/Linux shows that RSS remains at the same value. What could be the cause for the ever rising RSS value?

Could you throw some light on what could be the possible reason for this? Is the RSS value directly mappable to the leak that we see in libc? This is another issue that is acting as a show stopper for us.

---------------------------------------------------------------------

Thanks again,
~Arun

Kostik Belousov wrote:
> On Mon, Feb 25, 2008 at 10:54:45AM +0530, Arun Balakrishnan wrote:
>
> I am unable to reply to HTML mail. Please, repost it with plain text
> content.
On Tue, Feb 26, 2008 at 10:10:14AM +0530, Arun Balakrishnan wrote:
> [...]
> 2. While executing this without Valgrind, in another terminal we did a
> "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that
> the RSS (resident set size) field value keeps increasing by 4KB every
> now and then. The same experiment on GNU/Linux shows that RSS remains
> at the same value. What could be the cause for the ever rising RSS
> value?
>
> Could you throw some light on what could be the possible reason for
> this? Is RSS value directly mappable to the leak that we see in libc?
> This is another issue that is acting as a show stopper for us.

I have not closely tracked the recent malloc development and improvements in FreeBSD. In any case, I think you have some misunderstanding of the VM concepts here.

From a very high-level view, the kernel uses physical memory to cache the content of virtual memory. In that picture, RSS approximately shows the amount of that cache used by the process. RSS may change over time for many reasons: in particular, changes in the process working set, load on the system, the kernel VM algorithms, etc. The general tendency is that, on a system with enough physical memory and negligible load apart from the observed process, RSS will be approximately equal to the optimal process working set.

A leak would then definitely increase the amount of physical memory allocated to the process. I would not pay much attention to RSS alone.

Did you test the patch? What was the behaviour with the patch applied?
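To watch resident memory from inside the test program rather than through ps, something like the following sketch could be used. It reads the peak RSS through getrusage(2) and shows that the figure follows pages the process has actually touched, not bytes passed to malloc(), which is consistent with RSS moving in page-sized steps. The helper names peak_rss_kb and peak_rss_after_touching are hypothetical, and the 4096-byte stride assumes a 4 KB page size.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Peak resident set size of the calling process, as reported by
// getrusage(2) in ru_maxrss (kilobytes on FreeBSD and Linux).
// Returns -1 if the call fails.
long peak_rss_kb()
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

// Allocates `bytes`, writes to all of it so the kernel has to back the
// pages with physical memory, then reports the peak RSS.
long peak_rss_after_touching(std::size_t bytes)
{
    std::vector<char> buf(bytes, 1);  // filling with 1 writes every page
    long checksum = 0;
    for (std::size_t i = 0; i < buf.size(); i += 4096)  // assumed page size
        checksum += buf[i];
    (void)checksum;  // read back only so the buffer is really used
    return peak_rss_kb();
}
```

Comparing peak_rss_kb() before and after a batch of load/unload cycles gives a coarse, page-granular view. It is a complement to the Valgrind leak report, not a replacement for it.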