Arun Balakrishnan (WT01 - Computing, Storage & Software Products)
2008-Feb-23  12:14 UTC
Memory Leak under FreeBSD 6.0 RELEASE
Hi,
   We are currently working on a project wherein we are porting a library
   from GNU/Linux to FreeBSD 6.0 - RELEASE 32-bit and 64-bit. As part of
   the standard memory leak tests, we noticed that the ported library is
   leaking memory. After lots of analysis we found something very
   strange. Just repeatedly loading and unloading our library was itself
   throwing up a leak. We are able to reproduce a similar leak using the
   following steps:
   1. SimpleLib.cpp - Simple dummy library
   2. LibLoader.cpp - Utility to repeatedly load the library
   3. Compile as mentioned
   4. Run under Valgrind with multiple iterations (31 in our example,
   hard-coded for simplicity)
   =================SimpleLib.cpp==================
   #include <stdio.h>
   #include <stdlib.h>
   class CLeaker
   {
   public:
     CLeaker() { };
     virtual ~CLeaker() { };
   };
   CLeaker obj;
   ================LibLoader.cpp=====================
   #include "stdio.h"
   #include "dlfcn.h"
   #include <stdlib.h>
   #include <unistd.h>
   #include <sys/time.h>
   int main()
   {
     int i = 0;
     int loop = 31;
     while (i<loop)
       {
         i++;
         void *handle = dlopen(argv[1], RTLD_LAZY);
         if ( !handle )
           exit(1);
         dlclose(handle);
       }
     return 0;
   }
   =====================================================================
   Compilation:
   g++ -shared -Wl,-soname,SimpleLib.so -o SimpleLib.so SimpleLib.cpp -g
   g++ -o LibLoader_FreeBSD LibLoader.cpp -g
   =====================================================================
   Execution:
   valgrind --trace-pthread=all --show-below-main=yes
   --show-reachable=yes --leak-check=yes ./LibLoader_FreeBSD
   ./SimpleLib.so
   =====================================================================
   Output: (snipped off irrelevant portions)
   ==1155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
   ==1155== malloc/free: in use at exit: 520 bytes in 1 blocks.
   ==1155== malloc/free: 1 allocs, 0 frees, 520 bytes allocated.
   ==1155== For counts of detected errors, rerun with: -v
   ==1155== searching for pointers to 1 not-freed blocks.
   ==1155== checked 2140912 bytes.
   ==1155==
   ==1155== 520 bytes in 1 blocks are still reachable in loss record 1 of 1
   ==1155==    at 0x3C032183: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
   ==1155==    by 0x3C1CB018: (within /lib/libc.so.6)
   ==1155==    by 0x3C1CB206: __cxa_atexit (in /lib/libc.so.6)
   ==1155==    by 0x3C1F0898: ???
   ==1155==
   ==1155== LEAK SUMMARY:
   ==1155==    definitely lost: 0 bytes in 0 blocks.
   ==1155==    possibly lost:   0 bytes in 0 blocks.
   ==1155==    still reachable: 520 bytes in 1 blocks.
   ==1155==         suppressed: 0 bytes in 0 blocks.
   =====================================================================
   Queries:
   1. As seen in the Valgrind output, there is a 520-byte leak. It
   appears only after around 31 loops and keeps growing: by 100 loops
   the leak reaches 1560 bytes. With our own library, the 520-byte leak
   starts at the third iteration, and by around 23 iterations it reaches
   5KB. We are really stumped as to the possible reason for this leak.
   Where is the malloc called from? Why only after executing 31 times?
   Executing the same code under GNU/Linux does not show any leak, even
   for over 1000 iterations.
   2. While executing this without Valgrind, in another terminal we ran
   "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that
   the RSS (resident set size) value keeps increasing by 4KB every
   now and then. The same experiment on GNU/Linux shows that the RSS
   stays at the same value. What could be the cause of the ever-rising
   RSS value?
   Any help in this regard would be really helpful. Thanks in advance.
   Rgds,
   ~Arun
   The information contained in this electronic message and any
   attachments to this message are intended for the exclusive use of the
   addressee(s) and may contain proprietary, confidential or privileged
   information. If you are not the intended recipient, you should not
   disseminate, distribute or copy this e-mail. Please notify the sender
   immediately and destroy all copies of this message and any
   attachments.
   WARNING: Computer viruses can be transmitted via email. The recipient
   should check this email and any attachments for the presence of
   viruses. The company accepts no liability for any damage caused by any
   virus transmitted by this email.
   www.wipro.com
On Sat, Feb 23, 2008 at 05:27:15PM +0530, Arun Balakrishnan (WT01 - Computing, Storage & Software Products) wrote:
> [...]
> Any help in this regard would be really helpful. Thanks in advance.
> Rgds,
> ~Arun

The valgrind report points to memory used by the atexit_register() for
keeping the information on the functions registered by means of atexit(3)
and __cxa_atexit(). See the lib/libc/stdlib/atexit.c.

In your (non-compilable) example, __cxa_atexit() is used by shared objects
to register the destructor for global objects to be called at the dso
unload.

The handling of the memory is complicated because atexit() specification
states that:
- functions shall be called in the reverse order of their registration;
- at least 32 functions can be registered with atexit().

The current implementation never frees the struct atexit to try to conform
to the requirement of order. The static __atexit0, intended to guarantee
success of the first 32 atexit() calls, may not guarantee it, because the
space can be consumed by the interleaved __cxa_atexit() instead.

Patch below may help with the libc leak.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 05dad84..8389637 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD: src/lib/libc/stdlib/atexit.c,v 1.8 2007/01/09 00:28:09 imp E
 #include <stdlib.h>
 #include <unistd.h>
 #include <pthread.h>
+#include <sys/queue.h>
 #include "atexit.h"
 #include "un-namespace.h"
 
@@ -56,7 +57,7 @@ static pthread_mutex_t atexit_mutex = PTHREAD_MUTEX_INITIALIZER;
 #define _MUTEX_UNLOCK(x)	if (__isthreaded) _pthread_mutex_unlock(x)
 
 struct atexit {
-	struct atexit *next;			/* next in list */
+	LIST_ENTRY(atexit) link;
 	int ind;				/* next index in this table */
 	struct atexit_fn {
 		int fn_type;			/* ATEXIT_? from above */
@@ -69,7 +70,10 @@ struct atexit {
 	} fns[ATEXIT_SIZE];			/* the table itself */
 };
 
-static struct atexit *__atexit;		/* points to head of LIFO stack */
+/* Head of LIFO stack */
+LIST_HEAD(, atexit) __atexit = LIST_HEAD_INITIALIZER(__atexit);
+static struct atexit __atexit0;		/* one guaranteed table */
+static unsigned long __atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -79,30 +83,33 @@ static struct atexit *__atexit;		/* points to head of LIFO stack */
 static int
 atexit_register(struct atexit_fn *fptr)
 {
-	static struct atexit __atexit0;	/* one guaranteed table */
 	struct atexit *p;
+	unsigned long old__atexit_gen;
 
 	_MUTEX_LOCK(&atexit_mutex);
-	if ((p = __atexit) == NULL)
-		__atexit = p = &__atexit0;
-	else while (p->ind >= ATEXIT_SIZE) {
-		struct atexit *old__atexit;
-		old__atexit = __atexit;
-		_MUTEX_UNLOCK(&atexit_mutex);
-		if ((p = (struct atexit *)malloc(sizeof(*p))) == NULL)
-			return (-1);
-		_MUTEX_LOCK(&atexit_mutex);
-		if (old__atexit != __atexit) {
-			/* Lost race, retry operation */
+	if (LIST_EMPTY(&__atexit)) {
+		p = &__atexit0;
+		LIST_INSERT_HEAD(&__atexit, p, link);
+	} else {
+ retry:
+		p = LIST_FIRST(&__atexit);
+		if (p->ind >= ATEXIT_SIZE) {
+			old__atexit_gen = __atexit_gen;
 			_MUTEX_UNLOCK(&atexit_mutex);
-			free(p);
+			if ((p = (struct atexit *)malloc(sizeof(*p))) == NULL)
+				return (-1);
 			_MUTEX_LOCK(&atexit_mutex);
-			p = __atexit;
-			continue;
+			if (old__atexit_gen != __atexit_gen) {
+				/* Lost race, retry operation */
+				_MUTEX_UNLOCK(&atexit_mutex);
+				free(p);
+				_MUTEX_LOCK(&atexit_mutex);
+				goto retry;
+			}
+			p->ind = 0;
+			LIST_INSERT_HEAD(&__atexit, p, link);
+			__atexit_gen++;
 		}
-		p->ind = 0;
-		p->next = __atexit;
-		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
 	_MUTEX_UNLOCK(&atexit_mutex);
@@ -119,7 +126,7 @@ atexit(void (*func)(void))
 	int error;
 
 	fn.fn_type = ATEXIT_FN_STD;
-	fn.fn_ptr.std_func = func;;
+	fn.fn_ptr.std_func = func;
 	fn.fn_arg = NULL;
 	fn.fn_dso = NULL;
 
@@ -138,7 +145,7 @@ __cxa_atexit(void (*func)(void *), void *arg, void *dso)
 	int error;
 
 	fn.fn_type = ATEXIT_FN_CXA;
-	fn.fn_ptr.cxa_func = func;;
+	fn.fn_ptr.cxa_func = func;
 	fn.fn_arg = arg;
 	fn.fn_dso = dso;
 
@@ -154,32 +161,55 @@ __cxa_atexit(void (*func)(void *), void *arg, void *dso)
 void
 __cxa_finalize(void *dso)
 {
-	struct atexit *p;
-	struct atexit_fn fn;
-	int n;
+	struct atexit *p, *p1, cp;
+	struct atexit_fn *fn;
+	int i, n, inuse;
+	unsigned long orig__atexit_gen;
 
 	_MUTEX_LOCK(&atexit_mutex);
-	for (p = __atexit; p; p = p->next) {
+ restart:
+	inuse = 0;
+	LIST_FOREACH_SAFE(p, &__atexit, link, p1) {
+		cp.ind = 0;
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
 				continue; /* already been called */
-			if (dso != NULL && dso != p->fns[n].fn_dso)
+			if (dso != NULL && dso != p->fns[n].fn_dso) {
+				inuse = 1;
 				continue; /* wrong DSO */
-			fn = p->fns[n];
+			}
+			cp.fns[cp.ind++] = p->fns[n];
 			/*
 			  Mark entry to indicate that this particular handler
 			  has already been called.
 			*/
 			p->fns[n].fn_type = ATEXIT_FN_EMPTY;
-			_MUTEX_UNLOCK(&atexit_mutex);
-
+		}
+		if (!inuse && p != &__atexit0) {
+			LIST_REMOVE(p, link);
+			__atexit_gen++;
+		} else {
+			/*
+			 * The current entry cannot be removed, and so
+			 * any consequent entries.
+			 */
+			inuse = 1;
+			p = NULL;
+		}
+		orig__atexit_gen = __atexit_gen;
+		_MUTEX_UNLOCK(&atexit_mutex);
+		free(p);
+		for (i = 0; i < cp.ind; i++) {
+			fn = &cp.fns[i];
 			/* Call the function of correct type. */
-			if (fn.fn_type == ATEXIT_FN_CXA)
-				fn.fn_ptr.cxa_func(fn.fn_arg);
-			else if (fn.fn_type == ATEXIT_FN_STD)
-				fn.fn_ptr.std_func();
-			_MUTEX_LOCK(&atexit_mutex);
+			if (fn->fn_type == ATEXIT_FN_CXA)
+				fn->fn_ptr.cxa_func(fn->fn_arg);
+			else if (fn->fn_type == ATEXIT_FN_STD)
+				fn->fn_ptr.std_func();
 		}
+		_MUTEX_LOCK(&atexit_mutex);
+		if (orig__atexit_gen != __atexit_gen)
+			goto restart;
 	}
 	_MUTEX_UNLOCK(&atexit_mutex);
 }
My previous mail was this:
---------------------------------------------------------------------
Wow! Thanks a lot for the reply. The patch you provided really gave 
some insight on the underlying problem.
In our final product, the library will be loaded and used only once 
per instance. However, the automated test suites for the library do this 
over 2000 times as part of functional testing and memory leak 
testing. That is where we were getting a huge leak on FreeBSD.
One more question though. (Second one in the list of queries I had 
posted in my first mail.)
2. While executing this without Valgrind, in another terminal we did a
    "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that
    the RSS (resident set size) field value keeps increasing by 4KB every
    now and then. The same experiment on GNU/Linux shows that RSS remains
    at the same value. What could be the cause for the ever rising RSS
    value?
Could you shed some light on the possible reason for this? Is the RSS 
value directly related to the leak that we see in libc? 
This is another issue that is acting as a show stopper for us.
---------------------------------------------------------------------
Thanks again,
~Arun
Kostik Belousov wrote:
> On Mon, Feb 25, 2008 at 10:54:45AM +0530, Arun Balakrishnan wrote:
>
> I am unable to reply to HTML mail. Please, repost it with plain text
> content.
On Tue, Feb 26, 2008 at 10:10:14AM +0530, Arun Balakrishnan wrote:
> [...]
> 2. While executing this without Valgrind, in another terminal we did a
> "ps -Aopid,rss | grep LibLoader_" continuously in a loop and saw that
> the RSS (resident set size) field value keeps increasing by 4KB every
> now and then. The same experiment on GNU/Linux shows that RSS remains
> at the same value. What could be the cause for the ever rising RSS
> value?
>
> Could you throw some light on what could be the possible reason for
> this? Is RSS value directly mappable to the leak that we see in libc?
> This is another issue that is acting as a show stopper for us.

I have not closely tracked recent malloc development and improvements in
FreeBSD. In any case, I think you have some misunderstanding of the VM
concepts here.

From a very high-level view, the kernel uses physical memory to cache the
virtual memory content. RSS approximately shows the amount of that cache
used by the process. The RSS may change over time for many reasons, in
particular changes in the process working set, load on the system,
kernel VM algorithms, etc. The overall tendency is that, on a system with
enough physical memory and negligible load apart from the observed
process, the RSS would be approximately equal to the optimal process
working set.

The leak would then definitely increase the amount of physical memory
allocated for the process. I would not pay much attention to the RSS alone.

Did you test the patch? What was the behaviour with the patch applied?