J.J. Shore
2006-Oct-30 16:59 UTC
[dtrace-discuss] Why does gethostbyname_r appear to leak?
I am running a very simple multithreaded program (TestThread.C) which calls gethostbyname_r in several threads. My analysis of this program with both truss and DTrace suggests that it has a small leak. However, if I alter the program to have many more threads and run for much longer, it never runs out of memory and its memory use does not keep growing. Can anybody explain what I am missing in my analysis, or whether there are bugs in both DTrace and truss that I am unaware of?

How to repeat my truss test:

1. Take the attached tar file, place it in a directory and unpack it.
   prompt> tar xvf leak.tar
2. Run ctest.ksh.
   prompt> ctest.ksh
3. Look at truss.final.

The ctest.ksh script compiles the program and then runs it under truss to find all the places where memory is allocated and released. This output is filtered down to just libc calls, and the results are stored in truss.libcalls. truss.libcalls is then filtered to give a list of lightweight processes (excluding LWP 1, which is the main program's thread). For each LWP, the script greps out all the returns from malloc and prints the return value. Finally, for each of these addresses, it searches for the last occurrence of that address in the original truss output and writes that line to truss.final if it is not a call to free. Invariably, truss.final shows that the last operation on several addresses is a malloc or calloc rather than the expected free, which suggests a leak.

How to repeat my DTrace test:

1. Run leak in the background.
   prompt> leak &
   509
   prompt>
2. Within 10 seconds, start watch_malloc_sizes.d, passing in the PID of leak and a time interval:
   prompt> ./watch_malloc_sizes.d 509 5s
   where 509 is the PID reported above.

The DTrace script tries to match each allocation with a call to free in any thread. Every <interval> seconds it displays an interim report, and when the program finishes it prints an overall summary.
On my machine it suggests, as truss does, that the program has a leak.

I am not convinced that the case is so straightforward, partly because when I extend the program into an infinite loop with many more threads it does not run out of memory, and partly because the list of threads that leak seems to vary, which suggests that truss and DTrace are somehow missing some events. If the program is made single-threaded, there are no leaks.

I would like to know what is really going on, so if you have read this far, thanks for your time.

This message posted from opensolaris.org
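The truss.final filtering described in the first post can be modelled in a few lines of C. This is a hypothetical sketch, not the actual ctest.ksh (which is not reproduced in the thread): the event_t record and the find_suspects and last_op_is_alloc names are invented for illustration, and real truss output would have to be parsed into such records first. It shows one property of the heuristic: any block that has not been freed by the time tracing stops is reported, whether it is a genuine leak or a block the program deliberately keeps (for example, a cached entry).

```c
#include <stddef.h>

/* Hypothetical, simplified trace record; real truss output looks different. */
typedef enum { OP_MALLOC, OP_CALLOC, OP_FREE } op_t;

typedef struct {
    int lwp;              /* LWP that made the call (1 = main thread) */
    op_t op;              /* which libc call was made */
    unsigned long addr;   /* pointer returned by malloc/calloc, or passed to free */
} event_t;

/* Nonzero if the last event in the whole trace that mentions addr is an
   allocation (malloc or calloc) rather than a free. The search spans every
   LWP, so a block freed by a different thread still counts as freed. */
static int last_op_is_alloc(const event_t *ev, size_t n, unsigned long addr)
{
    int is_alloc = 0;
    for (size_t i = 0; i < n; i++)
        if (ev[i].addr == addr)
            is_alloc = (ev[i].op != OP_FREE);
    return is_alloc;
}

/* Count distinct addresses returned by malloc in non-main LWPs whose last
   operation in the trace is an allocation, i.e. what would end up in
   truss.final. If out is non-NULL, the suspect addresses are stored there. */
static size_t find_suspects(const event_t *ev, size_t n, unsigned long *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].lwp == 1 || ev[i].op != OP_MALLOC)
            continue;                     /* only malloc returns from non-main LWPs */
        int seen = 0;                     /* report each address only once */
        for (size_t j = 0; j < i && !seen; j++)
            seen = ev[j].lwp != 1 && ev[j].op == OP_MALLOC && ev[j].addr == ev[i].addr;
        if (!seen && last_op_is_alloc(ev, n, ev[i].addr)) {
            if (out != NULL)
                out[count] = ev[i].addr;
            count++;
        }
    }
    return count;
}
```

For an eight-event toy trace in which LWP 2 allocates and frees 0x1000, allocates 0x2000 and never frees it, LWP 3 allocates 0x3000 that LWP 2 later frees, LWP 3 reuses 0x1000 and frees it, and LWP 1 allocates 0x4000, find_suspects reports only 0x2000. Cutting the same trace off just before LWP 3 frees 0x1000 raises the count to two, so in this model the result depends on where tracing stops relative to the threads' work.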
Sean McGrath - Sun Microsystems Ireland
2006-Oct-30 19:06 UTC
[dtrace-discuss] Why does gethostbyname_r appear to leak?
J.J. Shore stated:
> How to repeat my truss test.
>
> 1. Take the attached tar file and place in a directory and unpack.
> prompt> tar xvf leak.tar
> 2. Run ctest.ksh
> prompt> ctest.ksh
> 3. Look at truss.final

It seems the mailing list software stripped the attachment? What version of Solaris Express or Solaris 10 update is this? I know there was a bug fix that went into a recent Solaris Express build that fixed a leak from gethostbyname_r.

Sean.
J.J. Shore
2006-Oct-31 08:40 UTC
[dtrace-discuss] Re: Why does gethostbyname_r appear to leak?
I see that attachments do not get copied over when the thread is cross-posted to the mailing list, and this tool will not let me attach files to a CC thread. Anyway, the attachment can be viewed at http://www.opensolaris.org/jive/thread.jspa?threadID=16436&tstart=0

I am using the following Solaris version:

hawea> uname -a
SunOS hawea 5.10 Generic_118822-25 sun4u sparc SUNW,Sun-Fire-V440
J.J. Shore
2006-Oct-31 11:11 UTC
[dtrace-discuss] Re: Why does gethostbyname_r appear to leak?
Following up a suggestion I received by e-mail, I have tried using libumem, as follows, to see whether there is a leak. The result suggests there is not.

i. In one terminal:

export UMEM_DEBUG=default
export UMEM_LOGGING=transaction
export LD_PRELOAD=libumem.so.1
leak

ii. In a separate terminal:

gcore $(pgrep leak)
mdb core.xxx
::findleaks
CACHE     LEAKED   BUFCTL   CALLER
----------------------------------------------------------------------
   Total       0 buffers, 0 bytes

Before doing this, I modified the program to run in an infinite loop.
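The libumem result differs from the truss and DTrace results because ::findleaks asks a different question. Roughly, it scans the process's memory conservatively for values that look like pointers to allocated buffers and reports only buffers that nothing refers to, so a buffer that is never freed but is still referenced (for example, from a cache) is not counted as leaked. The C below is a toy model of that reachability idea, not how libumem or mdb is implemented: the block_t layout, the root list and count_unreachable are hypothetical, and a word only counts as a pointer if it equals a block's base address.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64
#define MAX_WORDS 4

/* Toy model of one heap buffer: its base address and the pointer-sized
   words it contains (a simplification of what a real scanner examines). */
typedef struct {
    unsigned long base;
    unsigned long words[MAX_WORDS];
    size_t nwords;
} block_t;

/* Index of the block whose base address equals v, or -1 if v does not
   point at any block in this model. */
static int find_block(const block_t *blocks, size_t n, unsigned long v)
{
    for (size_t i = 0; i < n; i++)
        if (blocks[i].base == v)
            return (int)i;
    return -1;
}

/* Mark phase: start from the root words (stand-ins for globals, stacks and
   registers) and follow every word that looks like a block pointer.
   Returns how many blocks were never reached, i.e. would be reported as
   leaked, or (size_t)-1 if n exceeds MAX_BLOCKS. */
static size_t count_unreachable(const block_t *blocks, size_t n,
                                const unsigned long *roots, size_t nroots)
{
    bool marked[MAX_BLOCKS] = { false };
    size_t stack[MAX_BLOCKS];   /* each block is pushed at most once */
    size_t sp = 0, leaked = 0;

    if (n > MAX_BLOCKS)
        return (size_t)-1;

    for (size_t r = 0; r < nroots; r++) {
        int k = find_block(blocks, n, roots[r]);
        if (k >= 0 && !marked[k]) {
            marked[k] = true;
            stack[sp++] = (size_t)k;
        }
    }
    while (sp > 0) {
        const block_t *b = &blocks[stack[--sp]];
        for (size_t w = 0; w < b->nwords && w < MAX_WORDS; w++) {
            int k = find_block(blocks, n, b->words[w]);
            if (k >= 0 && !marked[k]) {
                marked[k] = true;
                stack[sp++] = (size_t)k;
            }
        }
    }
    for (size_t i = 0; i < n; i++)
        if (!marked[i])
            leaked++;
    return leaked;
}
```

In a three-block heap where a root points at 0x2000, 0x2000 points at 0x2100, and nothing points at 0x3000, count_unreachable returns 1: the two blocks that are still referenced are not reported even though neither was freed. With no roots at all, every block counts as unreachable.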