David McDaniel
2006-Oct-24 02:37 UTC
[dtrace-discuss] How to emit associative array after ^C
Boy am I a dummy. I want to simply dump out unfreed allocations when I terminate the script. What''s the secret sauce? #!/usr/sbin/dtrace -s pid$1::MyAlloc:return { bufs[arg1] = walltimestamp; } pid$1::MyFree:entry /bufs[arg0]/ { bufs[arg0] = 0; } This message posted from opensolaris.org
Why not use purify or other memory debuggers?? Rayson On 10/23/06, David McDaniel <damcdani at cisco.com> wrote:> Boy am I a dummy. I want to simply dump out unfreed allocations when I terminate the script. What''s the secret sauce? > > #!/usr/sbin/dtrace -s > pid$1::MyAlloc:return > { > bufs[arg1] = walltimestamp; > } > pid$1::MyFree:entry > /bufs[arg0]/ > { > bufs[arg0] = 0; > } > > > This message posted from opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >
singularly unhelpful answer, Rayson. One obvious response: "because dtrace(1m) is installed on my system, and purify isn''t?" David: how about an END probe? (Search the Dtrace manual for "The END Probe") Rayson Ho wrote:> Why not use purify or other memory debuggers?? > > Rayson > > > > On 10/23/06, David McDaniel <damcdani at cisco.com> wrote: >> Boy am I a dummy. I want to simply dump out unfreed allocations when I >> terminate the script. What''s the secret sauce? >> >> #!/usr/sbin/dtrace -s >> pid$1::MyAlloc:return >> { >> bufs[arg1] = walltimestamp; >> } >> pid$1::MyFree:entry >> /bufs[arg0]/ >> { >> bufs[arg0] = 0; >> } >> >> >> This message posted from opensolaris.org >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org >> > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org
Bryan Cantrill
2006-Oct-24 03:52 UTC
[dtrace-discuss] How to emit associative array after ^C
On Mon, Oct 23, 2006 at 07:37:59PM -0700, David McDaniel wrote:> Boy am I a dummy. I want to simply dump out unfreed allocations when I terminate the script. What''s the secret sauce? > > #!/usr/sbin/dtrace -s > pid$1::MyAlloc:return > { > bufs[arg1] = walltimestamp; > } > pid$1::MyFree:entry > /bufs[arg0]/ > { > bufs[arg0] = 0; > }Use an aggregation instead of an associative array. On the allocation return do this: @[arg1] = sum(1) And on the free, do this: @[arg0] = sum(-1) What remains (which you can print out with printa(), or which will be dumped for you automatically) are allocations that were not freed. That said, using libumem and ::findleaks is a much, much better way to do this. Check out umem_debug(3MALLOC) and the MDB documentation... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
Well, you an do a lot of dtrace(1m), but you do need to spend time on refining your own D scripts. And yet the result may not be as good as specialized tools... If, the goal is to find memory leaks, the memory debuggers can easily tell where exactly the allocations are made, with a nice stack trace output to ease debugging. Of course, David did not mention anything about debugging memory leaks... but it never hurts to *suggest* other ways to fix a problem ;) Rayson On 10/23/06, Dan Mick <dan.mick at sun.com> wrote:> singularly unhelpful answer, Rayson. One obvious response: "because dtrace(1m) > is installed on my system, and purify isn''t?" > > David: how about an END probe? (Search the Dtrace manual for "The END Probe") > > Rayson Ho wrote: > > Why not use purify or other memory debuggers?? > > > > Rayson > > > > > > > > On 10/23/06, David McDaniel <damcdani at cisco.com> wrote: > >> Boy am I a dummy. I want to simply dump out unfreed allocations when I > >> terminate the script. What''s the secret sauce? > >> > >> #!/usr/sbin/dtrace -s > >> pid$1::MyAlloc:return > >> { > >> bufs[arg1] = walltimestamp; > >> } > >> pid$1::MyFree:entry > >> /bufs[arg0]/ > >> { > >> bufs[arg0] = 0; > >> } > >> > >> > >> This message posted from opensolaris.org > >> _______________________________________________ > >> dtrace-discuss mailing list > >> dtrace-discuss at opensolaris.org > >> > > _______________________________________________ > > dtrace-discuss mailing list > > dtrace-discuss at opensolaris.org > >
David McDaniel
2006-Oct-24 15:24 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
Good questions all. Q: Why not use Purify or the leak detector in SunStudio? A: Two reasons. 1) We dont have Purify and the the SunStudio tools (like bcheck) dont work at all with processes linked with libumem :-( Bummer, that. 2) The thing I''m looking for confounds leak detection in the classic sense in that the pointers are never lost and so the allocated memory is never reported as a leak. Q: Why not use the tools from umem? A: I think umem is about the greatest thing since sliced bread, but I havent found a way to make it really help in this case. The problem is that most allocations get freed in a timely fashion, but there is some errant code path(s?) that result in failure to free. In a long running process (intended to run 24x7x365), this results in slow inflation of the heap until the process dies. You''ll note that the thing I store is walltimestamp. Thats because the one thing we do have is detailed logs of various events. If I can determine the allocation time of a not-freed block, I can go back in the logs to that timeframe and look for unusual stimuli or sequences. So the attempt here is to tell me the points in time of everything allocated since the beginning of the run and not yet deallocated. Anything in such a list over about 5 minutes old is probably never going to get deallocated. If the umem transaction log timestamps were in epoch time I might be able to use that, but even with a very large log it wraps around pretty quickly, plus I havent figured out how to use it to find the oldest thing allocated but not freed. So what I''d really like to do is, in an END probe dump out the entire contents of the array, but since at that point I have no idea what the keys are, I dont know how to spit out the contents. Thanks for the interest. -d This message posted from opensolaris.org
Bryan Cantrill
2006-Oct-24 16:34 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
> Q: Why not use the tools from umem? > A: I think umem is about the greatest thing since sliced bread, but I havent found a way to make it really help in this case. The problem is that most allocations get freed in a timely fashion, but there is some errant code path(s?) that result in failure to free. In a long running process (intended to run 24x7x365), this results in slow inflation of the heap until the process dies. You''ll note that the thing I store is walltimestamp. Thats because the one thing we do have is detailed logs of various events. If I can determine the allocation time of a not-freed block, I can go back in the logs to that timeframe and look for unusual stimuli or sequences.I think you might be unaware of ::findleaks, which will do exactly what you want. See the excellent article by Rob Benson on this: http://access1.sun.com/techarticles/libumem.html The executive summary is to do this: 1. Link with libumem 2. Set the UMEM_DEBUG environment variable to "default" (see umem_debug(3MALLOC) for details) 3. Run your program 4. After a while (or whatever), grab a core file of the running process with gcore (see gcore(1) for details) 5. Run mdb on the core file 6. Run the ::findleaks dcmd 7. Take the leaked bufctl pointers, and use ::bufctl_audit on them When I do this and run xterm on my desktop, I see this: $ export LD_PRELOAD=libumem.so.1 $ export UMEM_DEBUG=default $ xterm & [2] 239413 $ pldd 239413 239413: xterm /lib/libumem.so.1 /lib/libsocket.so.1 /lib/libnsl.so.1 /usr/openwin/lib/libXaw.so.5 /usr/openwin/lib/libXmu.so.4 /usr/openwin/lib/libXext.so.0 /usr/openwin/lib/libXt.so.4 /usr/openwin/lib/libSM.so.6 /usr/openwin/lib/libICE.so.6 /usr/openwin/lib/libX11.so.4 /lib/libcurses.so.1 /lib/libc.so.1 $ gcore 239413 gcore: core.239413 dumped $ mdb core.239413 Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ] > ::findleaks CACHE LEAKED BUFCTL CALLER 080af290 1 0810a8b8 libX11.so.4`XGetKeyboardMapping+0x9d 080ab010 1 08104e48 libX11.so.4`XGetVisualInfo+0x3d 080a3290 1 08161b90 libXt.so.4`XtMalloc+0x26 080a3290 1 08161aa0 libXt.so.4`XtMalloc+0x26 080a3290 1 08161a28 libXt.so.4`XtMalloc+0x26 080a3290 1 08161b18 libXt.so.4`XtMalloc+0x26 080a3510 1 08158908 libXt.so.4`XtMalloc+0x26 ------------------------------------------------------------------------ Total 7 buffers, 8728 bytes > 0810a8b8::bufctl_audit ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS 810a8b8 8110000 5cf80013c7b04 1 80af290 0 0 libumem.so.1`umem_cache_alloc_debug+0x14f libumem.so.1`umem_cache_alloc+0x144 libumem.so.1`umem_alloc+0xc5 libumem.so.1`malloc+0x27 libX11.so.4`XGetKeyboardMapping+0x9d VTInitModifiers+0x53 0x8062ee5 libXt.so.4`RealizeWidget+0x93 libXt.so.4`RealizeWidget+0x23e libXt.so.4`XtRealizeWidget+0xc3 VTInit+0x1c 0x8068d1d main+0x7a3 _start+0x80 And now we probably have enough information to file a bug against either XGetKeyboardMapping() or (more likely) VTInitModifiers()... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
David McDaniel
2006-Oct-24 19:31 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
Umm, maybe I wasn''t clear enough, so I ''ll try again. We''ve already used umem with great results many times. In this particular case, there are two parts to the puzzle. 1. These buffers are not reported as leaks by traditional tools or ::findleaks, because there are live references to the buffers. 2. By using umem, we already know which buffers are not freed and precisely where the allocation points are for the buffers that are not freed. What we don''t know is the precise escape scenario that results in them never being freed. In normal operation the unfreed buffer issue never comes up. What we dont know is retrospectively when those buffers were allocated, which would allow us to go back in logs to see the the scenario associated with them. As a highly simplified description of the problem space, this is a control system that communicates with tens of thousands of controlled devices. It uses udp, and message retry and recovery is performed at the application level. Commands and responses to and from the controlled devices must be sequenced by the application. In the implementation there is a linked list per controlled device that references message elements that in turn have pointers to allocated blocks containing formated messages. When a message is sent, a timer is started. If an acknowledgement is received, the element is cleaned up, else the timer expires and the message is retransmitted, etc. Some sort of rare circumstances, as yet undiscovered, lead to the buffers never being freed. That''s why having time-of-allocation is a key data point. AFAIK, that is not something I can get from umem, though it would make for a valuable RFE perhaps. Thanks for the interest and tips. -d This message posted from opensolaris.org
Bryan Cantrill
2006-Oct-24 20:44 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
On Tue, Oct 24, 2006 at 12:31:41PM -0700, David McDaniel wrote:> Umm, maybe I wasn''t clear enough, so I ''ll try again. We''ve already used umem with great results many times. > In this particular case, there are two parts to the puzzle. > 1. These buffers are not reported as leaks by traditional tools or ::findleaks, because there are live references to the buffers.Ah -- got it. Apologies for the spurious lesson on ::findleaks...> 2. By using umem, we already know which buffers are not freed and precisely where the allocation points are for the buffers that are not freed. What we don''t know is the precise escape scenario that results in them never being freed. In normal operation the unfreed buffer issue never comes up. What we dont know is retrospectively when those buffers were allocated, which would allow us to go back in logs to see the the scenario associated with them. > > As a highly simplified description of the problem space, this is a control system that communicates with tens of thousands of controlled devices. It uses udp, and message retry and recovery is performed at the application level. Commands and responses to and from the controlled devices must be sequenced by the application. In the implementation there is a linked list per controlled device that references message elements that in turn have pointers to allocated blocks containing formated messages. When a message is sent, a timer is started. If an acknowledgement is received, the element is cleaned up, else the timer expires and the message is retransmitted, etc. Some sort of rare circumstances, as yet undiscovered, lead to the buffers never being freed. That''s why having time-of-allocation is a key data point. AFAIK, that is not something I can get from umem, though it would make for a valuable RFE perhaps.Assuming you have UMEM_DEBUG set, do a "::whatis -b" on the buffer, then take the bufctl and run ::bufctl_audit on it, e.g.: > 810ee48::whatis -b 810ee48 is 810ee40+8, bufctl 810a930 allocated from umem_alloc_384 > 810a930::bufctl_audit ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS 810a930 810ee40 5cf80002f7452 1 80a8c90 0 0 libumem.so.1`umem_cache_alloc_debug+0x14f libumem.so.1`umem_cache_alloc+0x180 libumem.so.1`umem_alloc+0xc5 libumem.so.1`malloc+0x27 libXt.so.4`XtMalloc+0x26 libXt.so.4`NewPerDisplay+0x1e libXt.so.4`InitPerDisplay+0x31 libXt.so.4`XtOpenDisplay+0x11f libXt.so.4`_XtAppInit+0xb9 libXt.so.4`XtOpenApplication+0x43 main+0x276 _start+0x80 That gives you the timestamp of the allocation, as well as the stack backtrace. (You could do this with DTrace too -- but this is quite a bit simpler and it''s certainly less contorted.) - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
David McDaniel
2006-Oct-24 21:04 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
Interestingly, this:> Use an aggregation instead of an associative array. On the allocation >return do this: >@[arg1] = sum(1) >And on the free, do this: >@[arg0] = sum(-1)Results in even the zero filled aggregate elements being printed. But thats good enough I think, because I can dump the output to a file and filter those out. This demonstrates what I mean: #!/usr/sbin/dtrace -s BEGIN { @bufs[1] = sum(walltimestamp/1000000); @bufs[2] = sum(1); } END {@bufs[3] = sum(walltimestamp/1000000); @bufs[2] = sum(-1); } Is there a way to explicity delete such an element? Or otherwise get rid of it? Thanks -d This message posted from opensolaris.org
David McDaniel
2006-Oct-24 22:40 UTC
[dtrace-discuss] Re: Re: How to emit associative array after ^C
Right, but unless I am misunderstanding, the TIMESTAMP is the equivalent of gethrtime(), ie a monotonically ascending counter starting at an arbitrary time in the past. In other words I dont have a good way of relating that to an absolute point in time so I can relate it to logs in network equipment, etc. What would be great would be to store epoch time somewhere in the process space at startup, like at _init time someplace in umem library space along with the then current gethrtime value and then manipulate the display per TIMESTAMP at ::bufctl_audit time. Something like that. You know what I mean. This message posted from opensolaris.org
Bryan Cantrill
2006-Oct-24 23:13 UTC
[dtrace-discuss] Re: Re: How to emit associative array after ^C
On Tue, Oct 24, 2006 at 03:40:19PM -0700, David McDaniel wrote:> Right, but unless I am misunderstanding, the TIMESTAMP is the equivalent of gethrtime(), ie a monotonically ascending counter starting at an arbitrary time in the past. In other words I dont have a good way of relating that to an absolute point in time so I can relate it to logs in network equipment, etc.Assuming that (1) the application was active at the time of the core dump and (2) you know the time that the core dump was taken, you should be able to piece it together by looking at timestamp of the last allocation. On the one hand, this is crude -- but it''s going to be crude anyway: wall time can slew around quite a bit. This should be good enough to get it to the nearest second, anyway -- and it would be a good RFE for libumem to occasionally get the walltime to allow MDB to perform this correlation more precisely... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
David McDaniel
2006-Oct-25 02:07 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
Anyway, here is what I ended up putting together. It works after a fashion, but curiousity kills me. This: # cat AllocFree.d #!/usr/sbin/dtrace -s /* usage: AllocFree.d <pid> */ BEGIN { @bufs[1] = sum(walltimestamp/1000000); } pid$1::Alloc:return { x = (walltimestamp/1000000); bufs[arg1] = x; @bufs[arg1] = sum(x); } pid$1::Free:entry /bufs[arg0]/ { y=bufs[arg0]; @bufs[arg0] = sum(-y); bufs[arg0] = 0; } END { @bufs[3] = sum(walltimestamp/1000000); } Yields something that looks like this: # AllocFree.d `pgrep zzz` dtrace: script ''AllocFree.d'' matched 4 probes ^C 345686024 0 346144776 0 346325000 0 193224712 2 345833480 2 346890248 2 345702408 3 345849864 3 345718792 27 1 1161740371111 3 1161740643626 345792520 -14 345767944 -13 193028104 -4 346316808 -4 345874440 -2 346349576 -2 193191944 -1 345817096 -1 346136584 -1 346292232 -1 346333192 -1 346800136 -1 So, the idea is that the BEGIN and END probes give me relative time markers. The lefthand value is the buffer address (in decimal) which I can feed ::whatis and go from there to get a traceback to its allocation point. Anything with a value in between "1" and "3" is currently allocated. All that works fine, but I cant for the life of me figure out where the bogus low positive and negative values are coming from. Its not a problem for me to simply toss out the items that dont pass the sniff test, but its curious. Thanks for your interest. -d This message posted from opensolaris.org
Nicolas Williams
2006-Oct-25 02:33 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
On Tue, Oct 24, 2006 at 07:07:23PM -0700, David McDaniel wrote:> So, the idea is that the BEGIN and END probes give me relative time > markers. The lefthand value is the buffer address (in decimal) which > I can feed ::whatis and go from there to get a traceback to its > allocation point. Anything with a value in between "1" and "3" is > currently allocated. All that works fine, but I cant for the life of > me figure out where the bogus low positive and negative values are > coming from. Its not a problem for me to simply toss out the items > that dont pass the sniff test, but its curious.Drops?
Casper.Dik at Sun.COM
2006-Oct-25 09:06 UTC
[dtrace-discuss] Re: How to emit associative array after ^C
>I think you might be unaware of ::findleaks, which will do exactly what >you want. See the excellent article by Rob Benson on this:The claim is that the leak confounds leak checking as it isn''t a leak in the real sense (the data is still referenced). I think he needs to be pointed to ":umausers" or whatever the command is which tells you who allocated what and is still holding on to it. (Base on stack trace). This will show all memory which wasn''t freed sorted by who allocated it. Casper
David McDaniel
2006-Oct-25 20:21 UTC
[dtrace-discuss] Re: Re: How to emit associative array after ^C
Thanks Casper. Yes, we know about ::umausers. The problem here is that the lifecyle of buffers is not a simple one where they are allocated, used, and freed nearby. Rather, they are allocated and then handed off to a rather complicated protocol state machine. When all goes properly, as it usually does, they are subsequently freed by another thread when the state machine runs to completion, which is normally within tens of seconds. Out of millions of allocations, through some event sequence we havent isolated, a handfull get forgotten about and never freed. We know precisely where they are allocated. Its these sporadic rogues that are vexing us. The effect on the end user is that this process has to be restarted every month or two, which is not a good thing in an erstwhile 5-9''s system. Regards -d This message posted from opensolaris.org