Durney, Mark
2008-Jun-30 13:03 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
I have a production Solaris 10 server that was recently moved to our new DMX. Since the move to the new DMX, we have been seeing the memory usage stepping: it climbs to 87.99% and then drops back to 41%. The server has Oracle running on it. Because it's a production server, I obviously need to be careful with what tools I use on it. Can DTrace be used, and are there example scripts I can use to troubleshoot this potential memory leak? I am new to DTrace.

Thanks
Brian Utterback
2008-Jun-30 13:20 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Hi Mark. DTrace is the perfect tool for you. It is designed to have a minimal impact on a running system. As long as you don't go crazy and use a probe like "fbt:::" that instruments every function in the kernel, you should be pretty safe. Even with that one, it just feels like the system is overly loaded for a while. I use "fbt:::" all the time in test environments, but I shy away from it on production servers.

What kind of memory usage do you see climb? How do you measure it? What kind of memory it is will point you in the direction you need to go.

Durney, Mark wrote:
> I have a production Solaris 10 server that was recently moved to our
> new DMX. Since the move to the new DMX, we have been seeing the memory
> usage stepping: it climbs to 87.99% and then drops back to 41%.
> The server has Oracle running on it. Because it's a production server,
> I obviously need to be careful with what tools I use on it. Can DTrace
> be used, and are there example scripts I can use to troubleshoot this
> potential memory leak? I am new to DTrace.
>
> Thanks
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
Sanjeev Bagewadi
2008-Jun-30 13:34 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Durney,

I have a simple script for the userland, and the details are available on my blog:
http://blogs.sun.com/sanjeevb/

The script is fairly rudimentary, and I have intentionally avoided any processing during collection. All the intelligence is in the postprocessing Perl script.

There is probably room for optimization, but I still need to explore that.

Hope they are of use.

Thanks and regards,
Sanjeev.

Durney, Mark wrote:
> I have a production Solaris 10 server that was recently moved to our
> new DMX. Since the move to the new DMX, we have been seeing the memory
> usage stepping: it climbs to 87.99% and then drops back to 41%.
> The server has Oracle running on it. Because it's a production server,
> I obviously need to be careful with what tools I use on it. Can DTrace
> be used, and are there example scripts I can use to troubleshoot this
> potential memory leak? I am new to DTrace.
>
> Thanks

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
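The division of labor Sanjeev describes — dumb collection in D, all the intelligence in a postprocessing script — can be sketched roughly as follows. This is not his Perl script; it is a minimal Python sketch, and the log-line format (`malloc:return tid=... ptr=... size=...` / `free:entry ptr=...`) is assumed from the output quoted later in this thread.

```python
import re
from collections import OrderedDict

def find_leaks(lines):
    """Pair malloc:return and free:entry records from a DTrace log.

    Assumed line formats (taken from the example later in the thread):
        malloc:return tid=1 ptr=0xABC size=48
        free:entry ptr=0xABC
    ustack() output and other lines are simply skipped.
    Anything malloc'd but never freed is reported as a potential leak.
    """
    live = OrderedDict()           # ptr -> (lineno, size) of outstanding allocs
    errors = []                    # (lineno, ptr) for frees of unknown pointers
    for lineno, line in enumerate(lines, 1):
        m = re.match(r'malloc:return tid=\d+ ptr=(\S+) size=(\d+)', line)
        if m:
            live[m.group(1)] = (lineno, int(m.group(2)))
            continue
        m = re.match(r'free:entry ptr=(\S+)', line)
        if m:
            if m.group(1) in live:
                del live[m.group(1)]
            else:
                errors.append((lineno, m.group(1)))
    return live, errors
```

For example, a log with two allocations and one free leaves one outstanding pointer, which is exactly what a leak summary would report.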
S h i v
2008-Jul-28 01:29 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Hi Sanjeev,

I have attached a script that does some more processing on the dtrace output to provide more information in the summary.

On Mon, Jun 30, 2008 at 7:04 PM, Sanjeev Bagewadi <Sanjeev.Bagewadi at sun.com> wrote:
> I have a simple script for the userland, and the details are available
> on my blog:
> http://blogs.sun.com/sanjeevb/
>
> The script is fairly rudimentary, and I have intentionally avoided any
> processing during collection. All the intelligence is in the
> postprocessing Perl script.
>
> There is probably room for optimization, but I still need to explore that.
>
> Hope they are of use.

In the output produced by the dtrace script, I see instances of malloc for a pointer (at one location) happening more than once without a free. See the example below. What should I make of this? How can malloc return a pointer location that was already returned by an earlier call to malloc? (This is from an actual application; I do not have a sample program that can reproduce the scenario.)

malloc:return tid=1 ptr=<ptr-value> size=48
<ustack_1>
....few lines without a free.....
malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
<different_ustack>
....few lines.....
free:entry ptr=<ptr-value>

I had used your dtrace script and found it quite useful. Since the logs collected were often huge, the level of detail in the Perl script's output wasn't sufficient. The attached script (Python) does a little more work. Of course, some more optimizations and still better reporting are possible. For example, I see some stacks in our app leak sporadically under heavy load, due to call drops and alternate execution paths that get triggered. Such stacks aren't *always leaky*. This information can be captured in the output:

======================================================
ERROR line: <lineno> : Freeing non-existent memory at <ptr-value>
LEAK <nnn> bytes leaked. Allocated at line <line1>, ptr <ptr-value> is re-alloced at line <line2>
......multiple lines as above.......
======================================================
INFO Leaks along with stack information are as follows:
======================================================
Stack position (line no.): <line1> <line2> <line3> ...
Pointers: <ptr1> <ptr2> <ptr3> ...
Size leaked: <bytes1> <bytes2> <bytes3>
Total size leaked: sumof(<bytes1> <bytes2> <bytes3> ...)
N(stack executions): <no of times the leaky stack got executed>

STACK IS:
<actual-stack>
......multiple stack info as above....

In the malloc-related query, I was referring to the output line "LEAK <nnn> bytes leaked. Allocated at line <line1>, ptr <ptr-value> is re-alloced at line <line2>". This output appears when there are two mallocs at <line1> and <line2> in the dtrace output without an intermediate free.

-Shiv

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: memparse.py
URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20080728/65620ffd/attachment.ksh>
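The "re-alloced" check behind that LEAK line is easy to reproduce in a few lines. This is a sketch, not the attached memparse.py; the log-line format is the one quoted in the message above. A second malloc:return for a still-live pointer should be impossible in a complete trace, so flagging it is a good way to spot dropped frees (or an unlogged realloc).

```python
import re

def find_realloced(lines):
    """Flag a second malloc:return for a pointer with no intervening free.

    Returns (ptr, first_lineno, second_lineno) tuples. malloc cannot hand
    out a live address twice, so a hit usually means a free was missing
    from the log, or the block came back through realloc().
    """
    first_seen = {}                # ptr -> line number of the live allocation
    suspects = []
    for lineno, line in enumerate(lines, 1):
        m = re.match(r'malloc:return tid=\d+ ptr=(\S+)', line)
        if m:
            ptr = m.group(1)
            if ptr in first_seen:
                suspects.append((ptr, first_seen[ptr], lineno))
            first_seen[ptr] = lineno
            continue
        m = re.match(r'free:entry ptr=(\S+)', line)
        if m:
            first_seen.pop(m.group(1), None)
    return suspects
```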
Sanjeev Bagewadi
2008-Jul-29 06:16 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Shiv,

S h i v wrote:
> Hi Sanjeev,
>
> I have attached a script that does some more processing on the dtrace
> output to provide more information in the summary.

Thanks!

> In the output produced by the dtrace script, I see instances of malloc
> for a pointer (at one location) happening more than once without a free.
> How can malloc return a pointer location that was already returned by an
> earlier call to malloc?
>
> malloc:return tid=1 ptr=<ptr-value> size=48
> <ustack_1>
> ....few lines without a free.....
> malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
> <different_ustack>
> ....few lines.....
> free:entry ptr=<ptr-value>

I need to double-check... probably this was a realloc().
Or did you notice a drop (in memory usage) during that period? A drop would explain the missing free...

Thanks again for the enhancements!!

Regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
S h i v
2008-Jul-29 12:39 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
On Tue, Jul 29, 2008 at 11:46 AM, Sanjeev Bagewadi <Sanjeev.Bagewadi at sun.com> wrote:
>> malloc:return tid=1 ptr=<ptr-value> size=48
>> <ustack_1>
>> ....few lines without a free.....
>> malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
>> <different_ustack>
>> ....few lines.....
>> free:entry ptr=<ptr-value>
>
> I need to double-check... probably this was a realloc().
> Or did you notice a drop (in memory usage) during that period? A drop
> would explain the missing free...

It is not a case of realloc: as per your script, reallocs are also captured, and the pointer should then appear as the oldptr in the print done at realloc:return. That isn't occurring. I am not sure about the drops, since the log itself was generated at a different site by a different person.

Realloc wasn't handled in the attached script. It could be handled as a sequential free and malloc.

regards
Shiv
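Shiv's closing suggestion — treat realloc as a free of the old pointer followed by a malloc of the new one — could be folded into the leak bookkeeping like this. This is a sketch with assumed names (`live` is the pointer table from a postprocessor like the ones above; neither script in the thread actually prints a realloc line in this form).

```python
def apply_realloc(live, oldptr, newptr, size, lineno):
    """Treat realloc(oldptr, size) -> newptr as free(oldptr) + malloc(newptr).

    `live` maps pointer -> (line number, size) for outstanding allocations.
    Edge cases follow realloc(3C) semantics: realloc(NULL, n) behaves like
    malloc(n), and a NULL return means the old block is still live, so a
    real tool would want to special-case a failed realloc as well.
    """
    if oldptr not in (None, "0", "0x0"):
        live.pop(oldptr, None)         # the old block is gone either way
    if newptr not in (None, "0", "0x0"):
        live[newptr] = (lineno, size)  # the new block is now outstanding
    return live
```

Handling realloc this way keeps the rest of the pairing logic unchanged, which is presumably why Shiv proposes it.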