Durney, Mark
2008-Jun-30 13:03 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
I have a production Solaris 10 server that was recently moved to our new DMX. Since the move to the new DMX, we have been seeing the memory usage stepping: it climbs to 87.99% and then drops back to 41%. The server has Oracle running on it. Because it's a production server, I obviously need to be careful with what tools I use on it. Can DTrace be used, and are there example scripts I can use to troubleshoot this potential memory leak? I am new to DTrace.

Thanks
Brian Utterback
2008-Jun-30 13:20 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Hi Mark. DTrace is the perfect tool for you. It is designed to have a minimal impact on a running system. As long as you don't go crazy and use a probe like "fbt:::" that instruments every function in the kernel, you should be pretty safe. Even with that one, it just feels like the system is overly loaded for a while. I use "fbt:::" all the time in test environments, but I shy away from it on production servers.

What kind of memory usage do you see climb? How do you measure it? What kind of memory it is will point you in the direction you need to go.

Durney, Mark wrote:
> I have a production Solaris 10 server that was recently moved to our
> new DMX. Since the move to the new DMX, we have been seeing the memory
> usage stepping: it climbs to 87.99% and then drops back to 41%.
> The server has Oracle running on it. Because it's a production server,
> I obviously need to be careful with what tools I use on it. Can DTrace
> be used, and are there example scripts I can use to troubleshoot this
> potential memory leak? I am new to DTrace.
>
> Thanks
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
Sanjeev Bagewadi
2008-Jun-30 13:34 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Durney,

I have a simple script for the userland, and the details are available on my blog:
http://blogs.sun.com/sanjeevb/

The script is fairly rudimentary, and I have intentionally avoided any processing during collection. All the intelligence is in the postprocessing Perl script.

There is probably room for optimization, but I still need to explore that.

Hope they are of use.

Thanks and regards,
Sanjeev.

Durney, Mark wrote:
> I have a production Solaris 10 server that was recently moved to our
> new DMX. Since the move to the new DMX, we have been seeing the memory
> usage stepping: it climbs to 87.99% and then drops back to 41%.
> The server has Oracle running on it. Because it's a production server,
> I obviously need to be careful with what tools I use on it. Can DTrace
> be used, and are there example scripts I can use to troubleshoot this
> potential memory leak? I am new to DTrace.
>
> Thanks

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
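The division of labor Sanjeev describes — dumb collection in D, all the intelligence in a postprocessing script — can be sketched roughly as follows. This is not his Perl script; it is a minimal Python sketch, and the log-line format (`malloc:return tid=... ptr=... size=...` / `free:entry ptr=...`) is assumed from the output quoted later in this thread.

```python
import re
from collections import OrderedDict

def find_leaks(lines):
    """Pair malloc:return and free:entry records from a DTrace log.

    Assumed line formats (taken from the example later in the thread):
        malloc:return tid=1 ptr=0xABC size=48
        free:entry ptr=0xABC
    ustack() output and other lines are simply skipped.
    Anything malloc'd but never freed is reported as a potential leak.
    """
    live = OrderedDict()           # ptr -> (lineno, size) of outstanding allocs
    errors = []                    # (lineno, ptr) for frees of unknown pointers
    for lineno, line in enumerate(lines, 1):
        m = re.match(r'malloc:return tid=\d+ ptr=(\S+) size=(\d+)', line)
        if m:
            live[m.group(1)] = (lineno, int(m.group(2)))
            continue
        m = re.match(r'free:entry ptr=(\S+)', line)
        if m:
            if m.group(1) in live:
                del live[m.group(1)]
            else:
                errors.append((lineno, m.group(1)))
    return live, errors
```

For example, a log with two allocations and one free leaves one outstanding pointer, which is exactly what a leak summary would report.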
S h i v
2008-Jul-28 01:29 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Hi Sanjeev,

I have attached a script that does some more processing on the dtrace output to provide more information in the summary.

On Mon, Jun 30, 2008 at 7:04 PM, Sanjeev Bagewadi <Sanjeev.Bagewadi at sun.com> wrote:
> I have a simple script for the userland, and the details are available
> on my blog:
> http://blogs.sun.com/sanjeevb/
>
> The script is fairly rudimentary, and I have intentionally avoided any
> processing during collection. All the intelligence is in the
> postprocessing Perl script.
>
> There is probably room for optimization, but I still need to explore that.
>
> Hope they are of use.

In the output produced by the dtrace script, I see instances of malloc for a pointer (at one location) happening more than once without a free. See the example below. What should I make of this? How can malloc return a pointer location that was already returned by an earlier call to malloc? (This is from an actual application; I do not have a sample program that can reproduce the scenario.)

malloc:return tid=1 ptr=<ptr-value> size=48
<ustack_1>
....few lines without a free.....
malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
<different_ustack>
....few lines.....
free:entry ptr=<ptr-value>

I had used your dtrace script and found it quite useful. Since the logs collected were often huge, the level of detail in the Perl script's output wasn't sufficient. The attached script (Python) does a little more work. Of course, some more optimizations and still better reporting are possible. For example, I see some stacks in our app leak sporadically under heavy load, due to call drops and alternate execution paths that get triggered. Such stacks aren't *always leaky*. This information can be captured in the output:

======================================================
ERROR line: <lineno> : Freeing non-existent memory at <ptr-value>
LEAK <nnn> bytes leaked. Allocated at line <line1>, ptr <ptr-value> is re-alloced at line <line2>
......multiple lines as above.......
======================================================
INFO Leaks along with stack information are as follows:
======================================================
Stack position (line no.): <line1> <line2> <line3> ...
Pointers: <ptr1> <ptr2> <ptr3> ...
Size leaked: <bytes1> <bytes2> <bytes3>
Total size leaked: sumof(<bytes1> <bytes2> <bytes3> ...)
N(stack executions): <no of times the leaky stack got executed>

STACK IS:
<actual-stack>
......multiple stack info as above....

In the malloc-related query, I was referring to the output line "LEAK <nnn> bytes leaked. Allocated at line <line1>, ptr <ptr-value> is re-alloced at line <line2>". This output appears when there are two mallocs at <line1> and <line2> in the dtrace output without an intermediate free.

-Shiv

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: memparse.py
URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20080728/65620ffd/attachment.ksh>
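The "re-alloced" check behind that LEAK line is easy to reproduce in a few lines. This is a sketch, not the attached memparse.py; the log-line format is the one quoted in the message above. A second malloc:return for a still-live pointer should be impossible in a complete trace, so flagging it is a good way to spot dropped frees (or an unlogged realloc).

```python
import re

def find_realloced(lines):
    """Flag a second malloc:return for a pointer with no intervening free.

    Returns (ptr, first_lineno, second_lineno) tuples. malloc cannot hand
    out a live address twice, so a hit usually means a free was missing
    from the log, or the block came back through realloc().
    """
    first_seen = {}                # ptr -> line number of the live allocation
    suspects = []
    for lineno, line in enumerate(lines, 1):
        m = re.match(r'malloc:return tid=\d+ ptr=(\S+)', line)
        if m:
            ptr = m.group(1)
            if ptr in first_seen:
                suspects.append((ptr, first_seen[ptr], lineno))
            first_seen[ptr] = lineno
            continue
        m = re.match(r'free:entry ptr=(\S+)', line)
        if m:
            first_seen.pop(m.group(1), None)
    return suspects
```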
Sanjeev Bagewadi
2008-Jul-29 06:16 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
Shiv,

S h i v wrote:
> Hi Sanjeev,
>
> I have attached a script that does some more processing on the dtrace
> output to provide more information in the summary.

Thanks!

> In the output produced by the dtrace script, I see instances of malloc
> for a pointer (at one location) happening more than once without a free.
> How can malloc return a pointer location that was already returned by an
> earlier call to malloc?
>
> malloc:return tid=1 ptr=<ptr-value> size=48
> <ustack_1>
> ....few lines without a free.....
> malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
> <different_ustack>
> ....few lines.....
> free:entry ptr=<ptr-value>

I need to double-check... probably this was a realloc().
Or did you notice a drop (in memory usage) during that period? A drop would explain the missing free...

Thanks again for the enhancements!!

Regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
S h i v
2008-Jul-29 12:39 UTC
[dtrace-discuss] Memory leak on solaris 10 production server
On Tue, Jul 29, 2008 at 11:46 AM, Sanjeev Bagewadi <Sanjeev.Bagewadi at sun.com> wrote:
>> malloc:return tid=1 ptr=<ptr-value> size=48
>> <ustack_1>
>> ....few lines without a free.....
>> malloc:return tid=1 ptr=<*same* ptr-value> size=86   <- same memory location
>> <different_ustack>
>> ....few lines.....
>> free:entry ptr=<ptr-value>
>
> I need to double-check... probably this was a realloc().
> Or did you notice a drop (in memory usage) during that period? A drop
> would explain the missing free...

It is not a case of realloc: as per your script, reallocs are also captured, and the pointer should then appear as the oldptr in the print done at realloc:return. That isn't occurring. I am not sure about the drops, since the log itself was generated at a different site by a different person.

Realloc wasn't handled in the attached script. It could be handled as a sequential free and malloc.

regards
Shiv
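Shiv's closing suggestion — treat realloc as a free of the old pointer followed by a malloc of the new one — could be folded into the leak bookkeeping like this. This is a sketch with assumed names (`live` is the pointer table from a postprocessor like the ones above; neither script in the thread actually prints a realloc line in this form).

```python
def apply_realloc(live, oldptr, newptr, size, lineno):
    """Treat realloc(oldptr, size) -> newptr as free(oldptr) + malloc(newptr).

    `live` maps pointer -> (line number, size) for outstanding allocations.
    Edge cases follow realloc(3C) semantics: realloc(NULL, n) behaves like
    malloc(n), and a NULL return means the old block is still live, so a
    real tool would want to special-case a failed realloc as well.
    """
    if oldptr not in (None, "0", "0x0"):
        live.pop(oldptr, None)         # the old block is gone either way
    if newptr not in (None, "0", "0x0"):
        live[newptr] = (lineno, size)  # the new block is now outstanding
    return live
```

Handling realloc this way keeps the rest of the pairing logic unchanged, which is presumably why Shiv proposes it.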