We're trying to locate a problem on one of our web servers where suddenly everything grinds to a virtual halt (well, not really) due to something forcing a *lot* of paging activity. We suspect that it might be some process that suddenly allocates a lot of memory and accesses it quickly, forcing the rest of the (big) processes out to swap. *Or* something filesystem related (ZFS perhaps?).

One thing that makes it problematic to trace is that when things slow down or halt we can't log in to the machine ("fork: resource temporarily unavailable").

Using a DTrace script we've seen that during the periods when things are really slow some processes start paging (and have really long paging response times). (script: http://www.solarisinternals.com/si/dtrace/whospaging.d)

An added complication is that during the times when things fail, dtrace also more or less fails to run...

# priocntl -e -c RT dtrace -s ./whospaging.d > paging-RT.log
dtrace: processing aborted: Abort due to systemic unresponsiveness

It worked better with:

# priocntl -e -c RT dtrace -w -s ./whospaging.d > paging-RT-2.log

but then it wouldn't print anything at all when the interesting things were happening...

(Machine: Sun Ultra 60, 2x 360 MHz CPUs, 1500 MB RAM)

Any suggestions on what to check next?
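P.S. For reference, whospaging.d is roughly along these lines - this is only a minimal sketch using the vminfo provider, not the actual script (which also reports the paging response times mentioned above):

#!/usr/sbin/dtrace -s
/* Minimal sketch only: count anonymous page-ins per process name
 * and print the totals every 10 seconds. */

vminfo:::anonpgin
{
        @pgins[execname] = count();
}

tick-10sec
{
        printa("%-20s %@d\n", @pgins);
        trunc(@pgins);
}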
Before you dig down: there is currently a ZFS best practice to configure as much disk-based swap as your expected ZFS caches (and if that is unknown, provision swap for all of memory). That may change in the future, but it's still the reality for now, and it will at least help with the diagnosis.

More BP here:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

-r
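P.S. In case it helps, checking and growing swap looks roughly like this - the 2 GB size and the /export path are only examples, and a swap file needs to live on UFS (or use a dedicated slice), since swapping to a file on ZFS isn't supported:

# swap -l                        (list the currently configured swap devices)
# swap -s                        (summary of swap allocation)
# mkfile 2048m /export/swapfile  (create a 2 GB swap file; example path)
# swap -a /export/swapfile       (add it to the running swap configuration)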
You need to take a step back, I think, and first identify the problem. You do not yet know if the memory usage is a user-land process, and the approach of determining which processes are having their pages stolen may not help - such processes may be victims, not the cause.

You need to audit the memory consumers, and go from there. They are:

- The kernel
- The file system cache
- Processes

I assume you're certain about the paging activity, meaning you see free memory drop and the page scanner getting busy. This is observable with vmstat - monitor freemem and the "sr" column.

prstat(1) is your friend. "prstat -s rss" is a wonderfully simple and effective way to track physical memory usage on a per-process basis. Sure, we know all about shared pages, and the fact that the sum of all processes' RSS sizes will be something much, much larger than physical memory. But all we're looking for here are processes with increasingly large RSS, and who the large consumers are. Once you've identified the process(es), use "pmap -x" to refine your understanding of their memory usage.

On a system of this size (1.5GB of RAM), use mdb's "memstat" dcmd:

# mdb -k
> ::memstat

This will give you a memory usage profile. In my experience, the symptoms you describe are frequently the result of the file system cache consuming memory (which, in and of itself, is not a bad thing), then a process comes along that needs a bigger chunk than is available, and the kernel has to get busy managing the shortfall.

With UFS, you'll see the page cache in memstat. With ZFS, you will not, since ZFS uses its own mechanism for caching data and metadata. Unfortunately, there isn't an easy way to track ZFS as a memory consumer (at least not that I'm aware of) - the mdb "kmastat" dcmd will show usage for all the zio pools and zfs caches, but it takes a bit of parsing to sort it out. I'm sure a dtrace script could help track ZFS memory consumption, but I'd need to spend a bit of time working through something like that.

Anyway, before we jump to conclusions, let's start with first identifying the consumer. If it turns out that kernel memory is growing, we can chase that down with dtrace and mdb/kmastat. If it's a process, pmap to determine the segment(s), and dtrace to track allocations.

HTH,
/jim
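P.S. To make the audit concrete, it boils down to a handful of commands (the <pid> below is just a placeholder for whatever process prstat flags):

# vmstat 5                     (watch "free" dropping and the "sr" column rising)
# prstat -s rss -c 5           (processes sorted by resident set size, largest first)
# pmap -x <pid>                (per-segment breakdown for a suspect process)
# echo ::memstat | mdb -k      (the same memory usage profile, as a one-liner)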
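P.P.S. Until there is something better, one crude way to eyeball the ZFS-related kernel caches is to filter the ::kmastat output - the cache-name pattern here is only a guess and the exact names vary between builds:

# echo ::kmastat | mdb -k | egrep 'zio|arc|dmu|dnode|zfs'

Summing the "memory in use" column for those caches gives a rough idea of how much the ARC and friends are holding.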
> With UFS, you'll see the page cache in memstat. With ZFS, you will
> not, since ZFS uses its own mechanism for caching data and metadata.
> Unfortunately, there isn't an easy way to track ZFS as a memory consumer
> (at least not that I'm aware of) - the mdb "kmastat" dcmd will show
> usage for all the zio pools and zfs caches, but it takes a bit of
> parsing to sort it out.

Is there an RFE open to add a ZFS ARC cache entry to the mdb memstat dcmd? I looked through the bug archive, but wasn't able to locate anything to this effect.

Thanks,
- Ryan

--
UNIX Administrator
http://prefetch.net
Peter Eriksson
2006-Dec-04 22:22 UTC
[dtrace-discuss] Re: How to trace process memory usage
> prstat(1) is your friend. "prstat -s rss" is a wonderfully simple and effective

There is only one small problem with that approach - when things are running normally we don't see any strange behaviour. And when things misbehave we typically can't start any new processes (fork: resource temporarily unavailable)... I.e., no prstat/vmstat/pmap/top/ps...

That's why we were thinking of using an already-running DTrace to try to "see" what's going on when things are misbehaving...

Anyway, I've now increased the available swap space so it's more than the size of the RAM in the machine, and we'll see what happens...
Got it - I didn't realize the window of time between "things are getting slow" and "we're wedged" was so small.

I would not start with a DTrace collection in the background. You could run prstat in collect mode ("prstat -s rss -c 10 > /var/tmp/prstat.out" - the 10 is an interval of 10 seconds) in the background. If it's a user-land process growing, you'll hopefully capture something before things wedge.

I would also recommend running "kstat -n system_pages 10 > /var/tmp/kstat.out" in the background. You can track the free page list size and kernel pages. Again, hopefully there will be a trend there we can drill down on before things get wedged.

Between those two, we should be able to determine if it's a user-land process consuming memory, or something in the kernel. The next step will be based on what that tells us.

HTH
/jim
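P.S. One way to launch those collectors so they survive a logout - and, since neither needs to spawn anything new per interval, keep writing even once fork(2) starts failing - might be:

# nohup prstat -s rss -c 10 > /var/tmp/prstat.out 2>&1 &
# nohup kstat -n system_pages 10 > /var/tmp/kstat.out 2>&1 &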
Surya.Prakki at Sun.COM
2006-Dec-05 03:37 UTC
[dtrace-discuss] Re: How to trace process memory usage
One of the things you may need to look out for is: if you are creating way too many LWPs on the system, you may exhaust the segkp space configured on your system (segkpsize). ::kmastat from an already-running 'mdb -k' session will help you figure this out (check for any failures in the segkp caches).

-surya
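P.S. For example (from an already-running 'mdb -k' session, or piped in as below; the exact ::kmastat column layout varies a little between releases):

# echo ::kmastat | mdb -k | grep segkp

A non-zero count in the allocation-failure column for the segkp cache would point at segkp exhaustion; segkpsize can then be raised via /etc/system (units and limits depend on the release, so check the tunable parameters guide first), followed by a reboot.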