Hi List,

I'm wondering if one of you expert DTrace gurus can help me.  I want to
write a DTrace script to print out a histogram of how long IO requests sit
in the service queue.  I can output the results with the quantize method.
I'm not sure which provider I should be using for this.  Does anyone know?
I can easily adapt one of the DTrace Toolkit routines for this, if I can
find the provider.

I'll also throw out the problem I'm trying to meter.  We are using ZFS on
a large SAN array (4TB).  The pool on this array serves up a lot of users
(250 home file systems/directories) and also /usr/local and other OTS
software.  It works fine most of the time, but then gets overloaded during
busy periods.  I'm going to reconfigure the array to help with this, but I
sure would love to have some metrics to know how big a difference my
tweaks are making.  Basically, the problem users experience when the load
shoots up is huge latencies.  An ls on a non-cached directory, which
usually is instantaneous, will take 20, 30, 40 seconds or more.  Then when
the storage array catches up, things get better.  My clients are not happy
campers.

I know, I know, I should have gone with a JBOD setup, but it's too late
for that in this iteration of this server.  When we set this up, I had the
gear already, and it's not in my budget to get new stuff right now.

Thanks for any help anyone can offer.

Jon
jloran at ssl.berkeley.edu said:
> I sure would love to have some metrics to know how big a difference my
> tweaks are making.  Basically, the problem users experience when the load
> shoots up is huge latencies.  An ls on a non-cached directory, which
> usually is instantaneous, will take 20, 30, 40 seconds or more.  Then when
> the storage array catches up, things get better.  My clients are not
> happy campers.
>
> I know, I know, I should have gone with a JBOD setup, but it's too late
> for that in this iteration of this server.  When we set this up, I had
> the gear already, and it's not in my budget to get new stuff right now.

What kind of array are you seeing this problem with?  It sounds very much
like our experience here with a 3-yr-old HDS ATA array.  When the crunch
came here, I didn't know enough dtrace to help, but I threw the following
into crontab to run every five minutes (24x7), and it at least collected
the info I needed to see what LUN/filesystem was busying things out.

Way crude, but effective enough:

  /bin/ksh -c "date && mpstat 2 20 && iostat -xn 2 20 \
    && fsstat $(zfs list -H -o mountpoint -t filesystem | egrep '^/') 2 20 \
    && vmstat 2 20" >> /var/tmp/iostats.log 2>&1 </dev/null

A quick scan using "egrep" could pull out trouble spots; e.g. the
following would identify "iostat" lines that showed 90-100% busy:

  egrep '^Sun |^Mon |^Tue |^Wed |^Thu |^Fri |^Sat | 1[0-9][0-9] c6| 9[0-9] c6' \
    /var/tmp/iostats.log

I know it's not the Dtrace you were asking for, but maybe it'll inspire
something more useful than the above shotgun approach.

Regards,

Marion
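To make the crontab step concrete: a minimal sketch, assuming the
collection command above is wrapped in a hypothetical script
/usr/local/bin/iostats.sh (Solaris cron has no */5 syntax, so the minutes
are spelled out):

  # as root: crontab -e, then add:
  0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/iostats.sh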
> Way crude, but effective enough:

Kinda cool, but isn't that what

  sar -f /var/adm/sa/sa`date +%d` -A | grep -v ","

is for?  Run "crontab -e sys" to start the collection.  For more fun:

  acctadm -e extended -f /var/adm/exacct/proc process

Rob
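For reference, the sa data that sar -f reads comes from the sys crontab;
the stock entries are shipped commented out.  From memory (so double-check
against your own /var/spool/cron/crontabs/sys), they look roughly like:

  0 * * * 0-6 /usr/lib/sa/sa1
  20,40 8-17 * * 1-5 /usr/lib/sa/sa1
  5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A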
Marion Hakanson wrote:
> jloran at ssl.berkeley.edu said:
>> ...
>>
>> I know, I know, I should have gone with a JBOD setup, but it's too late
>> for that in this iteration of this server.  When we set this up, I had
>> the gear already, and it's not in my budget to get new stuff right now.
>
> What kind of array are you seeing this problem with?  It sounds very much
> like our experience here with a 3-yr-old HDS ATA array.

It's not that old.  It's a Supermicro system with a 3ware 9650SE-8LP and
an Open-E iSCSI-R3 DOM module.  The system is plenty fast.  I can pretty
handily pull 120MB/sec from it, and write at over 100MB/sec.  It falls
apart more on random I/O.  The server/initiator side is a T2000 with
Solaris 10u4.  It never sees over 25% CPU, ever.  Oh yeah, and there are
two 1GB network links to the SAN.

> When the crunch came here, I didn't know enough dtrace to help, but I
> threw the following into crontab to run every five minutes (24x7), and it
> at least collected the info I needed to see what LUN/filesystem was
> busying things out.
>
> Way crude, but effective enough:
>
>   /bin/ksh -c "date && mpstat 2 20 && iostat -xn 2 20 \
>     && fsstat $(zfs list -H -o mountpoint -t filesystem | egrep '^/') 2 20 \
>     && vmstat 2 20" >> /var/tmp/iostats.log 2>&1 </dev/null
>
> A quick scan using "egrep" could pull out trouble spots; e.g. the
> following would identify "iostat" lines that showed 90-100% busy:
>
>   egrep '^Sun |^Mon |^Tue |^Wed |^Thu |^Fri |^Sat | 1[0-9][0-9] c6| 9[0-9] c6' \
>     /var/tmp/iostats.log

Yeah, I have some of the traditional *stat utilities running.  If I
capture more than a second at a time, things look good.  I was hoping to
get a real distribution of service times, to catch the outliers that don't
get absorbed into the average, which is why I wanted to use dtrace.

My opinion is that if, when the array got really loaded up, everything
slowed down evenly, users wouldn't mind or notice much.  But when every
20th or so read/write gets delayed by 10s of seconds, the users start to
line up at my door.

Thanks for the tips.

Jon
jloran at ssl.berkeley.edu said:
> It's not that old.  It's a Supermicro system with a 3ware 9650SE-8LP and
> an Open-E iSCSI-R3 DOM module.  The system is plenty fast.  I can pretty
> handily pull 120MB/sec from it, and write at over 100MB/sec.  It falls
> apart more on random I/O.  The server/initiator side is a T2000 with
> Solaris 10u4.  It never sees over 25% CPU, ever.  Oh yeah, and there are
> two 1GB network links to the SAN.
> . . .
> My opinion is that if, when the array got really loaded up, everything
> slowed down evenly, users wouldn't mind or notice much.  But when every
> 20th or so read/write gets delayed by 10s of seconds, the users start to
> line up at my door.

Hmm, I have no experience with iSCSI yet.  But the behavior of our T2000
file/NFS server connected via a 2Gbit fiber channel SAN is exactly as you
describe when our HDS SATA array gets behind.  Access to other ZFS pools
remains unaffected, but any access to the busy pool just hangs.  Some
Oracle apps on NFS clients die due to excessive delays.

In our case, this old HDS array's SATA shelves have a very limited queue
depth (four per RAID controller) in the "back end" loop, plus every write
is hit with the added overhead of an in-array read-back verification.
Maybe your iSCSI situation injects enough latency at higher loads to cause
something like our FC queue limitations.

Good luck,

Marion
Marion Hakanson wrote:
> jloran at ssl.berkeley.edu said:
>> It's not that old.  It's a Supermicro system with a 3ware 9650SE-8LP and
>> an Open-E iSCSI-R3 DOM module.  The system is plenty fast.  I can pretty
>> handily pull 120MB/sec from it, and write at over 100MB/sec.  It falls
>> apart more on random I/O.  The server/initiator side is a T2000 with
>> Solaris 10u4.  It never sees over 25% CPU, ever.  Oh yeah, and there are
>> two 1GB network links to the SAN.
>> . . .
>> My opinion is that if, when the array got really loaded up, everything
>> slowed down evenly, users wouldn't mind or notice much.  But when every
>> 20th or so read/write gets delayed by 10s of seconds, the users start to
>> line up at my door.
>
> Hmm, I have no experience with iSCSI yet.  But the behavior of our T2000
> file/NFS server connected via a 2Gbit fiber channel SAN is exactly as you
> describe when our HDS SATA array gets behind.  Access to other ZFS pools
> remains unaffected, but any access to the busy pool just hangs.  Some
> Oracle apps on NFS clients die due to excessive delays.
>
> In our case, this old HDS array's SATA shelves have a very limited queue
> depth (four per RAID controller) in the "back end" loop, plus every write
> is hit with the added overhead of an in-array read-back verification.
> Maybe your iSCSI situation injects enough latency at higher loads to
> cause something like our FC queue limitations.

The iSCSI array has 2GB RAM as a cache.  Writes to cache complete very
fast.  I'm not sure yet, but I would love to get some metering going on
this guy to find out whether it's really the reads that cause the issue.
It seems, though I'm not totally sure yet, that heavy random read loads
are when things break down.  I'll pass on anything I find to the list,
'cause I'm sure there are a lot of folks with ZFS on a SAN.  The
flexibility of having the SAN is still seductive, even though the benefits
to ZFS performance of direct attached storage are pulling us the other
way.

Jon
Brendan Gregg - Sun Microsystems
2008-Feb-14 21:48 UTC
[zfs-discuss] Which DTrace provider to use
G'Day Jon,

For disk layer metrics, you could try Disk/iopending from the
DTraceToolkit to check how saturated the disks become with requests (which
answers that question with much higher definition than iostat).  I'd also
run disktime.d, which should be in the next DTraceToolkit release (it's
pretty obvious), and is included below.  disktime.d measures disk delta
times - time from request to completion.

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option dynvarsize=16m

BEGIN
{
        trace("Tracing... Hit Ctrl-C to end.\n");
}

io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
        this->delta = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
        @[args[1]->dev_statname] = quantize(this->delta);
}

The iopattern script will also give you a measure of random vs sequential
I/O - which would be interesting to see.

...

For latencies in ZFS (such as ZIO pipeline latencies), we don't have a
stable provider yet.  It is possible to write fbt based scripts to do this
- but they'll only work on a particular version of Solaris.

fsinfo would be a good provider to hit up for the VFS layer.

I'd also check syscall latencies - it might be too obvious, but it can be
worth checking (eg, if you discover those long latencies are only on the
open syscall)...

Brendan

--
Brendan
[CA, USA]
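On the syscall suggestion: a quick, generic sketch (my own, not a
DTraceToolkit script) that quantizes latency for a few likely syscalls;
adjust the probe list to whatever your workload actually issues:

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::open*:entry,
syscall::read:entry,
syscall::write:entry
/execname != "dtrace"/
{
        self->ts = timestamp;
}

syscall::open*:return,
syscall::read:return,
syscall::write:return
/self->ts/
{
        /* nanoseconds from syscall entry to return, per syscall name */
        @[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}

If the huge latencies only show up under one histogram (say, open), that
narrows down where to look next.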
Hi Brendan,

I have been using iopending, though I'm not sure how to interpret it.  Is
it true that the column on the left is how deep in the queue requests are,
and the histogram represents how many requests there are at each queue
depth?  Then I would guess that if there are lots of requests with high
queue depth, that's bad.  Once in a while, I see some pretty long queues,
but they only last a second, and then things even right out again.

I'll try your disktime.d script below and the other checks you recommend.
May have more questions to follow.

Thanks!

Jon

Brendan Gregg - Sun Microsystems wrote:
> G'Day Jon,
>
> For disk layer metrics, you could try Disk/iopending from the
> DTraceToolkit to check how saturated the disks become with requests
> (which answers that question with much higher definition than iostat).
> I'd also run disktime.d, which should be in the next DTraceToolkit
> release (it's pretty obvious), and is included below.  disktime.d
> measures disk delta times - time from request to completion.
>
> #!/usr/sbin/dtrace -s
>
> #pragma D option quiet
> #pragma D option dynvarsize=16m
>
> BEGIN
> {
>         trace("Tracing... Hit Ctrl-C to end.\n");
> }
>
> io:::start
> {
>         start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
> }
>
> io:::done
> /start[args[0]->b_edev, args[0]->b_blkno]/
> {
>         this->delta = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
>         @[args[1]->dev_statname] = quantize(this->delta);
> }
>
> The iopattern script will also give you a measure of random vs sequential
> I/O - which would be interesting to see.
>
> ...
>
> For latencies in ZFS (such as ZIO pipeline latencies), we don't have a
> stable provider yet.  It is possible to write fbt based scripts to do
> this - but they'll only work on a particular version of Solaris.
>
> fsinfo would be a good provider to hit up for the VFS layer.
>
> I'd also check syscall latencies - it might be too obvious, but it can be
> worth checking (eg, if you discover those long latencies are only on the
> open syscall)...
>
> Brendan

--
- _____/     _____/      /    - Jonathan Loran -                        -
-      /          /      /      IT Manager                              -
-   _____  /   _____  /  /    Space Sciences Laboratory, UC Berkeley
-         /          /   /    (510) 643-5146 jloran at ssl.berkeley.edu
- ______/   ______/  ______/  AST:7731^29u18e3
Brendan Gregg - Sun Microsystems
2008-Feb-14 23:57 UTC
[zfs-discuss] Which DTrace provider to use
G'Day Jon,

On Thu, Feb 14, 2008 at 02:54:58PM -0800, Jonathan Loran wrote:
> Hi Brendan,
>
> I have been using iopending, though I'm not sure how to interpret it.

Did you read Examples/iopending_example.txt?  That's why it's there. :)

> Is it true that the column on the left is how deep in the queue requests
> are, and the histogram represents how many requests there are at each
> queue depth?

Yes, the counts are samples.

> Then I would guess that if there are lots of requests with high queue
> depth, that's bad.

If it remains that way, yes.

> Once in a while, I see some pretty long queues, but they only last a
> second, and then things even right out again.

That sounds fine - operations like zil_sync() queue a bunch of I/Os
briefly, which is usually fine.

Brendan

--
Brendan
[CA, USA]
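For anyone curious what iopending is doing under the hood, here is a
stripped-down sketch of the same sampling idea (not the Toolkit script
itself, which breaks the counts down per device and formats the output);
the single racy global counter is fine for a rough picture:

#!/usr/sbin/dtrace -s

#pragma D option quiet

int pending;    /* outstanding disk I/Os, all devices combined */

io:::start
{
        pending++;
}

io:::done
/pending > 0/
{
        pending--;
}

profile:::profile-997
{
        /* sample the outstanding count 997 times a second on each CPU */
        @["outstanding I/Os (sampled)"] = lquantize(pending, 0, 64, 1);
}

profile:::tick-5sec
{
        printa(@);
        trunc(@);
}

The left column of the output is the sampled queue depth, and the counts
are how many samples landed at that depth - exactly how iopending's
histograms read.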
On Tue, Feb 12, 2008 at 10:21:44PM -0800, Jonathan Loran wrote:
> Hi List,
>
> I'm wondering if one of you expert DTrace gurus can help me.  I want to
> write a DTrace script to print out a histogram of how long IO requests
> sit in the service queue.  I can output the results with the quantize
> method.  I'm not sure which provider I should be using for this.  Does
> anyone know?  I can easily adapt one of the DTrace Toolkit routines for
> this, if I can find the provider.
>
> I'll also throw out the problem I'm trying to meter.  We are using ZFS on
> a large SAN array (4TB).  The pool on this array serves up a lot of users
> (250 home file systems/directories) and also /usr/local and other OTS
> software.  It works fine most of the time, but then gets overloaded
> during busy periods.  I'm going to reconfigure the array to help with
> this, but I sure would love to have some metrics to know how big a
> difference my tweaks are making.  Basically, the problem users experience
> when the load shoots up is huge latencies.  An ls on a non-cached
> directory, which usually is instantaneous, will take 20, 30, 40 seconds
> or more.  Then when the storage array catches up, things get better.  My
> clients are not happy campers.
>
> I know, I know, I should have gone with a JBOD setup, but it's too late
> for that in this iteration of this server.  When we set this up, I had
> the gear already, and it's not in my budget to get new stuff right now.
>
> Thanks for any help anyone can offer.

I have faced a similar problem (although not exactly the same) and was
going to monitor the disk queue with dtrace, but couldn't find any
docs/urls about it.  Finally I asked Chris Gerhard for help.  He partially
answered via his blog:
http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io

Maybe it helps you.

Regards
przemol

--
http://przemol.blogspot.com/
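Chris's blog entry has the real details, but for a flavour of the
approach, here is a minimal sketch (mine, not his script) using the io
provider's wait-start/wait-done probes, which bracket the time a thread
spends blocked waiting on an individual I/O:

#!/usr/sbin/dtrace -s

#pragma D option quiet

io:::wait-start
{
        /* this thread is about to block waiting for an I/O to complete */
        self->wait = timestamp;
}

io:::wait-done
/self->wait/
{
        /* nanoseconds spent waiting, broken down by device */
        @[args[1]->dev_statname] = quantize(timestamp - self->wait);
        self->wait = 0;
}

Hit Ctrl-C and the per-device histograms should make any latency bubbles
stand out as outliers in the high buckets.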
przemolicc at poczta.fm wrote:
> On Tue, Feb 12, 2008 at 10:21:44PM -0800, Jonathan Loran wrote:
>> Thanks for any help anyone can offer.
>
> I have faced a similar problem (although not exactly the same) and was
> going to monitor the disk queue with dtrace, but couldn't find any
> docs/urls about it.  Finally I asked Chris Gerhard for help.  He partially
> answered via his blog:
> http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
>
> Maybe it helps you.
>
> Regards
> przemol

This is perfect.  Thank you.

Jon
On Feb 14, 2008, at 02:22, Marion Hakanson wrote:
> jloran at ssl.berkeley.edu said:
>> It's not that old.  It's a Supermicro system with a 3ware 9650SE-8LP and
>> an Open-E iSCSI-R3 DOM module.  The system is plenty fast.  I can pretty
>> handily pull 120MB/sec from it, and write at over 100MB/sec.  It falls
>> apart more on random I/O.  The server/initiator side is a T2000 with
>> Solaris 10u4.  It never sees over 25% CPU, ever.  Oh yeah, and there are
>> two 1GB network links to the SAN.
>> . . .
>> My opinion is that if, when the array got really loaded up, everything
>> slowed down evenly, users wouldn't mind or notice much.  But when every
>> 20th or so read/write gets delayed by 10s of seconds, the users start to
>> line up at my door.

This is the write throttling problem.  I've tested code that radically
changes the situation for the better.  We just need to go through
performance validation before putback.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

-r