Hi,

I have a Netra X1 server with 512MB of RAM and two ATA disks, model ST340016A. The processor is an UltraSPARC-IIe 500MHz. The Solaris version is:

Solaris 10 10/09 s10s_u8wos_08a SPARC

I jumpstarted the server with a ZFS root, with the two disks as a mirror:

=======================================================
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t2d0s0  ONLINE       0     0     0

errors: No known data errors
=======================================================

Here is the dataset layout (zfs list):

=======================================================
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             28.9G  7.80G    98K  /rpool
rpool/ROOT        4.06G  7.80G    21K  legacy
rpool/ROOT/newbe  4.06G  7.80G  4.06G  /
rpool/dump         512M  7.80G   512M  -
rpool/opt         23.8G  7.80G  4.11G  /opt
rpool/opt/export  19.7G  7.80G  19.7G  /opt/export
rpool/swap         512M  8.12G   187M  -
=======================================================

When I run this command:

# digest -a md5 /opt/export/BIGFILE    (4.6GB file)

it takes around 1 hour and 45 minutes to finish. I can understand that a Netra X1 with ATA disks can be slow, but not this slow.

Also, there are some inconsistencies between the "zpool iostat 1" and DTrace outputs. Here is a small snippet of zpool iostat 1:

=======================================================
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       28.5G  8.70G     33      1  3.28M  5.78K
rpool       28.5G  8.70G    124      0  15.5M      0
rpool       28.5G  8.70G    150      0  18.8M      0
rpool       28.5G  8.70G    134      0  16.8M      0
rpool       28.5G  8.70G    135      0  16.7M      0
=======================================================

Not so bad, I would say, for ATA/IDE drives. But if I dig deeper, for example with "iostat -x 1":

=======================================================
                  extended device statistics
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b
dad0     63.4    0.0  8115.1    0.0 32.7  2.0  547.5 100 100
dad1     63.4    0.0  8115.1    0.0 33.0  2.0  551.9 100 100
                  extended device statistics
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b
dad0     52.0    0.0  6152.3    0.0  9.3  1.6  210.9  75  84
dad1     70.0    0.0  8714.7    0.0 16.0  2.0  256.5  93  99
                  extended device statistics
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b
dad0     75.3    0.0  9634.7    0.0 23.1  1.9  332.7  96  97
dad1     71.3    0.0  9121.0    0.0 22.3  1.9  339.4  91  95
=======================================================

zpool iostat says about 15MB/s of data is being transferred, but iostat shows only about 8MB/s per disk. We can also clearly see that the bus and the drives are saturated (%w and %b). Let's look at the I/O breakdown with iotop from the DTraceToolkit:

=======================================================
2009 Nov 13 09:06:03,  load: 0.37,  disk_r:  93697 KB,  disk_w:      0 KB

  UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
    0   2250   1859 digest           dad1    136   8 R      3
    0   2250   1859 digest           dad0    136   0 R      4
    0      0      0 sched            dad0    136   0 R     88
    0      0      0 sched            dad1    136   8 R     89
=======================================================

sched accounts for up to 90% of the I/O. I tried to trace it:

=======================================================
[x]: ps -efl | grep sche
 1 T     root     0     0  0   0 SY     ?      0 21:31:44 ?        0:48 sched
[x]: truss -f -p 1
1:      pollsys(0x0002B9DC, 1, 0xFFBFF588, 0x00000000) (sleeping...)
[x]:
=======================================================

Only one line of output from truss. Weird: it is as if sched is doing nothing, yet it accounts for up to 90% of the I/O.
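To double-check iotop's numbers, here is a rough io-provider one-liner that just sums physical read/write bytes per execname (essentially what iotop does under the hood; nothing in it is specific to this box):

=======================================================
# dtrace -n '
  io:::start { @[execname] = sum(args[0]->b_bcount); }
  tick-10sec { normalize(@, 10); printa("%-16s %@d bytes/s\n", @); exit(0); }'
=======================================================

Note that with this approach, anything issued from kernel context is charged to PID 0 and therefore shows up as "sched", exactly as in the iotop output above.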
prstat is showing this (I'll only show the first process at the top):

======================================================
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2250 root     2464K 1872K run      0    0   0:01:57  15% digest/1
======================================================

The digest command is using only 15% of the CPU, so the I/O is clearly not keeping it busy. pfilestat from the DTraceToolkit reports this:

======================================================
     STATE   FDNUM      Time Filename
   running       0        6%
   waitcpu       0        7%
      read       4       13% /opt/export/BIGFILE
   sleep-r       0       69%

     STATE   FDNUM      KB/s Filename
      read       4       614 /opt/export/BIGFILE
======================================================

So only around 615KB/s is actually read each second. That is very slow! This does not seem normal to me, even for ATA drives.

So my questions are the following:

1. Why is zpool iostat reporting 15MB/s of reads when in reality only 615KB/s is read?
2. Why is sched doing so much I/O?
3. What can I do to improve I/O performance? I find it hard to believe that this is the best performance the current hardware can provide...

Thank you!
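P.S. In case it helps anyone reproduce the gap, here is a rough DTrace sketch (plain syscall and io providers, nothing exotic) that compares, second by second, what digest reads logically against what is actually issued to the disks:

======================================================
# dtrace -qn '
  syscall::read:return /execname == "digest" && arg0 > 0/
    { @["logical (read syscalls)"] = sum(arg0); }
  io:::start /args[0]->b_flags & B_READ/
    { @["physical (disk reads)"]   = sum(args[0]->b_bcount); }
  tick-1sec { printa("%-26s %@d bytes/s\n", @); clear(@); }'
======================================================

Roughly speaking, the difference between the two lines is whatever ZFS reads beyond what the application actually consumes (metadata, prefetch, re-reads).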
On Fri, 13 Nov 2009, inouk wrote:
>
> So my questions are the following:
>
> 1. Why is zpool iostat reporting 15MB/s of reads when in reality only 615KB/s is read?
> 2. Why is sched doing so much I/O?
> 3. What can I do to improve I/O performance? I find it hard to believe that this is the best performance the current hardware can provide...

Your system has very little RAM (512MB). That is less than is recommended for Solaris 10 or for zfs, and if it were a PC it would be barely enough to run Windows XP. Since zfs likes to use RAM and expects that sufficient RAM will be available, it seems likely that this system is both paging badly and failing to cache enough data to operate efficiently. Zfs is re-reading from disk data that would normally be cached.

The simple solution is to install a lot more RAM. 2GB is a good starting point.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
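One quick way to check for memory pressure on Solaris 10 (a generic sketch using stock tools, not specific to this system) is to compare the ARC's current size against its limits and watch the page scanner:

=======================================================
# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max
# echo ::memstat | mdb -k
# vmstat 5 3
=======================================================

If arcstats:size is pinned well below c_max and the "sr" column in vmstat is non-zero, the ARC is being squeezed and data will keep being re-read from disk.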
Agreed, but still: why does zpool iostat report 15MB/s while iostat reports 615KB/s?

Regards,
Jeff

________________________________________
From: zfs-discuss-bounces at opensolaris.org [zfs-discuss-bounces at opensolaris.org] On Behalf Of Bob Friesenhahn [bfriesen at simple.dallas.tx.us]
Sent: Friday, November 13, 2009 4:05 PM
To: inouk
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] zfs/io performance on Netra X1

On Fri, 13 Nov 2009, inouk wrote:
>
> So my questions are the following:
>
> 1. Why is zpool iostat reporting 15MB/s of reads when in reality only 615KB/s is read?
> 2. Why is sched doing so much I/O?
> 3. What can I do to improve I/O performance? I find it hard to believe that this is the best performance the current hardware can provide...

Your system has very little RAM (512MB). That is less than is recommended for Solaris 10 or for zfs, and if it were a PC it would be barely enough to run Windows XP. Since zfs likes to use RAM and expects that sufficient RAM will be available, it seems likely that this system is both paging badly and failing to cache enough data to operate efficiently. Zfs is re-reading from disk data that would normally be cached.

The simple solution is to install a lot more RAM. 2GB is a good starting point.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> On Fri, 13 Nov 2009, inouk wrote:
> Your system has very little RAM (512MB). That is less than is
> recommended for Solaris 10 or for zfs, and if it were a PC it would
> be barely enough to run Windows XP. Since zfs likes to use RAM and
> expects that sufficient RAM will be available, it seems likely that
> this system is both paging badly and failing to cache enough data to
> operate efficiently. Zfs is re-reading from disk data that would
> normally be cached.
>
> The simple solution is to install a lot more RAM. 2GB is a good
> starting point.

I don't agree, especially with the Windows XP comparison: XP has a windowing system and all sorts of other fancy stuff. The server I'm talking about has nothing on it except the background system processes (sendmail, kernel threads, and so on). Finally, swap isn't used at all, so I would say almost 90% of the RAM is available for zfs operations.

Anyway, I discovered something interesting: while investigating, I offlined the second disk of the mirror pool:

===========================================================
  pool: rpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the device
        with 'zpool replace'.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t2d0s0  OFFLINE      0     0     0

errors: No known data errors
===========================================================

The read rate went from 650KB/s to 1200KB/s (1.2MB/s) according to pfilestat:

===========================================================
     STATE   FDNUM      Time Filename
   running       0        5%
   waitcpu       0       12%
      read       0       16% /opt/export/flash_recovery/OVO_2008-02-20.fl
   sleep-r       0       65%

     STATE   FDNUM      KB/s Filename
      read       0      1200 /opt/export/flash_recovery/OVO_2008-02-20.fl

Total event time (ms): 4999   Total Mbytes/sec: 1
===========================================================

Also, the read service time dropped to between 80ms and 100ms:

===========================================================
device    r/s    w/s     kr/s   kw/s wait actv  svc_t  %w  %b
dad0      0.0    0.0      0.0    0.0  0.0  0.0    0.0   0   0
dad1    168.8    0.0  21608.9    0.0 13.5  1.7   89.9  78  88
===========================================================

This sounds like a bus bottleneck, as if the two drives can't use the same bus for data transfers. I don't know the hardware specifications of the Netra X1, though...
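A crude way to test the shared-bus theory directly (a sketch only; it reads 1GB from each disk's raw device, which is read-only and harmless but does add load) is to run a sequential raw read on each disk alone, then on both at once, and watch "iostat -x 1" in another window. Running "ls -l /dev/dsk/c0t0d0s0 /dev/dsk/c0t2d0s0" also shows the physical device paths behind the two targets.

===========================================================
# dd if=/dev/rdsk/c0t0d0s0 of=/dev/null bs=128k count=8192 &
# dd if=/dev/rdsk/c0t2d0s0 of=/dev/null bs=128k count=8192 &
# wait
===========================================================

If the combined throughput with both dd's running is roughly the same as a single disk alone, the two drives really are sharing one channel.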
On Fri, 13 Nov 2009, inouk wrote:
>
> Sounds like a bus bottleneck, as if the two drives can't use the same
> bus for data transfers. I don't know the hardware specifications of
> the Netra X1, though.

Maybe it uses Ultra-160 SCSI like my Sun Blade 2500? This does constrain performance, but due to simultaneous writes (to each side of the mirror) rather than reads.

If it is using parallel SCSI, perhaps there is a problem with the SCSI bus termination, or a bad cable?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Nov 13, 2009 at 9:53 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Fri, 13 Nov 2009, inouk wrote:
>>
>> Sounds like a bus bottleneck, as if the two drives can't use the same
>> bus for data transfers. I don't know the hardware specifications of
>> the Netra X1, though.
>
> Maybe it uses Ultra-160 SCSI like my Sun Blade 2500? This does constrain
> performance, but due to simultaneous writes (to each side of the mirror)
> rather than reads.
>
> If it is using parallel SCSI, perhaps there is a problem with the SCSI
> bus termination, or a bad cable?
>
> Bob

SCSI? Try PATA ;)

--Tim
On Fri, 13 Nov 2009, Tim Cook wrote:
>
> If it is using parallel SCSI, perhaps there is a problem with the SCSI
> bus termination, or a bad cable?
>
> SCSI? Try PATA ;)

Is that good? I don't recall ever selecting that option when purchasing a computer. It seemed safer to stick with SCSI than to try exotic technologies.

Does PATA daisy-chain disks onto the same cable and controller?

If this PATA bus and its drives are becoming overwhelmed, maybe it will help to tune zfs:zfs_vdev_max_pending down to a very small value in the kernel.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
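For reference, here is how that tunable can be set, either on the fly with mdb (takes effect immediately) or persistently in /etc/system (needs a reboot). The value 4 is purely illustrative:

=======================================================
# echo zfs_vdev_max_pending/W0t4 | mdb -kw

* /etc/system entry (illustrative value only):
set zfs:zfs_vdev_max_pending = 4
=======================================================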
Bob Friesenhahn wrote:
> On Fri, 13 Nov 2009, Tim Cook wrote:
>>
>>> If it is using parallel SCSI, perhaps there is a problem with the
>>> SCSI bus termination, or a bad cable?
>>
>> SCSI? Try PATA ;)
>
> Is that good? I don't recall ever selecting that option when
> purchasing a computer. It seemed safer to stick with SCSI than to try
> exotic technologies.

I hope you're being facetious. :-)

http://en.wikipedia.org/wiki/Parallel_ATA

The Netra X1 has two IDE channels, so it should be able to handle 2 disks without contention so long as only one disk is on each channel. OTOH, that machine is basically a desktop machine in a rack-mount case (similar to a Blade 100) and is also vintage 2001. I wouldn't expect much performance out of it regardless.

-Brian
The Netra X1 has one ATA bus for both internal drives.
No way to get high perf out of a snail.
 -- richard

On Nov 13, 2009, at 8:08 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Fri, 13 Nov 2009, Tim Cook wrote:
>> If it is using parallel SCSI, perhaps there is a problem with the
>> SCSI bus termination, or a bad cable?
>> SCSI? Try PATA ;)
>
> Is that good? I don't recall ever selecting that option when
> purchasing a computer. It seemed safer to stick with SCSI than to try
> exotic technologies.
>
> Does PATA daisy-chain disks onto the same cable and controller?
>
> If this PATA bus and its drives are becoming overwhelmed, maybe it will
> help to tune zfs:zfs_vdev_max_pending down to a very small value in the
> kernel.
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
There is also a long-standing bug in the ALi chipset used on these servers which ZFS tickles. I don't think a work-around for this bug was ever implemented, and it's still present in Solaris 10.

On Nov 13, 2009, at 11:29 AM, Richard Elling wrote:

> The Netra X1 has one ATA bus for both internal drives.
> No way to get high perf out of a snail.
>  -- richard
>
> On Nov 13, 2009, at 8:08 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
>
>> On Fri, 13 Nov 2009, Tim Cook wrote:
>>> If it is using parallel SCSI, perhaps there is a problem with the
>>> SCSI bus termination, or a bad cable?
>>> SCSI? Try PATA ;)
>>
>> Is that good? I don't recall ever selecting that option when
>> purchasing a computer. It seemed safer to stick with SCSI than to try
>> exotic technologies.
>>
>> Does PATA daisy-chain disks onto the same cable and controller?
>>
>> If this PATA bus and its drives are becoming overwhelmed, maybe it will
>> help to tune zfs:zfs_vdev_max_pending down to a very small value in the
>> kernel.
>>
>> Bob
>> --
>> Bob Friesenhahn
>> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
>> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/