Hi all, we are using the following setup as file server: --- # uname -a SunOS troubadix 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire-280R # prtconf -D System Configuration: Sun Microsystems sun4u Memory size: 2048 Megabytes System Peripherals (Software Nodes): SUNW,Sun-Fire-280R (driver name: rootnex) scsi_vhci, instance #0 (driver name: scsi_vhci) packages SUNW,builtin-drivers deblocker disk-label terminal-emulator obp-tftp SUNW,debug dropins kbd-translator ufs-file-system chosen openprom client-services options, instance #0 (driver name: options) aliases memory virtual-memory SUNW,UltraSPARC-III+ memory-controller, instance #0 (driver name: mc-us3) SUNW,UltraSPARC-III+ memory-controller, instance #1 (driver name: mc-us3) pci, instance #0 (driver name: pcisch) ebus, instance #0 (driver name: ebus) flashprom bbc power, instance #0 (driver name: power) i2c, instance #0 (driver name: pcf8584) dimm-fru, instance #0 (driver name: seeprom) dimm-fru, instance #1 (driver name: seeprom) dimm-fru, instance #2 (driver name: seeprom) dimm-fru, instance #3 (driver name: seeprom) nvram, instance #4 (driver name: seeprom) idprom i2c, instance #1 (driver name: pcf8584) cpu-fru, instance #5 (driver name: seeprom) temperature, instance #0 (driver name: max1617) cpu-fru, instance #6 (driver name: seeprom) temperature, instance #1 (driver name: max1617) fan-control, instance #0 (driver name: tda8444) motherboard-fru, instance #7 (driver name: seeprom) ioexp, instance #0 (driver name: pcf8574) ioexp, instance #1 (driver name: pcf8574) ioexp, instance #2 (driver name: pcf8574) fcal-backplane, instance #8 (driver name: seeprom) remote-system-console, instance #9 (driver name: seeprom) power-distribution-board, instance #10 (driver name: seeprom) power-supply, instance #11 (driver name: seeprom) power-supply, instance #12 (driver name: seeprom) rscrtc beep, instance #0 (driver name: bbc_beep) rtc, instance #0 (driver name: todds1287) gpio, instance #0 (driver name: gpio_87317) pmc, instance #0 (driver name: pmc) parallel, instance #0 (driver name: ecpp) rsc-control, instance #0 (driver name: su) rsc-console, instance #1 (driver name: su) serial, instance #0 (driver name: se) network, instance #0 (driver name: eri) usb, instance #0 (driver name: ohci) scsi, instance #0 (driver name: glm) disk (driver name: sd) tape (driver name: st) sd, instance #12 (driver name: sd) ... ses, instance #29 (driver name: ses) ses, instance #30 (driver name: ses) scsi, instance #1 (driver name: glm) disk (driver name: sd) tape (driver name: st) sd, instance #31 (driver name: sd) sd, instance #32 (driver name: sd) ... ses, instance #46 (driver name: ses) ses, instance #47 (driver name: ses) network, instance #0 (driver name: ce) pci, instance #1 (driver name: pcisch) SUNW,qlc, instance #0 (driver name: qlc) fp (driver name: fp) disk (driver name: ssd) fp, instance #1 (driver name: fp) ssd, instance #1 (driver name: ssd) ssd, instance #0 (driver name: ssd) scsi, instance #0 (driver name: mpt) disk (driver name: sd) tape (driver name: st) sd, instance #0 (driver name: sd) sd, instance #1 (driver name: sd) ... ses, instance #14 (driver name: ses) ses, instance #31 (driver name: ses) os-io iscsi, instance #0 (driver name: iscsi) pseudo, instance #0 (driver name: pseudo) --- The disks reside in a StoreEdge3320 expansion unit connected to the machine''s SCSI controller card (LSI1030 U320). 
We''ve created a raidz2 pool: --- # zpool status pool: storage_array state: ONLINE scrub: scrub completed with 0 errors on Wed Dec 12 23:38:36 2007 config: NAME STATE READ WRITE CKSUM storage_array ONLINE 0 0 0 raidz2 ONLINE 0 0 0 c2t8d0 ONLINE 0 0 0 c2t9d0 ONLINE 0 0 0 c2t10d0 ONLINE 0 0 0 c2t11d0 ONLINE 0 0 0 c2t12d0 ONLINE 0 0 0 errors: No known data errors --- The throughput when writing from a local disk to the zpool is around 30MB/s, when writing from a client machine, the throughput drops to ~9MB/s (NFS mounts over dedicated gigabit switch). When copying data to the pool throughput drops every few seconds to almost zero regardless of the source (NFS or local). # zpool iostat 1 capacity operations bandwidth pool used avail read write read write ------------- ----- ----- ----- ----- ----- ----- ... storage_array 138G 202G 0 0 0 0 storage_array 138G 202G 0 11 0 123K storage_array 138G 202G 2 30 3.96K 3.01M storage_array 138G 202G 0 96 0 4.14M storage_array 138G 202G 0 136 0 4.36M storage_array 138G 202G 0 73 0 4.09M storage_array 138G 202G 2 77 254K 9.19M storage_array 138G 202G 0 64 127K 6.05M storage_array 138G 202G 0 75 0 8.70M storage_array 138G 202G 0 101 0 3.98M storage_array 138G 202G 5 154 2.97K 6.19M storage_array 138G 202G 0 74 0 8.06M storage_array 138G 202G 0 121 0 2.77M storage_array 138G 202G 0 64 0 4.95M storage_array 138G 202G 0 63 0 7.73M storage_array 138G 202G 0 75 0 9.41M storage_array 138G 202G 1 128 235K 4.00M storage_array 138G 202G 0 97 0 4.16M storage_array 138G 202G 0 72 0 9.08M storage_array 138G 202G 0 70 0 8.68M storage_array 138G 202G 0 70 0 8.79M storage_array 138G 202G 2 102 13.4K 8.01M storage_array 138G 202G 0 178 0 599K storage_array 138G 202G 0 37 0 3.39M storage_array 138G 202G 0 79 0 9.92M storage_array 138G 202G 0 72 0 9.10M storage_array 138G 202G 0 79 0 9.93M storage_array 138G 202G 0 69 0 8.67M storage_array 138G 202G 0 76 0 9.53M storage_array 138G 202G 0 116 0 8.50M storage_array 138G 202G 0 112 0 2.76M storage_array 138G 202G 0 0 0 0 storage_array 138G 202G 0 55 0 6.95M storage_array 138G 202G 0 0 0 0 storage_array 138G 202G 0 12 0 1.61M storage_array 138G 202G 0 70 0 8.79M storage_array 138G 202G 0 88 0 11.0M storage_array 138G 202G 0 79 0 9.90M ... The performance is slightly disappointing. Does anyone have a similar setup and can anyone share some figures? Any pointers to possible improvements are greatly appreciated. Cheers, Frank
> The throughput when writing from a local disk to the
> zpool is around 30MB/s, when writing from a client

Err... sorry, the internal storage would be good old 1Gbit FC-AL disks @ 10K rpm. Still, not the fastest around ;)
Frank Penczek wrote:
> The performance is slightly disappointing. Does anyone have
> a similar setup and can anyone share some figures?
> Any pointers to possible improvements are greatly appreciated.

Use a faster processor or change to a mirrored configuration. raidz2 can become processor bound in the Reed-Solomon calculations for the 2nd parity set. You should be able to see this in mpstat, and at a coarser grain in vmstat.
 -- richard
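To check for this while a write load is running, the standard commands Richard mentions (plus an optional kernel profile) would look roughly like the following; the lockstat invocation is a common Solaris idiom rather than something taken from this thread:

# mpstat 1 10                      (per-CPU view: usr+sys pegged with idl near 0 would point at the CPU)
# vmstat 1 10                      (coarser view: watch the "id" column)
# lockstat -kIW -D 10 sleep 10     (top kernel functions by CPU time during the load)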
> Use a faster processor or change to a mirrored configuration.
> raidz2 can become processor bound in the Reed-Solomon calculations
> for the 2nd parity set. You should be able to see this in mpstat, and to
> a coarser grain in vmstat.

Hmm. Is the OP's hardware *that* slow? (I don't know enough about the Sun hardware models.)

I have a 5-disk raidz2 (cheap SATA) here on my workstation, which is an X2 3800+ (i.e., one of the earlier AMD dual-core offerings). Here's me dd:ing to a file on ZFS under FreeBSD running on that hardware:

promraid     741G   387G      0    380      0  47.2M
promraid     741G   387G      0    336      0  41.8M
promraid     741G   387G      0    424    510  51.0M
promraid     741G   387G      0    441      0  54.5M
promraid     741G   387G      0    514      0  19.2M
promraid     741G   387G     34    192  4.12M  24.1M
promraid     741G   387G      0    341      0  42.7M
promraid     741G   387G      0    361      0  45.2M
promraid     741G   387G      0    350      0  43.9M
promraid     741G   387G      0    370      0  46.3M
promraid     741G   387G      1    423   134K  51.7M
promraid     742G   386G     22    329  2.39M  10.3M
promraid     742G   386G     28    214  3.49M  26.8M
promraid     742G   386G      0    347      0  43.5M
promraid     742G   386G      0    349      0  43.7M
promraid     742G   386G      0    354      0  44.3M
promraid     742G   386G      0    365      0  45.7M
promraid     742G   386G      2    460  7.49K  55.5M

At this point the bottleneck looks architectural rather than CPU. None of the cores are saturated, and the CPU usage of the ZFS kernel threads is pretty low.

I say architectural because writes to the underlying devices are not sustained; they drop to almost zero for certain periods (this is more visible in iostat -x than it is in the zpool statistics). What I think is happening is that ZFS is too late to evict data in the cache, thus blocking the writing process. Once a transaction group with a bunch of data gets committed the application unblocks, but presumably ZFS waits for a little while before resuming writes.

Note that this is also being run on plain hardware; it's not even PCI Express. During throughput peaks, but not constantly, the bottleneck is probably the PCI bus.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Hello Peter, Saturday, December 15, 2007, 7:45:50 AM, you wrote:>> Use a faster processor or change to a mirrored configuration. >> raidz2 can become processor bound in the Reed-Soloman calculations >> for the 2nd parity set. You should be able to see this in mpstat, and to >> a coarser grain in vmstat.PS> Hmm. Is the OP''s hardware *that* slow? (I don''t know enough about the Sun PS> hardware models) PS> I have a 5-disk raidz2 (cheap SATA) here on my workstation, which is an X2 PS> 3800+ (i.e., one of the earlier AMD dual-core offerings). Here''s me dd:ing to PS> a file on FreeBSD on ZFS running on that hardware: PS> promraid 741G 387G 0 380 0 47.2M PS> promraid 741G 387G 0 336 0 41.8M PS> promraid 741G 387G 0 424 510 51.0M PS> promraid 741G 387G 0 441 0 54.5M PS> promraid 741G 387G 0 514 0 19.2M PS> promraid 741G 387G 34 192 4.12M 24.1M PS> promraid 741G 387G 0 341 0 42.7M PS> promraid 741G 387G 0 361 0 45.2M PS> promraid 741G 387G 0 350 0 43.9M PS> promraid 741G 387G 0 370 0 46.3M PS> promraid 741G 387G 1 423 134K 51.7M PS> promraid 742G 386G 22 329 2.39M 10.3M PS> promraid 742G 386G 28 214 3.49M 26.8M PS> promraid 742G 386G 0 347 0 43.5M PS> promraid 742G 386G 0 349 0 43.7M PS> promraid 742G 386G 0 354 0 44.3M PS> promraid 742G 386G 0 365 0 45.7M PS> promraid 742G 386G 2 460 7.49K 55.5M PS> At this point the bottleneck looks architectural rather than CPU. None of the PS> cores are saturated, and the CPU usage of the ZFS kernel threads is pretty PS> low. PS> I say architectural because writes to the underlying devices are not PS> sustained; it drops to almost zero for certain periods (this is more visible PS> in iostat -x than it is in the zpool statistics). What I think is happening PS> is that ZFS is too late to evict data in the cache, thus blocking the writing PS> process. Once a transaction group with a bunch of data gets committed the PS> application unblocks, but presumably ZFS waits for a little while before PS> resuming writes. PS> Note that this is also being run on plain hardware; it''s not even PCI Express. PS> During throughput peaks, but not constantly, the bottleneck is probably the PS> PCI bus. Sequential writing problem with process throttling - there''s an open bug for it for quite a while. Try to lower txg_time to 1s - should help a little bit. Can you also post iostat -xnz 1 while you''re doing dd? and zpool status -- Best regards, Robert mailto:rmilkowski at task.gda.pl http://milek.blogspot.com
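For reference, the txg_time change Robert suggests can be made on a live Solaris 10 system with mdb, or persistently via /etc/system; the tunable name below is the one commonly used around that release and should be treated as version-dependent (a sketch, not something posted in the thread):

# echo "txg_time/W 0t1" | mdb -kw        (set the transaction group timer to 1 second on the running kernel)

or, persistently, in /etc/system:

set zfs:txg_time = 1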
Hi,

On Dec 14, 2007 7:50 PM, Louwtjie Burger <burgerw at zaber.org.za> wrote:
[...]
> I would have said ... to be expected, since the 280 came with a
> 100Mbit interface. So a 9-12 MB/s peak would be acceptable. You did
> mention a "gigabit switch"... did you install a gigabit HBA? If
> that's the case then yes, performance sucks.

Yes, sorry, I forgot to mention that we're using a GigaSwift NIC in that machine.

> fsstat?

What 'fsstat' output are you interested in? For a start, here's 'fsstat -F':

# fsstat -F
  new  name   name  attr  attr lookup rddir  read  read write write
 file remov   chng   get   set    ops   ops   ops bytes   ops bytes
 660K 69.8K  16.0K 11.3M  170K  60.2M  154K 25.6M 14.0G 6.05M 58.9G ufs
    0     0      0 43.6K     0  78.8K 14.0K 27.8K 10.0M     0     0 proc
    0     0      0    21     0      0     0     0     0     0     0 nfs
30.7K 7.67K  11.2K  348M 56.9K   122M 4.61M 1.65M 25.1G 1.05M 58.2G zfs
    0     0      0  574K     0      0     0     0     0     0     0 lofs
 162K 17.3K   120K  273K 15.9K  1.48M 4.60K  418K 1.24G 1.10M 5.93G tmpfs
    0     0      0 6.18K     0      0     0    51 9.31K     0     0 mntfs
    0     0      0     0     0      0     0     0     0     0     0 nfs3
    0     0      0     0     0      0     0     0     0     0     0 nfs4
    0     0      0    43     0      0     0     0     0     0     0 autofs

Thanks,
Frank
Hi, sorry for the lengthy post ... On Dec 15, 2007 1:56 PM, Robert Milkowski <rmilkowski at task.gda.pl> wrote: [...]> Sequential writing problem with process throttling - there''s an open > bug for it for quite a while. Try to lower txg_time to 1s - should > help a little bit.Since setting txg_time to 1 the periodic drop in bandwidth seems to have gone. That''s great. Unfortunately the performance is still not amazing - 10MB/s over the network and not more...> Can you also post iostat -xnz 1 while you''re doing dd? > and zpool status--- # zpool status pool: storage_array state: ONLINE scrub: scrub completed with 0 errors on Wed Dec 12 23:38:36 2007 config: NAME STATE READ WRITE CKSUM storage_array ONLINE 0 0 0 raidz2 ONLINE 0 0 0 c2t8d0 ONLINE 0 0 0 c2t9d0 ONLINE 0 0 0 c2t10d0 ONLINE 0 0 0 c2t11d0 ONLINE 0 0 0 c2t12d0 ONLINE 0 0 0 errors: No known data errors --- dd''ing to NFS mount: fpz at obelix://tmp> dd if=./file.tmp of=/home/fpz/file.tmp 200000+0 records in 200000+0 records out 102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s # iostat -xnz 1 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 2.8 17.3 149.4 127.6 0.0 1.3 0.0 66.0 0 12 c2t8d0 2.8 17.3 149.4 127.6 0.0 1.3 0.0 65.9 0 13 c2t9d0 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.1 0 13 c2t10d0 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.4 0 13 c2t11d0 2.8 17.3 149.5 127.6 0.0 1.3 0.0 66.5 0 13 c2t12d0 0.3 1.0 5.4 133.9 0.0 0.0 0.1 27.2 0 1 c1t1d0 0.5 0.3 26.8 16.5 0.0 0.0 0.1 11.1 0 0 c1t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.9 0 1 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.5 0 0 c2t8d0 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.5 0 1 c2t9d0 0.0 10.0 0.0 6.0 0.0 0.0 0.0 0.7 0 1 c2t10d0 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.3 0 0 c2t11d0 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.3 0 0 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 67.6 0.0 1298.6 0.0 9.8 0.2 145.2 1 71 c2t8d0 0.0 64.8 0.0 1139.4 0.0 9.2 0.0 141.8 0 69 c2t9d0 0.0 59.2 0.0 898.9 0.0 8.6 0.0 144.9 0 68 c2t10d0 0.0 67.6 0.0 1379.4 0.0 9.5 0.0 140.0 0 68 c2t11d0 0.0 70.4 0.0 1257.3 0.0 11.4 0.0 162.1 0 73 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 43.8 0.0 3068.5 0.0 34.9 0.0 796.0 0 100 c2t8d0 0.0 55.6 0.0 3891.9 0.0 34.7 0.0 624.9 0 100 c2t9d0 0.0 58.8 0.0 4211.9 0.0 33.4 0.0 568.2 0 100 c2t10d0 0.0 49.2 0.0 3388.6 0.0 34.5 0.0 702.3 0 100 c2t11d0 0.0 57.7 0.0 3805.3 0.0 34.3 0.0 594.0 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 60.0 0.0 4279.6 0.0 35.0 0.0 583.2 0 100 c2t8d0 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t9d0 0.0 41.0 0.0 2910.3 0.0 35.0 0.0 853.6 0 100 c2t10d0 0.0 50.0 0.0 3552.2 0.0 35.0 0.0 699.9 0 100 c2t11d0 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t8d0 0.0 60.0 0.0 4280.8 0.0 35.0 0.0 583.1 0 100 c2t9d0 0.0 55.0 0.0 3938.2 0.0 35.0 0.0 636.1 0 100 c2t10d0 0.0 56.0 0.0 4024.3 0.0 35.0 0.0 624.7 0 100 c2t11d0 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 52.0 0.0 3723.5 0.0 35.0 0.0 672.9 0 100 c2t8d0 0.0 43.0 0.0 3081.5 0.0 35.0 0.0 813.8 0 100 c2t9d0 0.0 46.0 0.0 3296.0 0.0 35.0 0.0 760.7 0 100 c2t10d0 0.0 48.0 0.0 
3424.0 0.0 35.0 0.0 729.0 0 100 c2t11d0 0.0 62.0 0.0 4408.1 0.0 35.0 0.0 564.4 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 60.0 0.0 4279.8 0.0 35.0 0.0 583.2 0 100 c2t8d0 0.0 57.0 0.0 4065.8 0.0 35.0 0.0 613.9 0 100 c2t9d0 0.0 59.0 0.0 4194.3 0.0 35.0 0.0 593.1 0 100 c2t10d0 0.0 56.0 0.0 4023.3 0.0 35.0 0.0 624.9 0 100 c2t11d0 0.0 48.0 0.0 3424.3 0.0 35.0 0.0 729.1 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 65.7 0.0 1385.0 0.0 14.5 0.0 220.8 0 90 c2t8d0 0.9 68.4 39.8 1623.6 0.0 13.0 0.0 187.8 0 87 c2t9d0 0.9 74.9 39.3 2054.6 0.0 16.7 0.0 219.6 0 94 c2t10d0 0.9 70.3 39.3 1662.9 0.0 15.4 0.0 216.1 0 95 c2t11d0 0.0 68.4 0.0 1736.0 0.0 14.9 0.0 217.9 0 87 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 65.3 0.0 3287.1 0.0 29.2 0.0 447.8 0 99 c2t8d0 0.0 55.5 0.0 2642.4 0.0 28.2 0.0 508.9 0 99 c2t9d0 0.0 47.9 0.0 2130.0 0.0 26.7 0.0 558.2 0 100 c2t10d0 0.0 66.4 0.0 3336.1 0.0 29.3 0.0 441.2 0 100 c2t11d0 0.0 65.3 0.0 3103.3 0.0 29.7 0.0 454.7 0 99 c2t12d0 0.0 1.1 0.0 2.2 0.0 0.0 0.0 10.0 0 1 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 44.0 0.0 3125.2 0.0 35.0 0.0 795.1 0 100 c2t8d0 0.0 50.0 0.0 3553.8 0.0 35.0 0.0 699.7 0 100 c2t9d0 0.0 55.0 0.0 3895.8 0.0 35.0 0.0 636.1 0 100 c2t10d0 0.0 44.0 0.0 3081.7 0.0 35.0 0.0 795.1 0 100 c2t11d0 0.0 48.0 0.0 3424.7 0.0 35.0 0.0 728.8 0 100 c2t12d0 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.7 0 1 c1t1d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 52.0 0.0 3724.6 0.0 35.0 0.0 672.8 0 100 c2t8d0 0.0 46.0 0.0 3253.1 0.0 35.0 0.0 760.6 0 100 c2t9d0 0.0 38.0 0.0 2697.0 0.0 35.0 0.0 920.7 0 100 c2t10d0 0.0 51.0 0.0 3638.6 0.0 35.0 0.0 686.0 0 100 c2t11d0 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 44.0 0.0 2915.0 0.0 22.9 0.0 521.2 0 100 c2t8d0 0.0 47.0 0.0 3382.1 0.0 24.1 0.0 512.0 0 100 c2t9d0 0.0 56.0 0.0 4024.2 0.0 25.7 0.0 459.3 0 100 c2t10d0 0.0 41.0 0.0 2954.1 0.0 22.7 0.0 552.4 0 100 c2t11d0 0.0 46.0 0.0 3083.6 0.0 22.8 0.0 494.7 0 98 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 103.0 0.0 207.5 0.0 5.4 0.0 52.4 0 89 c2t8d0 0.0 101.0 0.0 381.4 0.0 4.8 0.0 47.2 0 89 c2t9d0 0.0 102.0 0.0 432.9 0.0 4.0 0.0 39.5 0 79 c2t10d0 0.0 112.0 0.0 257.5 0.0 5.9 0.0 52.4 0 95 c2t11d0 0.0 111.0 0.0 206.5 0.0 6.1 0.0 54.8 0 92 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 102.0 0.0 213.0 0.0 4.7 0.0 46.3 0 78 c2t8d0 0.0 106.0 0.0 214.5 0.0 5.0 0.0 47.6 0 82 c2t9d0 0.0 95.0 0.0 214.5 0.0 4.3 0.0 45.5 0 71 c2t10d0 0.0 97.0 0.0 214.0 0.0 4.7 0.0 48.9 0 80 c2t11d0 0.0 99.0 0.0 216.5 0.0 5.2 0.0 52.7 0 90 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 67.0 0.0 104.0 0.0 3.2 0.0 47.3 0 55 c2t8d0 0.0 68.0 0.0 106.5 0.0 3.4 0.0 49.6 0 58 c2t9d0 0.0 66.0 0.0 101.5 0.0 3.2 0.0 48.6 0 60 c2t10d0 0.0 64.0 0.0 103.0 0.0 3.1 0.0 48.0 0 57 c2t11d0 0.0 69.0 0.0 103.5 0.0 3.1 0.0 45.4 0 62 c2t12d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1.0 0.0 1.5 0.0 0.0 0.0 10.2 0 1 c1t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 1.0 0.0 0.5 0.0 0.0 0.0 10.3 0 1 
c1t1d0 Thanks for your time! Cheers, Frank
Hi, On Dec 14, 2007 8:24 PM, Richard Elling <Richard.Elling at sun.com> wrote:> Frank Penczek wrote: > > > > The performance is slightly disappointing. Does anyone have > > a similar setup and can anyone share some figures? > > Any pointers to possible improvements are greatly appreciated. > > > > > > Use a faster processor or change to a mirrored configuration. > raidz2 can become processor bound in the Reed-Soloman calculations > for the 2nd parity set. You should be able to see this in mpstat, and to > a coarser grain in vmstat. > -- richardThanks for the hint. When dd''ing to the pool, mpstat tells me: # mpstat 1 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 46 0 67 518 413 509 4 46 26 0 272 1 3 0 96 1 44 0 64 1765 141 576 5 46 26 0 256 1 3 0 96 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 10 804 698 1130 4 69 52 0 218 0 5 0 95 1 7 0 10 3301 390 1189 5 79 64 0 106 0 4 0 96 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 4 294 188 601 3 56 59 0 85 0 2 0 98 1 0 0 4 1029 319 593 1 57 61 0 78 0 2 0 98 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 2 303 198 238 2 33 10 0 145 0 0 0 100 1 0 0 4 283 74 261 3 30 8 0 159 0 1 0 99 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 2 0 113 1055 885 813 137 120 20 9 32833 7 57 0 36 1 90 2 74 3622 220 1328 74 118 49 18 19956 5 45 0 50 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 14 0 115 438 191 876 181 127 40 0 32425 7 54 0 39 1 14 0 197 1513 453 671 118 132 54 0 23929 5 53 0 42 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 98 901 679 1726 163 189 43 0 26718 5 48 0 47 1 0 0 121 3722 508 843 171 194 32 0 29947 6 53 0 41 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 50 418 185 905 162 109 21 0 31884 7 54 0 39 1 0 0 135 1772 550 670 102 107 37 0 23882 5 55 0 40 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 58 353 184 330 106 105 17 0 30134 28 47 0 26 1 1 0 74 862 250 604 128 106 27 0 26312 6 45 0 50 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 60 1087 851 1542 231 174 31 1 33800 7 61 0 32 1 0 0 136 4191 444 1072 125 165 39 1 20273 4 52 0 44 ... Based on the ''idl'' column I interpret these numbers as "there are resources left" or is it me being naive? Cheers, Frank
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
>     0.0   60.0    0.0 4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
>     0.0   55.0    0.0 3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
>     0.0   56.0    0.0 4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
>     0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0

That service time is just terrible!
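For scale: with actv pinned at 35 outstanding commands per disk and roughly 50 writes/s completing, Little's law predicts an average latency of about 35 / 50 = 0.7 s, which is essentially the asvc_t reported above. The latency is therefore largely a consequence of the queue depth being kept on each device rather than, by itself, proof of a sick disk.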
hi Frank, there is an interesting pattern here (at least, to my untrained eyes) - your %b starts off quite low: Frank Penczek wrote: ....> --- > dd''ing to NFS mount: > fpz at obelix://tmp> dd if=./file.tmp of=/home/fpz/file.tmp > 200000+0 records in > 200000+0 records out > 102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s > > # iostat -xnz 1 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 2.8 17.3 149.4 127.6 0.0 1.3 0.0 66.0 0 12 c2t8d0 > 2.8 17.3 149.4 127.6 0.0 1.3 0.0 65.9 0 13 c2t9d0 > 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.1 0 13 c2t10d0 > 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.4 0 13 c2t11d0 > 2.8 17.3 149.5 127.6 0.0 1.3 0.0 66.5 0 13 c2t12d0 > 0.3 1.0 5.4 133.9 0.0 0.0 0.1 27.2 0 1 c1t1d0 > 0.5 0.3 26.8 16.5 0.0 0.0 0.1 11.1 0 0 c1t0d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.9 0 1 c1t1d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.5 0 0 c2t8d0 > 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.5 0 1 c2t9d0 > 0.0 10.0 0.0 6.0 0.0 0.0 0.0 0.7 0 1 c2t10d0 > 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.3 0 0 c2t11d0 > 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.3 0 0 c2t12d0then it jumps - roughly, quadrupling> extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 67.6 0.0 1298.6 0.0 9.8 0.2 145.2 1 71 c2t8d0 > 0.0 64.8 0.0 1139.4 0.0 9.2 0.0 141.8 0 69 c2t9d0 > 0.0 59.2 0.0 898.9 0.0 8.6 0.0 144.9 0 68 c2t10d0 > 0.0 67.6 0.0 1379.4 0.0 9.5 0.0 140.0 0 68 c2t11d0 > 0.0 70.4 0.0 1257.3 0.0 11.4 0.0 162.1 0 73 c2t12d0then it maxes out and stays that way> extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 43.8 0.0 3068.5 0.0 34.9 0.0 796.0 0 100 c2t8d0 > 0.0 55.6 0.0 3891.9 0.0 34.7 0.0 624.9 0 100 c2t9d0 > 0.0 58.8 0.0 4211.9 0.0 33.4 0.0 568.2 0 100 c2t10d0 > 0.0 49.2 0.0 3388.6 0.0 34.5 0.0 702.3 0 100 c2t11d0 > 0.0 57.7 0.0 3805.3 0.0 34.3 0.0 594.0 0 100 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 60.0 0.0 4279.6 0.0 35.0 0.0 583.2 0 100 c2t8d0 > 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t9d0 > 0.0 41.0 0.0 2910.3 0.0 35.0 0.0 853.6 0 100 c2t10d0 > 0.0 50.0 0.0 3552.2 0.0 35.0 0.0 699.9 0 100 c2t11d0 > 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t8d0 > 0.0 60.0 0.0 4280.8 0.0 35.0 0.0 583.1 0 100 c2t9d0 > 0.0 55.0 0.0 3938.2 0.0 35.0 0.0 636.1 0 100 c2t10d0 > 0.0 56.0 0.0 4024.3 0.0 35.0 0.0 624.7 0 100 c2t11d0 > 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 52.0 0.0 3723.5 0.0 35.0 0.0 672.9 0 100 c2t8d0 > 0.0 43.0 0.0 3081.5 0.0 35.0 0.0 813.8 0 100 c2t9d0 > 0.0 46.0 0.0 3296.0 0.0 35.0 0.0 760.7 0 100 c2t10d0 > 0.0 48.0 0.0 3424.0 0.0 35.0 0.0 729.0 0 100 c2t11d0 > 0.0 62.0 0.0 4408.1 0.0 35.0 0.0 564.4 0 100 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 60.0 0.0 4279.8 0.0 35.0 0.0 583.2 0 100 c2t8d0 > 0.0 57.0 0.0 4065.8 0.0 35.0 0.0 613.9 0 100 c2t9d0 > 0.0 59.0 0.0 4194.3 0.0 35.0 0.0 593.1 0 100 c2t10d0 > 0.0 56.0 0.0 4023.3 0.0 35.0 0.0 624.9 0 100 c2t11d0 > 0.0 48.0 0.0 3424.3 0.0 35.0 0.0 729.1 0 100 c2t12d0drops back a fraction> extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t 
asvc_t %w %b device > 0.0 65.7 0.0 1385.0 0.0 14.5 0.0 220.8 0 90 c2t8d0 > 0.9 68.4 39.8 1623.6 0.0 13.0 0.0 187.8 0 87 c2t9d0 > 0.9 74.9 39.3 2054.6 0.0 16.7 0.0 219.6 0 94 c2t10d0 > 0.9 70.3 39.3 1662.9 0.0 15.4 0.0 216.1 0 95 c2t11d0 > 0.0 68.4 0.0 1736.0 0.0 14.9 0.0 217.9 0 87 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 65.3 0.0 3287.1 0.0 29.2 0.0 447.8 0 99 c2t8d0 > 0.0 55.5 0.0 2642.4 0.0 28.2 0.0 508.9 0 99 c2t9d0 > 0.0 47.9 0.0 2130.0 0.0 26.7 0.0 558.2 0 100 c2t10d0 > 0.0 66.4 0.0 3336.1 0.0 29.3 0.0 441.2 0 100 c2t11d0 > 0.0 65.3 0.0 3103.3 0.0 29.7 0.0 454.7 0 99 c2t12d0 > 0.0 1.1 0.0 2.2 0.0 0.0 0.0 10.0 0 1 c1t1d0but quickly reverts to 100%:> extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 44.0 0.0 3125.2 0.0 35.0 0.0 795.1 0 100 c2t8d0 > 0.0 50.0 0.0 3553.8 0.0 35.0 0.0 699.7 0 100 c2t9d0 > 0.0 55.0 0.0 3895.8 0.0 35.0 0.0 636.1 0 100 c2t10d0 > 0.0 44.0 0.0 3081.7 0.0 35.0 0.0 795.1 0 100 c2t11d0 > 0.0 48.0 0.0 3424.7 0.0 35.0 0.0 728.8 0 100 c2t12d0 > 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.7 0 1 c1t1d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 52.0 0.0 3724.6 0.0 35.0 0.0 672.8 0 100 c2t8d0 > 0.0 46.0 0.0 3253.1 0.0 35.0 0.0 760.6 0 100 c2t9d0 > 0.0 38.0 0.0 2697.0 0.0 35.0 0.0 920.7 0 100 c2t10d0 > 0.0 51.0 0.0 3638.6 0.0 35.0 0.0 686.0 0 100 c2t11d0 > 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 44.0 0.0 2915.0 0.0 22.9 0.0 521.2 0 100 c2t8d0 > 0.0 47.0 0.0 3382.1 0.0 24.1 0.0 512.0 0 100 c2t9d0 > 0.0 56.0 0.0 4024.2 0.0 25.7 0.0 459.3 0 100 c2t10d0 > 0.0 41.0 0.0 2954.1 0.0 22.7 0.0 552.4 0 100 c2t11d0 > 0.0 46.0 0.0 3083.6 0.0 22.8 0.0 494.7 0 98 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 103.0 0.0 207.5 0.0 5.4 0.0 52.4 0 89 c2t8d0 > 0.0 101.0 0.0 381.4 0.0 4.8 0.0 47.2 0 89 c2t9d0 > 0.0 102.0 0.0 432.9 0.0 4.0 0.0 39.5 0 79 c2t10d0 > 0.0 112.0 0.0 257.5 0.0 5.9 0.0 52.4 0 95 c2t11d0 > 0.0 111.0 0.0 206.5 0.0 6.1 0.0 54.8 0 92 c2t12d0and then tails off> extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 102.0 0.0 213.0 0.0 4.7 0.0 46.3 0 78 c2t8d0 > 0.0 106.0 0.0 214.5 0.0 5.0 0.0 47.6 0 82 c2t9d0 > 0.0 95.0 0.0 214.5 0.0 4.3 0.0 45.5 0 71 c2t10d0 > 0.0 97.0 0.0 214.0 0.0 4.7 0.0 48.9 0 80 c2t11d0 > 0.0 99.0 0.0 216.5 0.0 5.2 0.0 52.7 0 90 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 67.0 0.0 104.0 0.0 3.2 0.0 47.3 0 55 c2t8d0 > 0.0 68.0 0.0 106.5 0.0 3.4 0.0 49.6 0 58 c2t9d0 > 0.0 66.0 0.0 101.5 0.0 3.2 0.0 48.6 0 60 c2t10d0 > 0.0 64.0 0.0 103.0 0.0 3.1 0.0 48.0 0 57 c2t11d0 > 0.0 69.0 0.0 103.5 0.0 3.1 0.0 45.4 0 62 c2t12d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 1.0 0.0 1.5 0.0 0.0 0.0 10.2 0 1 c1t0d0 > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 1.0 0.0 0.5 0.0 0.0 0.0 10.3 0 1 c1t1d0All of which, to me, look like you''re filling a buffer or two. I don''t recall the config of your zpool, but if the devices are disks that are direct or san-attached, I would be wondering about their outstanding queue depths. 
I think it's time to break out some D to find out where in the stack the bottleneck(s) really are.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
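As a starting point for that D, a sketch using the stock io provider could time each write at the device level and break the latency out per disk (an illustrative one-liner under those assumptions, not something posted in the thread):

# dtrace -n '
  io:::start /!(args[0]->b_flags & B_READ)/ { start[arg0] = timestamp; }
  io:::done  /start[arg0]/ {
      @ms[args[1]->dev_statname] = quantize((timestamp - start[arg0]) / 1000000);
      start[arg0] = 0;
  }'

Leaving it running during a dd and pressing Ctrl-C prints a per-device histogram of write latency in milliseconds, which should make it obvious whether one disk or the whole set is slow.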
Hello James, Sunday, December 16, 2007, 9:54:18 PM, you wrote: JCM> hi Frank, JCM> there is an interesting pattern here (at least, to my JCM> untrained eyes) - your %b starts off quite low: JCM> Frank Penczek wrote: JCM> ....>> --- >> dd''ing to NFS mount: >> fpz at obelix://tmp> dd if=./file.tmp of=/home/fpz/file.tmp >> 200000+0 records in >> 200000+0 records out >> 102400000 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s >> >> # iostat -xnz 1 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 2.8 17.3 149.4 127.6 0.0 1.3 0.0 66.0 0 12 c2t8d0 >> 2.8 17.3 149.4 127.6 0.0 1.3 0.0 65.9 0 13 c2t9d0 >> 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.1 0 13 c2t10d0 >> 2.8 17.3 149.3 127.6 0.0 1.3 0.0 66.4 0 13 c2t11d0 >> 2.8 17.3 149.5 127.6 0.0 1.3 0.0 66.5 0 13 c2t12d0 >> 0.3 1.0 5.4 133.9 0.0 0.0 0.1 27.2 0 1 c1t1d0 >> 0.5 0.3 26.8 16.5 0.0 0.0 0.1 11.1 0 0 c1t0d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.9 0 1 c1t1d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.5 0 0 c2t8d0 >> 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.5 0 1 c2t9d0 >> 0.0 10.0 0.0 6.0 0.0 0.0 0.0 0.7 0 1 c2t10d0 >> 0.0 10.0 0.0 7.0 0.0 0.0 0.0 0.3 0 0 c2t11d0 >> 0.0 10.0 0.0 7.5 0.0 0.0 0.0 0.3 0 0 c2t12d0JCM> then it jumps - roughly, quadrupling>> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 67.6 0.0 1298.6 0.0 9.8 0.2 145.2 1 71 c2t8d0 >> 0.0 64.8 0.0 1139.4 0.0 9.2 0.0 141.8 0 69 c2t9d0 >> 0.0 59.2 0.0 898.9 0.0 8.6 0.0 144.9 0 68 c2t10d0 >> 0.0 67.6 0.0 1379.4 0.0 9.5 0.0 140.0 0 68 c2t11d0 >> 0.0 70.4 0.0 1257.3 0.0 11.4 0.0 162.1 0 73 c2t12d0JCM> then it maxes out and stays that way>> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 43.8 0.0 3068.5 0.0 34.9 0.0 796.0 0 100 c2t8d0 >> 0.0 55.6 0.0 3891.9 0.0 34.7 0.0 624.9 0 100 c2t9d0 >> 0.0 58.8 0.0 4211.9 0.0 33.4 0.0 568.2 0 100 c2t10d0 >> 0.0 49.2 0.0 3388.6 0.0 34.5 0.0 702.3 0 100 c2t11d0 >> 0.0 57.7 0.0 3805.3 0.0 34.3 0.0 594.0 0 100 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 60.0 0.0 4279.6 0.0 35.0 0.0 583.2 0 100 c2t8d0 >> 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t9d0 >> 0.0 41.0 0.0 2910.3 0.0 35.0 0.0 853.6 0 100 c2t10d0 >> 0.0 50.0 0.0 3552.2 0.0 35.0 0.0 699.9 0 100 c2t11d0 >> 0.0 48.0 0.0 3423.7 0.0 35.0 0.0 729.1 0 100 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t8d0 >> 0.0 60.0 0.0 4280.8 0.0 35.0 0.0 583.1 0 100 c2t9d0 >> 0.0 55.0 0.0 3938.2 0.0 35.0 0.0 636.1 0 100 c2t10d0 >> 0.0 56.0 0.0 4024.3 0.0 35.0 0.0 624.7 0 100 c2t11d0 >> 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 52.0 0.0 3723.5 0.0 35.0 0.0 672.9 0 100 c2t8d0 >> 0.0 43.0 0.0 3081.5 0.0 35.0 0.0 813.8 0 100 c2t9d0 >> 0.0 46.0 0.0 3296.0 0.0 35.0 0.0 760.7 0 100 c2t10d0 >> 0.0 48.0 0.0 3424.0 0.0 35.0 0.0 729.0 0 100 c2t11d0 >> 0.0 62.0 0.0 4408.1 0.0 35.0 0.0 564.4 0 100 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 60.0 0.0 4279.8 0.0 35.0 0.0 583.2 0 100 c2t8d0 >> 0.0 57.0 0.0 4065.8 0.0 35.0 0.0 613.9 0 100 c2t9d0 >> 0.0 59.0 0.0 4194.3 0.0 35.0 0.0 593.1 0 100 c2t10d0 >> 0.0 56.0 0.0 4023.3 0.0 35.0 0.0 
624.9 0 100 c2t11d0 >> 0.0 48.0 0.0 3424.3 0.0 35.0 0.0 729.1 0 100 c2t12d0JCM> drops back a fraction>> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 65.7 0.0 1385.0 0.0 14.5 0.0 220.8 0 90 c2t8d0 >> 0.9 68.4 39.8 1623.6 0.0 13.0 0.0 187.8 0 87 c2t9d0 >> 0.9 74.9 39.3 2054.6 0.0 16.7 0.0 219.6 0 94 c2t10d0 >> 0.9 70.3 39.3 1662.9 0.0 15.4 0.0 216.1 0 95 c2t11d0 >> 0.0 68.4 0.0 1736.0 0.0 14.9 0.0 217.9 0 87 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 65.3 0.0 3287.1 0.0 29.2 0.0 447.8 0 99 c2t8d0 >> 0.0 55.5 0.0 2642.4 0.0 28.2 0.0 508.9 0 99 c2t9d0 >> 0.0 47.9 0.0 2130.0 0.0 26.7 0.0 558.2 0 100 c2t10d0 >> 0.0 66.4 0.0 3336.1 0.0 29.3 0.0 441.2 0 100 c2t11d0 >> 0.0 65.3 0.0 3103.3 0.0 29.7 0.0 454.7 0 99 c2t12d0 >> 0.0 1.1 0.0 2.2 0.0 0.0 0.0 10.0 0 1 c1t1d0JCM> but quickly reverts to 100%:>> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 44.0 0.0 3125.2 0.0 35.0 0.0 795.1 0 100 c2t8d0 >> 0.0 50.0 0.0 3553.8 0.0 35.0 0.0 699.7 0 100 c2t9d0 >> 0.0 55.0 0.0 3895.8 0.0 35.0 0.0 636.1 0 100 c2t10d0 >> 0.0 44.0 0.0 3081.7 0.0 35.0 0.0 795.1 0 100 c2t11d0 >> 0.0 48.0 0.0 3424.7 0.0 35.0 0.0 728.8 0 100 c2t12d0 >> 0.0 1.0 0.0 8.0 0.0 0.0 0.0 8.7 0 1 c1t1d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 52.0 0.0 3724.6 0.0 35.0 0.0 672.8 0 100 c2t8d0 >> 0.0 46.0 0.0 3253.1 0.0 35.0 0.0 760.6 0 100 c2t9d0 >> 0.0 38.0 0.0 2697.0 0.0 35.0 0.0 920.7 0 100 c2t10d0 >> 0.0 51.0 0.0 3638.6 0.0 35.0 0.0 686.0 0 100 c2t11d0 >> 0.0 48.0 0.0 3424.6 0.0 35.0 0.0 728.9 0 100 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 44.0 0.0 2915.0 0.0 22.9 0.0 521.2 0 100 c2t8d0 >> 0.0 47.0 0.0 3382.1 0.0 24.1 0.0 512.0 0 100 c2t9d0 >> 0.0 56.0 0.0 4024.2 0.0 25.7 0.0 459.3 0 100 c2t10d0 >> 0.0 41.0 0.0 2954.1 0.0 22.7 0.0 552.4 0 100 c2t11d0 >> 0.0 46.0 0.0 3083.6 0.0 22.8 0.0 494.7 0 98 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 103.0 0.0 207.5 0.0 5.4 0.0 52.4 0 89 c2t8d0 >> 0.0 101.0 0.0 381.4 0.0 4.8 0.0 47.2 0 89 c2t9d0 >> 0.0 102.0 0.0 432.9 0.0 4.0 0.0 39.5 0 79 c2t10d0 >> 0.0 112.0 0.0 257.5 0.0 5.9 0.0 52.4 0 95 c2t11d0 >> 0.0 111.0 0.0 206.5 0.0 6.1 0.0 54.8 0 92 c2t12d0JCM> and then tails off>> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 102.0 0.0 213.0 0.0 4.7 0.0 46.3 0 78 c2t8d0 >> 0.0 106.0 0.0 214.5 0.0 5.0 0.0 47.6 0 82 c2t9d0 >> 0.0 95.0 0.0 214.5 0.0 4.3 0.0 45.5 0 71 c2t10d0 >> 0.0 97.0 0.0 214.0 0.0 4.7 0.0 48.9 0 80 c2t11d0 >> 0.0 99.0 0.0 216.5 0.0 5.2 0.0 52.7 0 90 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 67.0 0.0 104.0 0.0 3.2 0.0 47.3 0 55 c2t8d0 >> 0.0 68.0 0.0 106.5 0.0 3.4 0.0 49.6 0 58 c2t9d0 >> 0.0 66.0 0.0 101.5 0.0 3.2 0.0 48.6 0 60 c2t10d0 >> 0.0 64.0 0.0 103.0 0.0 3.1 0.0 48.0 0 57 c2t11d0 >> 0.0 69.0 0.0 103.5 0.0 3.1 0.0 45.4 0 62 c2t12d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 1.0 0.0 1.5 0.0 0.0 0.0 10.2 0 1 c1t0d0 >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 0.0 1.0 0.0 0.5 0.0 0.0 0.0 10.3 0 1 c1t1d0JCM> All of which, to me, look like you''re filling a buffer JCM> or two. 
JCM> I don't recall the config of your zpool, but if the
JCM> devices are disks that are direct or san-attached, I
JCM> would be wondering about their outstanding queue depths.

JCM> I think it's time to break out some D to find out where
JCM> in the stack the bottleneck(s) really are.

Maybe he could try to limit the number of queued requests per disk in ZFS to something smaller than the default of 35 (maybe even down to 1?).

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
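The per-disk queue Robert refers to shows up in the earlier iostat output as actv stuck at 35; on ZFS builds of that era it was governed by a tunable commonly named zfs_vdev_max_pending (name and default are version-dependent, so treat this as a sketch):

# echo "zfs_vdev_max_pending/W 0t10" | mdb -kw     (live change; try 10, or lower)

set zfs:zfs_vdev_max_pending = 10                  (persistent variant, in /etc/system)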
dd uses a default block size of 512B. Does this map to your expected usage?

When I quickly tested the CPU cost of small reads from cache, I did see that ZFS was more costly than UFS up to a crossover between 8K and 16K. We might need a more comprehensive study of that (data in/out of cache, different recordsize and alignment constraints). But for small syscalls, I think we might need some work in ZFS to make it CPU efficient.

So first: do small sequential writes to a large file match an interesting use case for you?

-r
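One quick way to separate the syscall-size effect from everything else is to repeat the same copy with an explicit large block size and compare the two rates (the paths below simply match the earlier test):

$ dd if=./file.tmp of=/home/fpz/file.tmp bs=512    (the implicit default)
$ dd if=./file.tmp of=/home/fpz/file.tmp bs=128k   (matches the default ZFS recordsize)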
Robert Milkowski wrote:
> Hello James,
>
> Sunday, December 16, 2007, 9:54:18 PM, you wrote:
>
> JCM> hi Frank,
>
> JCM> there is an interesting pattern here (at least, to my
> JCM> untrained eyes) - your %b starts off quite low:
....
> JCM> All of which, to me, look like you're filling a buffer
> JCM> or two.
>
> JCM> I don't recall the config of your zpool, but if the
> JCM> devices are disks that are direct or san-attached, I
> JCM> would be wondering about their outstanding queue depths.
>
> JCM> I think it's time to break out some D to find out where
> JCM> in the stack the bottleneck(s) really are.
>
> Maybe he could try to limit the number of queued requests per disk in ZFS to
> something smaller than the default of 35 (maybe even down to 1?)

Hi Robert,
yup, that's on my list of things for Frank to try. I've asked for a bit more config information, though, so we can get some clarity on that front first.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Hi,

On Dec 17, 2007 10:37 AM, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
>
> dd uses a default block size of 512B. Does this map to your
> expected usage? When I quickly tested the CPU cost of small
> reads from cache, I did see that ZFS was more costly than UFS
> up to a crossover between 8K and 16K. We might need a more
> comprehensive study of that (data in/out of cache, different
> recordsize and alignment constraints). But for small
> syscalls, I think we might need some work in ZFS to make it
> CPU efficient.
>
> So first: do small sequential writes to a large file
> match an interesting use case?

The pool holds home directories, so small sequential writes to one large file are one of a few interesting use cases. The performance is equally disappointing for many (small) files, e.g. compiling projects checked out from svn repositories.

Cheers,
Frank
Frank Penczek writes:
 > Hi,
 >
 > On Dec 17, 2007 10:37 AM, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
 > >
 > > dd uses a default block size of 512B. Does this map to your
 > > expected usage? When I quickly tested the CPU cost of small
 > > reads from cache, I did see that ZFS was more costly than UFS
 > > up to a crossover between 8K and 16K. We might need a more
 > > comprehensive study of that (data in/out of cache, different
 > > recordsize and alignment constraints). But for small
 > > syscalls, I think we might need some work in ZFS to make it
 > > CPU efficient.
 > >
 > > So first: do small sequential writes to a large file
 > > match an interesting use case?
 >
 > The pool holds home directories, so small sequential writes to one
 > large file are one of a few interesting use cases.

Can you be more specific here? Do you have a body of applications that would do small sequential writes, or one in particular? Another interesting piece of information is whether we expect those to be allocating writes or overwrites (beware that some apps move the old file out, then run allocating writes, then unlink the original file).

 > The performance is equally disappointing for many (small) files
 > like compiling projects in svn repositories.

???

-r
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   48.0    0.0 3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0

> That service time is just terrible!

Yeah, that service time is unreasonable. Almost a second for each command? And 35 more commands queued? (reorder = faster)

I had a server with similar service times, so I repaired a replacement blade, and when I went to slide it in I noticed a loud noise coming from the blade below it. I notified the Windows person who owned it; it had been "broken" for some time, and they turned it off. Things were much better after that.

Vibration... check vibration.

Rob
Hi,

On Dec 17, 2007 4:18 PM, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
>
> > The pool holds home directories, so small sequential writes to one
> > large file are one of a few interesting use cases.
>
> Can you be more specific here?
>
> Do you have a body of applications that would do small
> sequential writes, or one in particular? Another
> interesting piece of information is whether we expect those to be allocating
> writes or overwrites (beware that some apps move the old file
> out, then run allocating writes, then unlink the original
> file).

Sorry, I'll try to be more specific.

The zpool contains home directories that are exported to client machines. It is hard to predict exactly what users are doing, but one thing users certainly do is check out software projects from our subversion server. The projects typically contain many source code files (thousands), and in the worst case a build process accesses all of them. That is what I meant by "many (small) files like compiling projects" in my previous post. The performance for this case is ... hopefully improvable.

Now for sequential writes: we don't have a specific application issuing sequential writes, but I can think of at least a few cases where such writes may occur, e.g. dumps of substantial amounts of measurement data, or growing application log files. In either case these would be mainly allocating writes.

Does this provide the information you're interested in?

Cheers,
Frank
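To put a number on the many-small-files case, one comparable test would be timing the extraction of the same source tree locally on the server and from an NFS client; the tarball name and target directories below are placeholders chosen only for illustration:

server# ptime tar xf /var/tmp/project.tar -C /export/home/fpz/test-local
client$ /usr/bin/time tar xf /var/tmp/project.tar -C /home/fpz/test-nfs

Comparing the two wall-clock times (and rerunning the local case to see cache effects) should show whether the small-file pain is mostly in ZFS itself or in the NFS path on top of it.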
Frank Penczek writes:
 > Hi,
 >
 > On Dec 17, 2007 4:18 PM, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
 > >
 > > > The pool holds home directories, so small sequential writes to one
 > > > large file are one of a few interesting use cases.
 > >
 > > Can you be more specific here?
 > >
 > > Do you have a body of applications that would do small
 > > sequential writes, or one in particular? Another
 > > interesting piece of information is whether we expect those to be allocating
 > > writes or overwrites (beware that some apps move the old file
 > > out, then run allocating writes, then unlink the original
 > > file).
 >
 > Sorry, I'll try to be more specific.
 > The zpool contains home directories that are exported to client machines.
 > It is hard to predict exactly what users are doing, but one thing users
 > certainly do is check out software projects from our subversion server. The
 > projects typically contain many source code files (thousands), and in the
 > worst case a build process accesses all of them. That is what I meant by
 > "many (small) files like compiling projects" in my previous post. The
 > performance for this case is ... hopefully improvable.
 >

This we'll have to work on. But first, if this is to storage with NVRAM, I assume you checked that the storage does not flush its caches:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

If that is not your problem and ZFS underperforms another FS on the back end of NFS, then this needs investigation. If ZFS/NFS underperforms a direct-attached FS, that might just be an NFS issue not related to ZFS. Again, that needs investigation. Performance gains won't happen unless we find out what doesn't work.

 > Now for sequential writes:
 > We don't have a specific application issuing sequential writes, but I
 > can think of at least a few cases where such writes may occur, e.g.
 > dumps of substantial amounts of measurement data, or growing application
 > log files. In either case these would be mainly allocating writes.
 >

Right, but I'd hope the application would issue substantially larger writes, especially if it needs to dump data at a high rate. If the data rate is more modest, then the CPU lost to this effect will itself be modest.

 > Does this provide the information you're interested in?
 >

I get a sense that it's more important we find out what your build issue is. But the small writes will have to be improved one day also.

-r
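For completeness, the cache-flush check Roch links to matters mainly when the pool sits on an array with battery-backed cache; with the 3320 behind the LSI1030 acting as a plain JBOD it is probably not the issue here. The tuning described in the Evil Tuning Guide of that era was roughly the following, and it is only safe when the write cache really is non-volatile (the tunable name is release-dependent):

set zfs:zfs_nocacheflush = 1          (/etc/system; only with NVRAM-backed storage)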
> Sequential writing problem with process throttling - there's an open
> bug for it for quite a while. Try to lower txg_time to 1s - should
> help a little bit.

Yeah, my post was mostly to emphasize that on commodity hardware raidz2 does not even come close to being a CPU bottleneck. It wasn't a poke at the streaming performance. Very interesting to hear there's a bug open for it, though.

> Can you also post iostat -xnz 1 while you're doing dd?
> and zpool status

This was FreeBSD, but I can provide iostat -x if you still want it for some reason.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Hello Peter,

Tuesday, December 18, 2007, 5:12:48 PM, you wrote:

>> Sequential writing problem with process throttling - there's an open
>> bug for it for quite a while. Try to lower txg_time to 1s - should
>> help a little bit.

PS> Yeah, my post was mostly to emphasize that on commodity hardware raidz2 does
PS> not even come close to being a CPU bottleneck. It wasn't a poke at the
PS> streaming performance. Very interesting to hear there's a bug open for it
PS> though.

>> Can you also post iostat -xnz 1 while you're doing dd?
>> and zpool status

PS> This was FreeBSD, but I can provide iostat -x if you still want it for some
PS> reason.

I was just wondering whether maybe there's a problem with just one disk...

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
> I was just wondering whether maybe there's a problem with just one
> disk...

No, this is something I have observed on at least four different systems with vastly varying hardware. Probably just the effects of the known problem.

Thanks,

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org