thr3ads.net - zfs discuss - [zfs-discuss] NFS/ZFS slow on parallel writes [Sep 2009]

Bernd Nies

2009-Sep-29 09:03 UTC

[zfs-discuss] NFS/ZFS slow on parallel writes

Hi,

We have a Sun Storage 7410 with the latest release (which is based upon
opensolaris). The system uses a hybrid storage pool (23 1TB SATA disks in RAIDZ2
and 1 18GB SSD as log device). The ZFS volumes are exported with NFSv3 over TCP.
NFS mount options are:

rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio

We compare that system with our Netapp FAS 3140 and notice a large performance
decrease when multiple hosts write many small files in parallel (e.g. CVS
checkout).

Doing that on one single host, the write speed is quite similar on both systems:

Netapp FAS 3140:
    bernd at linuxhost:~/tmp> time cvs -Q checkout myBigProject
    real    0m32.914s
    user    0m1.568s
    sys     0m3.060s

Sun Storage 7410:
    bernd at linuxhost:/share/nightlybuild/tmp> time cvs -Q checkout myBigProject
    real    0m34.049s
    user    0m1.592s
    sys     0m3.184s

Doing the same operation on 5 different hosts on the same NFS share in different
directories we notice a performance decrease which is proportional to the number
of writing hosts (5x slower) while the same operation on Netapp FAS 3140 is less
than 2x slower:

Netapp FAS 3140:
    bernd at linuxhost:~/tmp/1> time cvs -Q checkout myBigProject
    real    0m58.120s
    user    0m1.452s
    sys     0m2.976s

Sun Storage 7410:
    bernd at linuxhost:/share/nightlybuild/tmp/1> time cvs -Q checkout myBigProject
    real    4m32.747s
    user    0m2.296s
    sys     0m4.224s 
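
The slowdown factors can be checked directly from the "real" times above. The
following is only a small sketch; the helper names are made up for
illustration. For these particular runs it gives roughly 1.8x for the Netapp
and roughly 8x for the 7410.

```python
def seconds(t: str) -> float:
    """Convert a `time`-style string such as '4m32.747s' to seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Wall-clock ("real") times copied from the checkout runs above.
timings = {
    "Netapp FAS 3140": {"1 host": "0m32.914s", "5 hosts": "0m58.120s"},
    "Sun Storage 7410": {"1 host": "0m34.049s", "5 hosts": "4m32.747s"},
}


def slowdown(system: str) -> float:
    """Ratio of the 5-host checkout time to the single-host checkout time."""
    t = timings[system]
    return seconds(t["5 hosts"]) / seconds(t["1 host"])


for system in timings:
    print(f"{system}: {slowdown(system):.2f}x slower with 5 hosts")
```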

We often run into timeouts (the CVS timeout is set to 60 minutes) when building
software during the nightly build, because the NFS writes are slowed down
drastically; this makes the storage unusable for us. The same happens when we
run VMware machines on an ESX server on an NFS pool and Oracle databases on NFS.
Netapp and Oracle recommend using NFS as central storage, but we wanted a less
expensive system because it is used only for development and testing, not for
highly critical production data. But the slowdown when more than one NFS client
is writing is too severe.

What might be the bottleneck here? Any ideas? The ZFS log device? Is more than
one ZFS log device required for parallel performance? As many as there are NFS
clients?

Best regards,
Bernd


nfsserver# zpool status
  pool: pool-0
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Sep 23 04:27:21 2009
config:

        NAME                                         STATE     READ WRITE CKSUM
        pool-0                                       ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014ED4D01d0                    ONLINE       0     0     0
            c3t5000C50014F4EC09d0                    ONLINE       0     0     0
            c3t5000C50014F4EE46d0                    ONLINE       0     0     0
            c3t5000C50014F4F50Ed0                    ONLINE       0     0     0
            c3t5000C50014F4FB64d0                    ONLINE       0     0     0
            c3t5000C50014F50A7Cd0                    ONLINE       0     0     0
            c3t5000C50014F50F57d0                    ONLINE       0     0     0
            c3t5000C50014F52A59d0                    ONLINE       0     0     0
            c3t5000C50014F52D83d0                    ONLINE       0     0     0
            c3t5000C50014F52E0Cd0                    ONLINE       0     0     0
            c3t5000C50014F52F9Bd0                    ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014F54EB1d0                    ONLINE       0     0     0  254K resilvered
            c3t5000C50014F54FC9d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F512E3d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F515C9d0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F549EAd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F553EBd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F5072Cd0                    ONLINE       0     0     0  279K resilvered
            c3t5000C50014F5192Bd0                    ONLINE       0     0     0  4.60M resilvered
            c3t5000C50014F5494Bd0                    ONLINE       0     0     0  258K resilvered
            c3t5000C50014F5500Bd0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F51865d0                    ONLINE       0     0     0  248K resilvered
        logs
          c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0  ONLINE       0     0     0
        spares
          c3t5000C50014F53925d0                      AVAIL

errors: No known data errors

  pool: system
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        system        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0

errors: No known data errors

nfsserver# echo | format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci at 1,0/pci10de,cb84 at 5,1/disk at 0,0
       1. c0t1d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci at 1,0/pci10de,cb84 at 5,1/disk at 1,0
       2. c3t5000C50014ED4D01d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014ed4d01
       3. c3t5000C50014F4EC09d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f4ec09
       4. c3t5000C50014F4EE46d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f4ee46
       5. c3t5000C50014F4F50Ed0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f4f50e
       6. c3t5000C50014F4FB64d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f4fb64
       7. c3t5000C50014F50A7Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f50a7c
       8. c3t5000C50014F50F57d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f50f57
       9. c3t5000C50014F52A59d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f52a59
      10. c3t5000C50014F52D83d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f52d83
      11. c3t5000C50014F52E0Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f52e0c
      12. c3t5000C50014F52F9Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f52f9b
      13. c3t5000C50014F54EB1d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f54eb1
      14. c3t5000C50014F54FC9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f54fc9
      15. c3t5000C50014F512E3d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f512e3
      16. c3t5000C50014F515C9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f515c9
      17. c3t5000C50014F549EAd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f549ea
      18. c3t5000C50014F553EBd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f553eb
      19. c3t5000C50014F5072Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f5072c
      20. c3t5000C50014F5192Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f5192b
      21. c3t5000C50014F5494Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f5494b
      22. c3t5000C50014F5500Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f5500b
      23. c3t5000C50014F51865d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f51865
      24. c3t5000C50014F53925d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk at g5000c50014f53925
      25. c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0 <ATA-STEC ZeusIOPS-0430-17.00GB>
          /scsi_vhci/disk at gATASTECZeusIOPS018GBytesSTM0000D905C
Specify disk (enter its number): Specify disk (enter its number):
-- 
This message posted from opensolaris.org

Richard Elling

2009-Sep-29 14:35 UTC

On Sep 29, 2009, at 2:03 AM, Bernd Nies wrote:
> What might be the bottleneck here? Any ideas? The ZFS log device? Is
> more than one ZFS log device required for parallel performance? As
> many as there are NFS clients?
bingo!  One should suffice.

BTW, not fair comparing a machine with an NVRAM cache to one
without... add an SSD for the log to even things out.
  -- richard

Ross Walker

2009-Sep-29 14:55 UTC

On Tue, Sep 29, 2009 at 10:35 AM, Richard Elling
<richard.elling at gmail.com> wrote:
>
> bingo!  One should suffice.
>
> BTW, not fair comparing a machine with an NVRAM cache to one
> without... add an SSD for the log to even things out.
>  -- richard
Not exactly true, look below at his pool configuration....
>> nfsserver# zpool status
>>   pool: pool-0
>>  state: ONLINE
>>  scrub: resilver completed after 0h0m with 0 errors on Wed Sep 23 04:27:21 2009
>> config:
>>
>>         NAME                                         STATE     READ WRITE CKSUM
>>         pool-0                                       ONLINE       0     0     0
>>           raidz2                                     ONLINE       0     0     0
>>             c3t5000C50014ED4D01d0                    ONLINE       0     0     0
>>             c3t5000C50014F4EC09d0                    ONLINE       0     0     0
>>             c3t5000C50014F4EE46d0                    ONLINE       0     0     0
>>             c3t5000C50014F4F50Ed0                    ONLINE       0     0     0
>>             c3t5000C50014F4FB64d0                    ONLINE       0     0     0
>>             c3t5000C50014F50A7Cd0                    ONLINE       0     0     0
>>             c3t5000C50014F50F57d0                    ONLINE       0     0     0
>>             c3t5000C50014F52A59d0                    ONLINE       0     0     0
>>             c3t5000C50014F52D83d0                    ONLINE       0     0     0
>>             c3t5000C50014F52E0Cd0                    ONLINE       0     0     0
>>             c3t5000C50014F52F9Bd0                    ONLINE       0     0     0
>>           raidz2                                     ONLINE       0     0     0
>>             c3t5000C50014F54EB1d0                    ONLINE       0     0     0  254K resilvered
>>             c3t5000C50014F54FC9d0                    ONLINE       0     0     0  264K resilvered
>>             c3t5000C50014F512E3d0                    ONLINE       0     0     0  264K resilvered
>>             c3t5000C50014F515C9d0                    ONLINE       0     0     0  262K resilvered
>>             c3t5000C50014F549EAd0                    ONLINE       0     0     0  262K resilvered
>>             c3t5000C50014F553EBd0                    ONLINE       0     0     0  262K resilvered
>>             c3t5000C50014F5072Cd0                    ONLINE       0     0     0  279K resilvered
>>             c3t5000C50014F5192Bd0                    ONLINE       0     0     0  4.60M resilvered
>>             c3t5000C50014F5494Bd0                    ONLINE       0     0     0  258K resilvered
>>             c3t5000C50014F5500Bd0                    ONLINE       0     0     0  264K resilvered
>>             c3t5000C50014F51865d0                    ONLINE       0     0     0  248K resilvered
>>         logs
>>           c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0  ONLINE       0     0     0
>>         spares
>>           c3t5000C50014F53925d0                      AVAIL
>>
>> errors: No known data errors
It appears he has a:

http://www.stec-inc.com/product/zeusiops.php

Which should be more than capable of providing the IOPS needed.

Was that pool rebuilding during the tests or did that happen afterwards?

-Ross

Bernd Nies

2009-Sep-29 14:59 UTC

Hi, 

The system already has an SSD (ATASTECZeusIOPS018GBytesSTM0000D905C) as ZFS log
device.

NFS writes from only one host are not the problem. Even with many small files it
is almost as fast as a Netapp. The problem arises when doing the same in
parallel from n hosts. E.g. the same write from 10 hosts takes 10 times longer.
On the Netapp the same from 10 hosts takes only 2-3 times longer. Network
bandwidth is also not the problem here.

Bye
Bernd

Bob Friesenhahn

2009-Sep-29 15:18 UTC

On Tue, 29 Sep 2009, Bernd Nies wrote:
>
> NFS writes from only one host are not the problem. Even with many
> small files it is almost as fast as a Netapp. The problem arises when
> doing the same in parallel from n hosts. E.g. the same write from 10
> hosts takes 10 times longer. On the Netapp the same from 10 hosts
> takes only 2-3 times longer. Network bandwidth is also not the
> problem here.

Striping across two large raidz2s is not ideal for multi-user use.
You are getting the equivalent of two disks' worth of IOPS, which does
not go very far. More, smaller raidz vdevs or mirror vdevs would be
better.  Also, make sure that you have plenty of RAM installed.
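
To put rough numbers on that rule of thumb, here is a minimal sketch. It
assumes each raidz or mirror vdev delivers about one disk's worth of small
random IOPS; the per-disk figure is an assumed ballpark, not a measurement,
and the mirror layout is hypothetical. Real results depend on the workload,
caching, and the log device.

```python
def pool_random_iops(vdevs: int, iops_per_disk: float) -> float:
    """Rule of thumb: each vdev contributes about one disk's worth of random IOPS."""
    return vdevs * iops_per_disk


# Assumption: ballpark figure for a 7200 rpm SATA disk, not taken from the thread.
ASSUMED_DISK_IOPS = 80.0

# Number of top-level vdevs for two ways of arranging the same 22 data disks.
layouts = {
    "2 x 11-disk raidz2 (as in the thread)": 2,
    "11 x 2-way mirror (hypothetical)": 11,
}

for name, vdevs in layouts.items():
    estimate = pool_random_iops(vdevs, ASSUMED_DISK_IOPS)
    print(f"{name}: ~{estimate:.0f} random IOPS")
```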

What disk configuration (number of disks, and RAID topology) is the 
NetApp using?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Mika Borner

2009-Sep-29 15:54 UTC

Bob Friesenhahn wrote:
> Striping across two large raidz2s is not ideal for multi-user use. You
> are getting the equivalent of two disks worth of IOPS, which does not
> go very far. More smaller raidz vdevs or mirror vdevs would be
> better.  Also, make sure that you have plenty of RAM installed.
>

For small files I would definitely go mirrored.

> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?
>

On NetApp you can only choose between RAID-DP and RAID-DP :-)

With mirroring you will certainly lose space-wise against NetApp, but
if your data compresses well, you will still end up with more space
available. Our 7410 system currently runs at a CPU utilisation of around
3% for compression. This is with gzip-2 and a compression ratio of 1.96.
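
As a rough illustration of that space trade-off, the sketch below assumes
Bernd's 22 data disks of 1 TB each and assumes his data would compress at
the 1.96 ratio seen here. Both are assumptions, and metadata and reservation
overhead are ignored.

```python
DISK_TB = 1.0          # nominal size of each data disk in Bernd's pool
DATA_DISKS = 22        # 2 x 11 disks in pool-0, excluding spare and log device
COMPRESS_RATIO = 1.96  # gzip-2 ratio reported for a different 7410 (assumed to carry over)


def raidz2_usable(disks_per_vdev: int, vdevs: int) -> float:
    """Usable TB for raidz2: two parity disks per vdev."""
    return (disks_per_vdev - 2) * vdevs * DISK_TB


def mirror_usable(disks: int) -> float:
    """Usable TB for 2-way mirrors: half of the disks hold copies."""
    return disks // 2 * DISK_TB


raidz2_tb = raidz2_usable(11, 2)
mirror_tb = mirror_usable(DATA_DISKS)
mirror_effective_tb = mirror_tb * COMPRESS_RATIO

print(f"raidz2, uncompressed:   {raidz2_tb:.2f} TB")
print(f"mirrors, uncompressed:  {mirror_tb:.2f} TB")
print(f"mirrors, with gzip-2:   {mirror_effective_tb:.2f} TB of logical data")
```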

So far, I'm very happy with the system.

Richard Elling

2009-Sep-29 18:44 UTC

On Sep 29, 2009, at 7:59 AM, Bernd Nies wrote:
> Hi,
>
> The system already has a SSD (ATASTECZeusIOPS018GBytesSTM0000D905C)  
> as ZFS log device.
I apologize, I should know better than to answer before the first cup of
coffee :-P
> NFS writes from only one host are not the problem. Even with many
> small files it is almost as fast as a Netapp. The problem arises when
> doing the same in parallel from n hosts. E.g. the same write from 10
> hosts takes 10 times longer. On the Netapp the same from 10 hosts
> takes only 2-3 times longer. Network bandwidth is also not the
> problem here.
We would need more info before we could proceed.  Can you collect
some iostat data?
	iostat -zxnT d 1
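
Once collected, the output can be scanned for saturated devices. The sketch
below assumes the usual Solaris extended-statistics column layout (r/s, w/s,
kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device). The function name,
the threshold, and the sample rows are made up purely for illustration and
are not measurements from this system.

```python
def busy_devices(iostat_text: str, min_busy_pct: int = 80) -> list[tuple[str, float, int]]:
    """Return (device, asvc_t, %b) for rows whose %b is at or above the threshold."""
    hits = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Data rows have 11 columns:
        # r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
        if len(fields) != 11:
            continue
        try:
            asvc_t = float(fields[7])
            busy = int(fields[9])
        except ValueError:
            continue  # column header row or other non-numeric line
        if busy >= min_busy_pct:
            hits.append((fields[10], asvc_t, busy))
    return hits


# Hypothetical sample in the assumed format; device names and values are invented.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  120.0    0.0  960.0  0.0  0.9    0.0    7.5   0  95 disk_a
    1.0    5.0    8.0   40.0  0.0  0.0    0.0    0.4   0   2 disk_b
"""

for device, asvc_t, busy in busy_devices(SAMPLE):
    print(f"{device}: {busy}% busy, {asvc_t} ms average service time")
```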

The "raidz acts like a single disk" rule for IOPS is felt on small, random
read workloads.  Random write workloads perform better.
  -- richard

Bernd Nies

2009-Sep-30 05:31 UTC

Hi Bob,
> Striping across two large raidz2s is not ideal for multi-user use.
> You are getting the equivalent of two disks worth of IOPS, which does
> not go very far. More smaller raidz vdevs or mirror vdevs would be
> better.  Also, make sure that you have plenty of RAM installed.

This is new to me. I thought the RAID level determines overall read/write
performance, no matter how many hosts write through NFS. The NFS layer should
take care of this. The filesystem or RAID itself doesn't know how many
hosts are writing. Or am I wrong?

I followed this recommendation for choosing the RAID level because we wanted a
system with high capacity and high disk fault tolerance:

http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

The Storage 7410 came with 16 GB RAM installed.
> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?
The Netapp has a double parity RAID with a raid group size of 16. There are 14
disks in 3 shelves, connected with 4 Gb/s fibre channel to the head. I thought
double parity RAID (Netapp calls it RAID-DP) is something similar to RAIDZ2.

Best regards,
Bernd

Bernd Nies

2009-Oct-30 14:33 UTC

Hi,

Just to close this topic: two issues have been found which caused the slow
write performance on our Sun Storage 7410 with RAIDZ2:

(1) An OpenSolaris bug when the NFS mount option is set to wsize=32768. Reducing
it to wsize=16384 resulted in a performance gain of about a factor of 2.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887

(2) The Sun Storage 7410 (software release 2009.09.01.0.0,1-1.2) was configured
as an LDAP client for mapping Unix UID/GID to Windows names/groups. At every
file access the filer asked the LDAP server and resolved the ownership of the
file. This also happened during NDMP backups and caused a high load on the LDAP
server. It seems that this release has a non-working name service cache daemon
or that the cache size is too small. We have about 500 users and 100 groups.
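
Why a working cache matters here can be sketched with a toy model. The
lookup function below is a hypothetical stand-in for the directory query and
the workload is simulated; this is not how the appliance implements name
resolution. Without a cache every file access costs one query, while with a
cache large enough for all users the query count drops to one per distinct
owner.

```python
from functools import lru_cache

lookups = 0  # counts simulated round trips to the directory server


def ldap_lookup(uid: int) -> str:
    """Hypothetical stand-in for a slow UID-to-name query against LDAP."""
    global lookups
    lookups += 1
    return f"user{uid}"


@lru_cache(maxsize=1024)  # large enough for the ~500 users mentioned above
def cached_lookup(uid: int) -> str:
    return ldap_lookup(uid)


def resolve_owners(file_owner_uids, resolver):
    """Resolve the owner name for every file access in the list."""
    return [resolver(uid) for uid in file_owner_uids]


# Simulated workload: 10,000 file accesses spread over 500 distinct owners.
accesses = [uid % 500 for uid in range(10_000)]

lookups = 0
resolve_owners(accesses, ldap_lookup)
uncached_queries = lookups  # one query per file access

lookups = 0
resolve_owners(accesses, cached_lookup)
cached_queries = lookups  # one query per distinct owner

print(f"without cache: {uncached_queries} queries, with cache: {cached_queries}")
```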

The LDAP replica was a Sun directory server 5.2p5 on a rather slow SunFire V240
with Solaris 9. After migrating the LDAP server to a fast machine (Solaris 10
x86 on VMware ESX 4i) the NFS I/O rate was much better, and after disabling the
LDAP client entirely the I/O rate is now about 16x better when 10 Linux hosts
are untarring the Linux kernel source to the same NFS share.

Actions:
- time tar -xf ../linux-2.6.32-rc1.tar
- time rm -rf linux-2.6.32-rc1

NFS mount options: wsize=16384
gzip: ZFS filesystem on the fly compression

OpenStorage 7410    | tar -xf     | rm -rf
--------------------+-------------+------------
LDAP on,  1 client  |  3m 50.809s |  0m 16.395s
         10 clients | 19m 59.453s | 69m 12.107s
--------------------+-------------+------------
LDAP off, 1 client  |  1m 15.340s |  0m 14.784s
         10 clients |  3m 29.785s |  4m 51.606s
--------------------+-------------+------------
LDAP off, gzip 1 cl |  2m 13.713s |  0m 14.936s
              10 cl |  3m 47.773s |  7m 37.606s
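
The effect of turning LDAP off can be read off the 10-client rows. A small
sketch that computes the ratios from the figures in the table (helper names
are made up for illustration); for these rows it gives roughly 5.7x for
tar -xf and roughly 14x for rm -rf.

```python
def seconds(t: str) -> float:
    """Convert a string such as '19m 59.453s' to seconds."""
    minutes, rest = t.replace(" ", "").split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# 10-client rows copied from the table above (no compression).
ldap_on = {"tar -xf": "19m 59.453s", "rm -rf": "69m 12.107s"}
ldap_off = {"tar -xf": "3m 29.785s", "rm -rf": "4m 51.606s"}

# How many times faster each action completes once LDAP is disabled.
speedup = {op: seconds(ldap_on[op]) / seconds(ldap_off[op]) for op in ldap_on}

for op, factor in speedup.items():
    print(f"{op}: {factor:.1f}x faster with LDAP off")
```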

The system now performs well.

Best regards,
Bernd
