Hi,

We have a Sun Storage 7410 with the latest release (which is based upon OpenSolaris). The system uses a hybrid storage pool (23 1 TB SATA disks in RAIDZ2 and one 18 GB SSD as log device). The ZFS volumes are exported with NFSv3 over TCP. The NFS mount options are:

    rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio

We compared that system with our NetApp FAS 3140 and noticed a large performance drop when multiple hosts write many small files in parallel (e.g. a CVS checkout).

When doing that on a single host, the write speed is quite similar on both systems:

NetApp FAS 3140:
    bernd@linuxhost:~/tmp> time cvs -Q checkout myBigProject
    real    0m32.914s
    user    0m1.568s
    sys     0m3.060s

Sun Storage 7410:
    bernd@linuxhost:/share/nightlybuild/tmp> time cvs -Q checkout myBigProject
    real    0m34.049s
    user    0m1.592s
    sys     0m3.184s

When doing the same operation on 5 different hosts on the same NFS share in different directories, we notice a performance decrease which is proportional to the number of writing hosts (5x slower), while the same operation on the NetApp FAS 3140 is less than 2x slower:

NetApp FAS 3140:
    bernd@linuxhost:~/tmp/1> time cvs -Q checkout myBigProject
    real    0m58.120s
    user    0m1.452s
    sys     0m2.976s

Sun Storage 7410:
    bernd@linuxhost:/share/nightlybuild/tmp/1> time cvs -Q checkout myBigProject
    real    4m32.747s
    user    0m2.296s
    sys     0m4.224s

We often run into timeouts (the CVS timeout is set to 60 minutes) when building software during the nightly build process, which makes this storage unusable because the NFS writes are slowed down drastically. This also happens when we run VMware machines on an ESX server on an NFS pool, and with Oracle databases on NFS. NetApp and Oracle recommend using NFS as central storage, but we wanted a less expensive system because it is used only for development and testing and not for highly critical production data. But the performance slowdown when more than one writing NFS client is involved is too severe.

What might be the bottleneck here? Any ideas?
The ZFS log device? Is more than one ZFS log device required for parallel performance? As many as there are NFS clients?

Best regards,
Bernd

nfsserver# zpool status
  pool: pool-0
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Sep 23 04:27:21 2009
config:

        NAME                                         STATE     READ WRITE CKSUM
        pool-0                                       ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014ED4D01d0                    ONLINE       0     0     0
            c3t5000C50014F4EC09d0                    ONLINE       0     0     0
            c3t5000C50014F4EE46d0                    ONLINE       0     0     0
            c3t5000C50014F4F50Ed0                    ONLINE       0     0     0
            c3t5000C50014F4FB64d0                    ONLINE       0     0     0
            c3t5000C50014F50A7Cd0                    ONLINE       0     0     0
            c3t5000C50014F50F57d0                    ONLINE       0     0     0
            c3t5000C50014F52A59d0                    ONLINE       0     0     0
            c3t5000C50014F52D83d0                    ONLINE       0     0     0
            c3t5000C50014F52E0Cd0                    ONLINE       0     0     0
            c3t5000C50014F52F9Bd0                    ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014F54EB1d0                    ONLINE       0     0     0  254K resilvered
            c3t5000C50014F54FC9d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F512E3d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F515C9d0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F549EAd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F553EBd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F5072Cd0                    ONLINE       0     0     0  279K resilvered
            c3t5000C50014F5192Bd0                    ONLINE       0     0     0  4.60M resilvered
            c3t5000C50014F5494Bd0                    ONLINE       0     0     0  258K resilvered
            c3t5000C50014F5500Bd0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F51865d0                    ONLINE       0     0     0  248K resilvered
        logs
          c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0  ONLINE       0     0     0
        spares
          c3t5000C50014F53925d0                      AVAIL

errors: No known data errors

  pool: system
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        system        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0

errors: No known data errors

nfsserver# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci@1,0/pci10de,cb84@5,1/disk@0,0
       1. c0t1d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci@1,0/pci10de,cb84@5,1/disk@1,0
       2. c3t5000C50014ED4D01d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014ed4d01
       3. c3t5000C50014F4EC09d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4ec09
       4. c3t5000C50014F4EE46d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4ee46
       5. c3t5000C50014F4F50Ed0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4f50e
       6. c3t5000C50014F4FB64d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4fb64
       7. c3t5000C50014F50A7Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f50a7c
       8. c3t5000C50014F50F57d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f50f57
       9. c3t5000C50014F52A59d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52a59
      10. c3t5000C50014F52D83d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52d83
      11. c3t5000C50014F52E0Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52e0c
      12. c3t5000C50014F52F9Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52f9b
      13. c3t5000C50014F54EB1d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f54eb1
      14. c3t5000C50014F54FC9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f54fc9
      15. c3t5000C50014F512E3d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f512e3
      16. c3t5000C50014F515C9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f515c9
      17. c3t5000C50014F549EAd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f549ea
      18. c3t5000C50014F553EBd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f553eb
      19. c3t5000C50014F5072Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5072c
      20. c3t5000C50014F5192Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5192b
      21. c3t5000C50014F5494Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5494b
      22. c3t5000C50014F5500Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5500b
      23. c3t5000C50014F51865d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f51865
      24. c3t5000C50014F53925d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f53925
      25. c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0 <ATA-STEC ZeusIOPS-0430-17.00GB>
          /scsi_vhci/disk@gATASTECZeusIOPS018GBytesSTM0000D905C
Specify disk (enter its number): Specify disk (enter its number):

--
This message posted from opensolaris.org
On Sep 29, 2009, at 2:03 AM, Bernd Nies wrote:
> [...]
> What might be the bottleneck here? Any ideas? The ZFS log device? Is
> more than one ZFS log device required for parallel performance? As
> many as there are NFS clients?

bingo! One should suffice.

BTW, not fair comparing a machine with an NVRAM cache to one without... add an SSD for the log to even things out.
 -- richard
On Tue, Sep 29, 2009 at 10:35 AM, Richard Elling <richard.elling at gmail.com> wrote:
> On Sep 29, 2009, at 2:03 AM, Bernd Nies wrote:
>> [...]
>> What might be the bottleneck here? Any ideas? The ZFS log device? Is
>> more than one ZFS log device required for parallel performance? As
>> many as there are NFS clients?
>
> bingo! One should suffice.
>
> BTW, not fair comparing a machine with an NVRAM cache to one
> without... add an SSD for the log to even things out.
>  -- richard

Not exactly true, look below at his pool configuration...

>> nfsserver# zpool status
>>   pool: pool-0
>>  state: ONLINE
>>  scrub: resilver completed after 0h0m with 0 errors on Wed Sep 23 04:27:21 2009
>> [...]
>>         logs
>>           c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0  ONLINE       0     0     0
>>         spares
>>           c3t5000C50014F53925d0                      AVAIL
>>
>> errors: No known data errors

It appears he has a:

http://www.stec-inc.com/product/zeusiops.php

Which should be more than capable of providing the IOPS needed.

Was that pool rebuilding during the tests, or did that happen afterwards?

-Ross
Hi,

The system already has an SSD (ATASTECZeusIOPS018GBytesSTM0000D905C) as ZFS log device.

NFS writes from only one host are not the problem. Even with many small files it is almost as fast as the NetApp. The problem arises when doing the same thing in parallel from n hosts. E.g. the same write from 10 hosts takes 10 times longer. On the NetApp the same from 10 hosts takes only 2-3 times longer. Network bandwidth is also not the problem here.

Bye
Bernd
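For reference, the scaling factors can be worked out directly from the "real" times in the original post. The following is only a small sketch; the function names are made up for illustration, and the numbers are copied from the CVS checkout timings above (1 host versus 5 hosts in parallel). With these figures the NetApp comes out at roughly 1.8x and the 7410 at roughly 8x.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time`-style duration such as '4m32.747s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown(single: str, parallel: str) -> float:
    """How many times longer the parallel run took than the single-host run."""
    return parse_real(parallel) / parse_real(single)


# 'real' times from the original post: (1 host, 5 hosts in parallel)
timings = {
    "NetApp FAS 3140": ("0m32.914s", "0m58.120s"),
    "Sun Storage 7410": ("0m34.049s", "4m32.747s"),
}

for system, (single, parallel) in timings.items():
    print(f"{system}: {slowdown(single, parallel):.2f}x slower with 5 hosts")
```

The same helper can be pointed at any other pair of `time` outputs collected while testing.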
On Tue, 29 Sep 2009, Bernd Nies wrote:
>
> NFS writes from only one host are not the problem. Even with many
> small files it is almost as fast as the NetApp. The problem arises when
> doing the same thing in parallel from n hosts. E.g. the same write from 10
> hosts takes 10 times longer. On the NetApp the same from 10 hosts
> takes only 2-3 times longer. Network bandwidth is also not the
> problem here.

Striping across two large raidz2s is not ideal for multi-user use. You are getting the equivalent of two disks' worth of IOPS, which does not go very far. More, smaller raidz vdevs or mirror vdevs would be better. Also, make sure that you have plenty of RAM installed.

What disk configuration (number of disks, and RAID topology) is the NetApp using?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
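To put a rough number on the "two disks' worth of IOPS" remark, here is a back-of-the-envelope sketch. It assumes the rule of thumb that each top-level vdev delivers about one disk's worth of small random IOPS, and it assumes 80 IOPS per 7200 rpm SATA disk, which is a guess and not a measured value for these drives. The mirror layout is hypothetical. The rule is usually stated for small random reads, so treat the result as an order-of-magnitude comparison only.

```python
from dataclasses import dataclass

# Assumption: rough figure for one 7200 rpm SATA disk, not measured here.
ASSUMED_DISK_IOPS = 80


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int            # number of top-level vdevs the pool stripes across
    disks_per_vdev: int

    def total_disks(self) -> int:
        return self.vdevs * self.disks_per_vdev

    def estimated_random_iops(self, disk_iops: int = ASSUMED_DISK_IOPS) -> int:
        # Rule of thumb: each vdev contributes roughly one disk's worth
        # of small random IOPS, regardless of how many disks it contains.
        return self.vdevs * disk_iops


layouts = [
    Layout("2 x 11-disk raidz2 (current pool)", vdevs=2, disks_per_vdev=11),
    Layout("11 x 2-way mirror (hypothetical)", vdevs=11, disks_per_vdev=2),
]

for layout in layouts:
    print(
        f"{layout.name}: {layout.total_disks()} disks, "
        f"~{layout.estimated_random_iops()} random IOPS"
    )
```

Both layouts use the same 22 disks; under these assumptions only the vdev count changes the estimate.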
Bob Friesenhahn wrote:
> Striping across two large raidz2s is not ideal for multi-user use. You
> are getting the equivalent of two disks' worth of IOPS, which does not
> go very far. More, smaller raidz vdevs or mirror vdevs would be
> better. Also, make sure that you have plenty of RAM installed.
>

For small files I would definitely go mirrored.

> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?
>

On NetApp you can only choose between RAID-DP and RAID-DP :-)

With mirroring you will certainly lose space-wise against NetApp, but if your data compresses well, you will still end up with more space available. Our 7410 system currently spends around 3% CPU utilisation on compression. This is while using gzip-2 and getting a compression ratio of 1.96.

So far, I'm very happy with the system.
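The space trade-off can be sketched with simple arithmetic. This assumes the 22 data disks from the pool above at a nominal 1 TB each, ignores metadata and filesystem overhead, and assumes the 1.96 compression ratio reported for another 7410 would also apply to this data, which may well not be the case.

```python
DISK_TB = 1.0           # nominal disk size from the thread
COMPRESS_RATIO = 1.96   # gzip-2 ratio reported above; assumed to carry over


def usable_tb(vdevs: int, disks_per_vdev: int, redundancy_disks: int,
              disk_tb: float = DISK_TB) -> float:
    """Raw usable capacity: data disks per vdev times number of vdevs."""
    return vdevs * (disks_per_vdev - redundancy_disks) * disk_tb


# Current layout: two 11-disk raidz2 vdevs (2 parity disks each).
raidz2_tb = usable_tb(vdevs=2, disks_per_vdev=11, redundancy_disks=2)

# Hypothetical layout: eleven 2-way mirrors (1 redundant disk each).
mirror_tb = usable_tb(vdevs=11, disks_per_vdev=2, redundancy_disks=1)

# Logical data that would fit on the mirrors if the ratio held.
mirror_logical_tb = mirror_tb * COMPRESS_RATIO

print(f"raidz2, uncompressed:      {raidz2_tb:.2f} TB")
print(f"mirrors, uncompressed:     {mirror_tb:.2f} TB")
print(f"mirrors, at {COMPRESS_RATIO}x compress: {mirror_logical_tb:.2f} TB logical")
```

Under these assumptions the compressed mirror layout holds more logical data than the uncompressed raidz2 layout, which is the point being made above.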
On Sep 29, 2009, at 7:59 AM, Bernd Nies wrote:
> Hi,
>
> The system already has an SSD (ATASTECZeusIOPS018GBytesSTM0000D905C)
> as ZFS log device.

I apologize, I should know better than to answer before the first cup of coffee :-P

> NFS writes from only one host are not the problem. Even with many
> small files it is almost as fast as the NetApp. The problem arises when
> doing the same thing in parallel from n hosts. E.g. the same write from 10
> hosts takes 10 times longer. On the NetApp the same from 10 hosts
> takes only 2-3 times longer. Network bandwidth is also not the
> problem here.

We would need more info before we could proceed. Can you collect some iostat data?

    iostat -zxnT d 1

The "raidz acts like a single disk" rule for IOPS is felt on small, random read workloads. Random write workloads perform better.
 -- richard
Hi Bob,

> Striping across two large raidz2s is not ideal for multi-user use.
> You are getting the equivalent of two disks' worth of IOPS, which does
> not go very far. More, smaller raidz vdevs or mirror vdevs would be
> better. Also, make sure that you have plenty of RAM installed.

This is new to me. I thought the RAID level determines overall read/write performance, no matter how many hosts write through NFS, and that the NFS layer should take care of this. The filesystem or RAID itself doesn't know how many hosts are writing. Or am I wrong?

I followed this recommendation for choosing the RAID level because we wanted a system with high capacity and high disk fault tolerance:

http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

The Storage 7410 came with 16 GB RAM installed.

> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?

The NetApp has a double-parity RAID with a RAID group size of 16. There are 14 disks in 3 shelves, connected with 4 Gb/s Fibre Channel to the head. I thought double-parity RAID (NetApp calls it RAID-DP) is something similar to RAIDZ2.

Best regards,
Bernd
Hi,

Just to close this topic: two issues have been found which caused the slow write performance on our Sun Storage 7410 with RAIDZ2.

(1) An OpenSolaris bug that is triggered when the NFS mount option wsize=32768 is set. Reducing it to wsize=16384 resulted in a performance gain of about a factor of 2.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887

(2) The Sun Storage 7410 (software release 2009.09.01.0.0,1-1.2) was configured as an LDAP client for mapping Unix UID/GID to Windows names/groups. At every file access the filer queried the LDAP server to resolve the ownership of the file. This also happened during NDMP backups and caused a high load on the LDAP server. It seems that this release has a non-working name service cache daemon, or that the cache size is too small. We have about 500 users and 100 groups. The LDAP replica was a Sun Directory Server 5.2p5 on a rather slow SunFire V240 with Solaris 9.

After migrating the LDAP server to a fast machine (Solaris 10 x86 on VMware ESX 4i) the NFS I/O rate was much better, and after disabling the LDAP client altogether the I/O rate is now about 16x better when 10 Linux hosts are untarring the Linux kernel source to the same NFS share.

Actions:
- time tar -xf ../linux-2.6.32-rc1.tar
- time rm -rf linux-2.6.32-rc1

NFS mount options: wsize=16384
gzip: ZFS filesystem on-the-fly compression

OpenStorage 7410    | tar -xf     | rm -rf
--------------------+-------------+------------
LDAP on,  1 client  |  3m 50.809s |  0m 16.395s
         10 clients | 19m 59.453s | 69m 12.107s
--------------------+-------------+------------
LDAP off, 1 client  |  1m 15.340s |  0m 14.784s
         10 clients |  3m 29.785s |  4m 51.606s
--------------------+-------------+------------
LDAP off, gzip 1 cl |  2m 13.713s |  0m 14.936s
              10 cl |  3m 47.773s |  7m 37.606s

In the meantime the system performs well.

Best regards,
Bernd
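The effect of turning off the LDAP client can be read off the results table with a few lines of arithmetic. This is only a sketch with illustrative names; the timings are copied from the uncompressed rows of the table. For the 10-client runs it gives roughly 5.7x for tar -xf and roughly 14x for rm -rf. The table by itself does not reproduce the 16x figure quoted above, which may have been measured against an earlier configuration.

```python
import re


def seconds(value: str) -> float:
    """Convert a duration such as '19m 59.453s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (LDAP state, number of clients) -> (tar -xf, rm -rf), from the table above
results = {
    ("on", 1): ("3m 50.809s", "0m 16.395s"),
    ("on", 10): ("19m 59.453s", "69m 12.107s"),
    ("off", 1): ("1m 15.340s", "0m 14.784s"),
    ("off", 10): ("3m 29.785s", "4m 51.606s"),
}
COLUMNS = {"tar -xf": 0, "rm -rf": 1}


def ldap_speedup(action: str, clients: int) -> float:
    """Ratio of LDAP-on time to LDAP-off time for one action and client count."""
    col = COLUMNS[action]
    return seconds(results[("on", clients)][col]) / seconds(results[("off", clients)][col])


def client_scaling(action: str, ldap: str) -> float:
    """Ratio of the 10-client time to the 1-client time."""
    col = COLUMNS[action]
    return seconds(results[(ldap, 10)][col]) / seconds(results[(ldap, 1)][col])


for action in COLUMNS:
    print(f"{action}: LDAP off is {ldap_speedup(action, 10):.1f}x faster with 10 clients")
    print(f"{action}: 10 clients vs 1, LDAP off: {client_scaling(action, 'off'):.1f}x longer")
```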