Hi,
We have a Sun Storage 7410 with the latest release (which is based upon
OpenSolaris). The system uses a hybrid storage pool (23 1TB SATA disks in
RAIDZ2 and one 18GB SSD as log device). The ZFS volumes are exported with
NFSv3 over TCP. The NFS mount options are:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio
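For reference, the corresponding client-side mount command might look roughly like this (a sketch only: server name and paths are placeholders, and this uses Solaris client syntax since forcedirectio is a Solaris mount option):

```shell
# Hypothetical mount using the options above; "nfsserver" and the
# export/mount paths are placeholders, not the real configuration.
mount -F nfs \
    -o rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio \
    nfsserver:/export/nightlybuild /share/nightlybuild
```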
We compared that system with our Netapp FAS 3140 and noticed a large
performance decrease when multiple hosts write many small files in parallel
(e.g. a CVS checkout).
Doing this on a single host, the write speed is quite similar on both systems:
Netapp FAS 3140:
bernd@linuxhost:~/tmp> time cvs -Q checkout myBigProject
real 0m32.914s
user 0m1.568s
sys 0m3.060s
Sun Storage 7410:
bernd@linuxhost:/share/nightlybuild/tmp> time cvs -Q checkout myBigProject
real 0m34.049s
user 0m1.592s
sys 0m3.184s
Doing the same operation from 5 different hosts on the same NFS share, in
different directories, we notice a performance decrease at least proportional
to the number of writing hosts (5x slower or worse), while the same operation
on the Netapp FAS 3140 is less than 2x slower:
Netapp FAS 3140:
bernd@linuxhost:~/tmp/1> time cvs -Q checkout myBigProject
real 0m58.120s
user 0m1.452s
sys 0m2.976s
Sun Storage 7410:
bernd@linuxhost:/share/nightlybuild/tmp/1> time cvs -Q checkout myBigProject
real 4m32.747s
user 0m2.296s
sys 0m4.224s
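For what it's worth, the slowdown factors implied by these timings can be computed directly (a small awk sketch; the numbers are copied from the "real" times of the runs above):

```shell
# Slowdown: 5 parallel writers vs. a single writer, using the "real"
# times measured above (1-writer vs. 5-writer runs).
awk 'BEGIN {
    printf "Netapp FAS 3140:  %.1fx slower\n", 58.120 / 32.914
    printf "Sun Storage 7410: %.1fx slower\n", (4*60 + 32.747) / 34.049
}'
```

This works out to roughly 1.8x on the Netapp versus about 8x on the 7410.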
Often we run into timeouts (the CVS timeout is set to 60 minutes) when
building software during a nightly build process, which makes this storage
unusable because the NFS writes are slowed down drastically. The same happens
when we run VMware machines on an ESX server on an NFS pool, and Oracle
databases on NFS. Netapp and Oracle recommend using NFS as central storage,
but we wanted a less expensive system because it is used only for development
and testing, not for highly critical production data. However, the performance
slowdown with more than one writing NFS client is too severe.
What might be the bottleneck here? Any ideas? The zfs log device? Is more
than one zfs log device required for parallel performance? As many as there
are NFS clients?
Best regards,
Bernd
nfsserver# zpool status
pool: pool-0
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Wed Sep 23 04:27:21 2009
config:
        NAME                                         STATE     READ WRITE CKSUM
        pool-0                                       ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014ED4D01d0                    ONLINE       0     0     0
            c3t5000C50014F4EC09d0                    ONLINE       0     0     0
            c3t5000C50014F4EE46d0                    ONLINE       0     0     0
            c3t5000C50014F4F50Ed0                    ONLINE       0     0     0
            c3t5000C50014F4FB64d0                    ONLINE       0     0     0
            c3t5000C50014F50A7Cd0                    ONLINE       0     0     0
            c3t5000C50014F50F57d0                    ONLINE       0     0     0
            c3t5000C50014F52A59d0                    ONLINE       0     0     0
            c3t5000C50014F52D83d0                    ONLINE       0     0     0
            c3t5000C50014F52E0Cd0                    ONLINE       0     0     0
            c3t5000C50014F52F9Bd0                    ONLINE       0     0     0
          raidz2                                     ONLINE       0     0     0
            c3t5000C50014F54EB1d0                    ONLINE       0     0     0  254K resilvered
            c3t5000C50014F54FC9d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F512E3d0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F515C9d0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F549EAd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F553EBd0                    ONLINE       0     0     0  262K resilvered
            c3t5000C50014F5072Cd0                    ONLINE       0     0     0  279K resilvered
            c3t5000C50014F5192Bd0                    ONLINE       0     0     0  4.60M resilvered
            c3t5000C50014F5494Bd0                    ONLINE       0     0     0  258K resilvered
            c3t5000C50014F5500Bd0                    ONLINE       0     0     0  264K resilvered
            c3t5000C50014F51865d0                    ONLINE       0     0     0  248K resilvered
        logs
          c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0  ONLINE       0     0     0
        spares
          c3t5000C50014F53925d0                      AVAIL
errors: No known data errors
pool: system
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        system        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
errors: No known data errors
nfsserver# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci@1,0/pci10de,cb84@5,1/disk@0,0
       1. c0t1d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci@1,0/pci10de,cb84@5,1/disk@1,0
       2. c3t5000C50014ED4D01d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014ed4d01
       3. c3t5000C50014F4EC09d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4ec09
       4. c3t5000C50014F4EE46d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4ee46
       5. c3t5000C50014F4F50Ed0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4f50e
       6. c3t5000C50014F4FB64d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f4fb64
       7. c3t5000C50014F50A7Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f50a7c
       8. c3t5000C50014F50F57d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f50f57
       9. c3t5000C50014F52A59d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52a59
      10. c3t5000C50014F52D83d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52d83
      11. c3t5000C50014F52E0Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52e0c
      12. c3t5000C50014F52F9Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f52f9b
      13. c3t5000C50014F54EB1d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f54eb1
      14. c3t5000C50014F54FC9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f54fc9
      15. c3t5000C50014F512E3d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f512e3
      16. c3t5000C50014F515C9d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f515c9
      17. c3t5000C50014F549EAd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f549ea
      18. c3t5000C50014F553EBd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f553eb
      19. c3t5000C50014F5072Cd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5072c
      20. c3t5000C50014F5192Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5192b
      21. c3t5000C50014F5494Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5494b
      22. c3t5000C50014F5500Bd0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f5500b
      23. c3t5000C50014F51865d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f51865
      24. c3t5000C50014F53925d0 <ATA-SEAGATE ST31000N-SU0E-931.51GB>
          /scsi_vhci/disk@g5000c50014f53925
      25. c3tATASTECZEUSIOPS018GBYTESSTM0000D905Cd0 <ATA-STEC ZeusIOPS-0430-17.00GB>
          /scsi_vhci/disk@gATASTECZeusIOPS018GBytesSTM0000D905C
Specify disk (enter its number): Specify disk (enter its number):
--
This message posted from opensolaris.org
On Sep 29, 2009, at 2:03 AM, Bernd Nies wrote:

> What might be the bottleneck here? Any ideas? The zfs log device? Is
> more than one zfs log device required for parallel performance? As
> many as there are NFS clients?

bingo! One should suffice.

BTW, not fair comparing a machine with an NVRAM cache to one without...
add an SSD for the log to even things out.
 -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Tue, Sep 29, 2009 at 10:35 AM, Richard Elling <richard.elling at gmail.com> wrote:

> bingo! One should suffice.
>
> BTW, not fair comparing a machine with an NVRAM cache to one
> without... add an SSD for the log to even things out.

Not exactly true; look at the pool configuration in his original post.
It appears he already has a STEC ZeusIOPS as the log device:

    http://www.stec-inc.com/product/zeusiops.php

which should be more than capable of providing the IOPS needed.

Was that pool rebuilding during the tests, or did the resilver happen
afterwards?

-Ross
Hi,

The system already has an SSD (ATASTECZeusIOPS018GBytesSTM0000D905C) as
ZFS log device.

NFS writes from only one host are not the problem. Even with many small
files it is almost as fast as the Netapp. The problem arises when doing
the same in parallel from n hosts: the same write from 10 hosts takes
about 10 times longer, while on the Netapp it takes only 2-3 times
longer. Network bandwidth is also not the problem here.

Bye
Bernd
On Tue, 29 Sep 2009, Bernd Nies wrote:

> NFS writes from only one host are not the problem. Even with many
> small files it is almost as fast as the Netapp. The problem arises
> when doing the same in parallel from n hosts.

Striping across two large raidz2 vdevs is not ideal for multi-user use.
You are getting the equivalent of two disks' worth of IOPS, which does
not go very far. More, smaller raidz vdevs or mirror vdevs would be
better. Also, make sure that you have plenty of RAM installed.

What disk configuration (number of disks, and RAID topology) is the
NetApp using?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
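[A pool layout along these lines would look something like the sketch below. The device names are placeholders, not the poster's actual disks, and on a 7410 appliance the layout is chosen through the management interface rather than with zpool directly.]

```shell
# Sketch of a mirror-based pool as suggested: each two-way mirror vdev
# contributes roughly one disk's worth of random-write IOPS, so pairing
# up all the data disks multiplies IOPS compared with two wide raidz2
# vdevs. Continue the mirror pairs for the remaining disks.
zpool create pool-0 \
    mirror c3t0d0 c3t1d0 \
    mirror c3t2d0 c3t3d0 \
    mirror c3t4d0 c3t5d0 \
    log c3t6d0 \
    spare c3t7d0
```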
Bob Friesenhahn wrote:

> Striping across two large raidz2s is not ideal for multi-user use.
> More smaller raidz vdevs or mirror vdevs would be better.

For small files I would definitely go mirrored.

> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?

On NetApp you can only choose between RAID-DP and RAID-DP :-)

With mirroring you will certainly lose space-wise against NetApp, but if
your data compresses well, you will still end up with more space
available. Our 7410 system currently compresses with a CPU utilisation
of around 3%, while using gzip-2 and getting a compression ratio of
1.96. So far, I'm very happy with the system.
On Sep 29, 2009, at 7:59 AM, Bernd Nies wrote:

> The system already has an SSD (ATASTECZeusIOPS018GBytesSTM0000D905C)
> as ZFS log device.

I apologize, I should know better than to answer before the first cup
of coffee :-P

> NFS writes from only one host are not the problem. The same write
> from 10 hosts takes about 10 times longer.

We would need more info before we could proceed. Can you collect some
iostat data?

    iostat -zxnT d 1

The "raidz acts like a single disk" effect on IOPS is felt on small,
random read workloads. Random write workloads perform better.
 -- richard
Hi Bob,

> Striping across two large raidz2s is not ideal for multi-user use.
> You are getting the equivalent of two disks worth of IOPS, which does
> not go very far.

This is new to me. I thought the RAID level is responsible for overall
write/read performance, no matter how many hosts write through NFS; the
NFS layer should take care of this. The filesystem or RAID itself
doesn't know how many hosts are writing. Or am I wrong?

I followed this recommendation for choosing the RAID level because we
wanted a system with high capacity and high disk fault tolerance:

    http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

The Storage 7410 came with 16 GB RAM installed.

> What disk configuration (number of disks, and RAID topology) is the
> NetApp using?

The Netapp has a double parity RAID with a raid group size of 16. There
are 14 disks in 3 shelves, connected with 4 Gb/s fibre channel to the
head. I thought double parity RAID (Netapp calls it RAID-DP) is
something similar to RAIDZ2.

Best regards,
Bernd
Hi,
Just to close this topic: two issues have been found which caused the slow
write performance on our Sun Storage 7410 with RAIDZ2:
(1) An OpenSolaris bug when the NFS mount option is set to wsize=32768.
Reducing it to wsize=16384 resulted in a performance gain of about a factor
of 2:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887
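On the clients this amounts to remounting with the smaller write size, roughly as follows (a sketch; server name and paths are placeholders):

```shell
# Hypothetical remount with the reduced write size that worked around
# the bug; "nfsserver" and the paths are placeholders.
umount /share/nightlybuild
mount -o rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=16384 \
    nfsserver:/export/nightlybuild /share/nightlybuild
```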
(2) The Sun Storage 7410 (software release 2009.09.01.0.0,1-1.2) was
configured as an LDAP client for mapping Unix UIDs/GIDs to Windows
names/groups. At every file access the filer asked the LDAP server to
resolve the ownership of the file. This also happened during NDMP backups
and caused a high load on the LDAP server. It seems that this release has a
non-working name service cache daemon, or that the cache size is too small.
We have about 500 users and 100 groups.
The LDAP replica was a Sun Directory Server 5.2p5 on a rather slow SunFire
V240 with Solaris 9. After migrating the LDAP server to a faster machine
(Solaris 10 x86 on VMware ESX 4i) the NFS I/O rate was much better, and
after disabling the LDAP client entirely the I/O rate is now about 16x
better when 10 Linux hosts are untarring the Linux kernel source to the
same NFS share.
Actions:
  - time tar -xf ../linux-2.6.32-rc1.tar
  - time rm -rf linux-2.6.32-rc1

NFS mount options: wsize=16384
gzip: ZFS filesystem on-the-fly compression
OpenStorage 7410 | tar -xf | rm -rf
--------------------+-------------+------------
LDAP on, 1 client | 3m 50.809s | 0m 16.395s
10 clients | 19m 59.453s | 69m 12.107s
--------------------+-------------+------------
LDAP off, 1 client | 1m 15.340s | 0m 14.784s
10 clients | 3m 29.785s | 4m 51.606s
--------------------+-------------+------------
LDAP off, gzip 1 cl | 2m 13.713s | 0m 14.936s
10 cl | 3m 47.773s | 7m 37.606s
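As a quick cross-check of the improvement, the 10-client speedup factors can be computed from the table (a small awk sketch; numbers copied from the "LDAP on" and "LDAP off" 10-client rows, uncompressed case):

```shell
# Speedup of the 10-client runs after turning LDAP off, computed from
# the times in the table above.
awk 'BEGIN {
    printf "tar -xf: %.1fx faster\n", (19*60 + 59.453) / (3*60 + 29.785)
    printf "rm -rf:  %.1fx faster\n", (69*60 + 12.107) / (4*60 + 51.606)
}'
```

That is about 5.7x for the untar and about 14x for the delete.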
In the meantime the system performs well.
Best regards,
Bernd