Hi,

The question is a ZFS performance question in regard to SAN traffic. We are trying to benchmark ZFS vs. VxFS file systems and I get the following performance results.

Test Setup:
Solaris 10 11/06
Dual-port QLogic HBA with SFCSM (for ZFS) and DMP (for VxFS)
Sun Fire V490 server
LSI RAID 3994 on the back end
ZFS record size: 128KB (default)
VxFS block size: 8KB (default)

The only difference in setup between the ZFS and VxFS tests is the file system itself, plus an array support module (ASM) installed for the RAID in the VxFS test case.

Test Case: Run 'iostat', then write a 1GB file using 'mkfile 1g testfile', then run iostat again.

ZFS Test Results: The KB written per second averaged around 250KB.
VxFS Test Results: The KB written per second averaged around 70KB.

When I fixed the ZFS record size to 8KB, the KB written per second averaged 110KB.

My questions may be too general to answer here, but I thought I would try. Why does ZFS write more traffic to disk than VxFS? Why does ZFS write more traffic to disk when the record size is variable instead of fixed?

Thanks,
ljs
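For reference, a minimal sketch of how the 8KB record-size case could be set up; the pool and dataset names below (tank/test) are placeholders, not the ones actually used in these tests:

   # assumes an existing dataset named tank/test; recordsize only
   # affects files written after the property is changed
   zfs set recordsize=8k tank/test
   zfs get recordsize tank/test
   mkfile 1g /tank/test/testfile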
The 250KB figure below was confusing to one reader. What I mean is that, over the interval covering the file write, iostat reports about 250KB of traffic transferred per second; 'man iostat' explains how that per-interval figure is computed. 250KB per second is not the link bandwidth. I also understand that 'mkfile' is not an acceptable performance benchmark, but iozone shows the same behaviour.
Brendan Gregg - Sun Microsystems
2007-Nov-29 22:05 UTC
[zfs-discuss] ZFS write time performance question
G'Day Luke,

On Thu, Nov 29, 2007 at 08:18:09AM -0800, Luke Schwab wrote:
> The question is a ZFS performance question in regard to SAN traffic.
> [...]
> Test Case: Run 'iostat', then write a 1GB file using 'mkfile 1g testfile', then run iostat again.

It will probably be better to run iostat during the write, rather than looking at iostat's summary-since-boot before and after (which would be better served from the raw kstats anyway). Eg:

   iostat -xnmpz 5

ZFS comes with its own version of iostat:

   zpool iostat -v pool

> ZFS Test Results: The KB written per second averaged around 250KB.
> VxFS Test Results: The KB written per second averaged around 70KB.

250 Kbytes/sec? This sounds really wrong for a write benchmark - single disks these days can deliver between 10 Mbytes/sec and 75 Mbytes/sec for a single-stream write. At least use a much larger file size than 1 Gbyte (much of which could fit in a RAM-based file system cache if your system has multiple Gbytes of RAM).

The Kbytes-written-per-second value you are using isn't application-layer throughput; rather, it is what made it all the way to disk. It might be crude, but running "ptime mkfile ..." might give a better idea of the application throughput, as it will show the real time taken to create a 1 Gbyte file (but then you might end up comparing what different file systems consider sync'd, rather than throughput)...

> When I fixed the ZFS record size to 8KB the KB written per second averaged 110KB.
>
> My questions may be too general to answer here but I thought I would try.
>
> Why does ZFS write more traffic to disk than VxFS? Why does ZFS write more traffic to disk when the record size is variable instead of fixed in size?

I'd recommend running filebench for filesystem benchmarks, and see what the results are:

   http://www.solarisinternals.com/wiki/index.php/FileBench

Filebench is able to purge the ZFS cache (export/import) between runs, and can be customised to match real-world workloads. It should improve the accuracy of the numbers. I'm expecting filebench to become *the* standard tool for filesystem benchmarks.

Brendan

--
Brendan
[CA, USA]
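To make the measurement approach above concrete, a rough sketch (the pool name and file path are placeholders):

   # terminal 1: watch per-interval disk throughput while the file is written
   iostat -xnmpz 5
   zpool iostat -v tank 5

   # terminal 2: time the write itself for a rough application-level figure
   ptime mkfile 1g /tank/fs/testfile
   # application throughput is roughly 1 Gbyte divided by the "real" time ptime reports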
William D. Hathaway
2007-Nov-30 15:17 UTC
[zfs-discuss] ZFS write time performance question
In addition to Brendan's advice about benchmarking, it would be a good idea to use the newer Solaris release (Solaris 10 08/07), which has a lot of ZFS improvements, both performance and functional.
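Incidentally, a quick way to confirm which Solaris update a box is running (output wording varies by release):

   cat /etc/release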
przemolicc at poczta.fm
2007-Dec-03 10:28 UTC
[zfs-discuss] ZFS write time performance question
On Thu, Nov 29, 2007 at 02:05:13PM -0800, Brendan Gregg - Sun Microsystems wrote:
> I'd recommend running filebench for filesystem benchmarks, and see what
> the results are:
>
> http://www.solarisinternals.com/wiki/index.php/FileBench
>
> Filebench is able to purge the ZFS cache (export/import) between runs,
> and can be customised to match real world workloads. It should improve
> the accuracy of the numbers. I'm expecting filebench to become *the*
> standard tool for filesystem benchmarks.

And some results (for an OLTP workload):

http://przemol.blogspot.com/2007/08/zfs-vs-vxfs-vs-ufs-on-scsi-array.html

Regards
przemol

--
http://przemol.blogspot.com/
> And some results (for an OLTP workload):
>
> http://przemol.blogspot.com/2007/08/zfs-vs-vxfs-vs-ufs-on-scsi-array.html

While I was initially hardly surprised that ZFS offered only 11% - 15% of the throughput of UFS or VxFS, a quick glance at FileBench's OLTP workload seems to indicate that it's completely random-access in nature, without any of the sequential-scan activity that can *really* give ZFS fits. The fact that you were using underlying hardware RAID really shouldn't have affected these relationships, given that it was configured as RAID-10.

It would be interesting to see your test results reconciled with a detailed description of the tests generated by the Kernel Performance Engineering group, which are touted as indicating that ZFS performs comparably with other file systems in database use. I actually don't find that too hard to believe (without having put all that much thought into it) when it comes to straight OLTP without queries that might result in sequential scans, but your observations seem to suggest otherwise (and the little I have been able to infer about the methodology used to generate some of the rosy-looking ZFS performance numbers does not inspire confidence in the real-world applicability of those internally-generated results).

- bill
This might have been affected by the cache flush issue -- if the 3310 flushes its NVRAM cache to disk on SYNCHRONIZE CACHE commands, then ZFS is penalizing itself. I don't know whether the 3310 firmware has been updated to support the SYNC_NV bit. It wasn't obvious on Sun's site where to download the latest firmware.

A quick glance through the OpenSolaris code indicates that ZFS and the sd driver have been updated to support this bit, but I didn't track down which release first introduced this functionality.
What firmware revision are you at?
przemolicc at poczta.fm
2007-Dec-06 10:22 UTC
[zfs-discuss] ZFS write time performance question
On Wed, Dec 05, 2007 at 09:02:43PM -0800, Tim Cook wrote:
> What firmware revision are you at?

Revision: 415G

Regards
przemol

--
http://przemol.blogspot.com/
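(For anyone following along: one generic way to read the firmware revision string that Solaris reports for attached LUNs is shown below; the actual device inventory, and whether this is where the value above came from, will differ.)

   # Vendor/Product/Revision as seen by the sd/ssd driver
   iostat -En | egrep 'Vendor|Revision'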
On Dec 5, 2007, at 8:38 PM, Anton B. Rang wrote:
> This might have been affected by the cache flush issue -- if the
> 3310 flushes its NVRAM cache to disk on SYNCHRONIZE CACHE commands,
> then ZFS is penalizing itself. I don't know whether the 3310
> firmware has been updated to support the SYNC_NV bit. It wasn't
> obvious on Sun's site where to download the latest firmware.

Yeah, that would be my guess on the huge disparity. I don't actually know of any storage device that supports SYNC_NV. If someone knows of any, I'd love to know.

przemol, did you set the recordsize to 8KB? What are the server's specs (memory, CPU)? Which version of FileBench and which version of the oltp.f workload did you use?

> A quick glance through the OpenSolaris code indicates that ZFS &
> the sd driver have been updated to support this bit, but I didn't
> track down which release first introduced this functionality.

Yep, that would be:

6462690 sd driver should set SYNC_NV bit when issuing SYNCHRONIZE CACHE to SBC-2 devices
http://bugs.opensolaris.org/view_bug.do?bug_id=6462690

It was putback in snv_74. The PSARC case is 2007/053 (though I see it's not open, which doesn't do much good for externals...).

In any event, if the 3310 doesn't support SYNC_NV (which I would guess it doesn't), then it may require manually editing sd.conf to treat the flush commands as no-ops.

eric
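As an aside, a coarser workaround sometimes used here (distinct from the per-device sd.conf edit eric mentions, and only appropriate when every pool on the host sits behind battery-backed, non-volatile write cache) is to disable ZFS cache flushes globally, on releases that support the tunable:

   # /etc/system -- only safe if ALL pools are on arrays with non-volatile cache;
   # requires a reboot to take effect
   set zfs:zfs_nocacheflush = 1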
przemolicc at poczta.fm
2007-Dec-07 09:15 UTC
[zfs-discuss] ZFS write time performance question
On Thu, Dec 06, 2007 at 11:40:13AM -0800, eric kustarz wrote:
> [...]
> przemol, did you set the recordsize to 8KB?

Yes. It is mentioned in the legend on the right side of the chart.

> What are the server's specs? (memory, CPU)

Memory: 24GB
CPU: 8 x UltraSPARC-III+ 900 MHz

> Which version of FileBench and which version of the oltp.f workload
> did you use?

kania:/export/home/przemol>pkginfo -l filebench
   PKGINST:  filebench
      NAME:  FileBench
  CATEGORY:  application
      ARCH:  sparc,i386
   VERSION:  20 Jul 05
   BASEDIR:  /opt
    VENDOR:  Richard Mc Dougall
      DESC:  FileBench
    PSTAMP:  1.64.5_s10_x86_sparc_PRERELEASE
  INSTDATE:  Jun 27 2007 10:14
     EMAIL:  Richard.McDougall at Sun.COM
    STATUS:  completely installed
     FILES:      266 installed pathnames
                    6 linked files
                   28 directories
                   67 executables
                    2 setuid/setgid executables
                33368 blocks used (approx)

I used the oltp.f workload from the above package but, as mentioned in the blog, with changed parameters like $filesize, etc.

Regards
przemol

--
http://przemol.blogspot.com/
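For anyone wanting to reproduce a run like this, a rough sketch of an interactive FileBench session follows; the directory, file size and run length are illustrative values, not the ones used for the blog results:

   /opt/filebench/bin/filebench
   filebench> load oltp
   filebench> set $dir=/testpool/fs
   filebench> set $filesize=10g
   filebench> run 600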