Ran 3 tests using mkfile to create a 6GB file on a UFS and a ZFS file system.
Command run: mkfile -v 6gb /ufs/tmpfile

Test 1  UFS mounted LUN                       (2m2.373s)
Test 2  UFS mounted LUN with directio option  (5m31.802s)
Test 3  ZFS LUN (single LUN in a pool)        (3m13.126s)

Sunfire V120
1 Qlogic 2340
Solaris 10 06/06

Attached to a Hitachi 9990 (USP). The LUNs are Open-L's at 33.9 GB, there is plenty of cache on the HDS box, and the disks are in a RAID-5.

I'm new to ZFS, so am I missing something? The standard UFS write bested ZFS by a minute. zpool iostat showed about 50 MB/s.
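For reference, a run like the one above could be reproduced roughly as follows. The device names, mount point, and pool name are placeholders rather than the ones actually used on this V120:

# Test 1: UFS on the LUN, default mount options
newfs /dev/rdsk/c2t0d0s0
mount /dev/dsk/c2t0d0s0 /ufs
time mkfile -v 6g /ufs/tmpfile

# Test 2: same LUN remounted with directio
umount /ufs
mount -o forcedirectio /dev/dsk/c2t0d0s0 /ufs
time mkfile -v 6g /ufs/tmpfile

# Test 3: single-LUN ZFS pool
zpool create tank c2t0d1
time mkfile -v 6g /tank/tmpfile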
On 10/30/06, Jay Grogan <jayg4 at aol.com> wrote:
> Ran 3 tests using mkfile to create a 6GB file on a UFS and a ZFS file system.
> Command run: mkfile -v 6gb /ufs/tmpfile
>
> Test 1  UFS mounted LUN                       (2m2.373s)
> Test 2  UFS mounted LUN with directio option  (5m31.802s)
> Test 3  ZFS LUN (single LUN in a pool)        (3m13.126s)
>
> Sunfire V120
> 1 Qlogic 2340
> Solaris 10 06/06
>
> Attached to a Hitachi 9990 (USP). The LUNs are Open-L's at 33.9 GB, there is
> plenty of cache on the HDS box, and the disks are in a RAID-5.
>
> I'm new to ZFS, so am I missing something? The standard UFS write bested ZFS
> by a minute. zpool iostat showed about 50 MB/s.

Do you find this surprising? Why? A ZFS pool has additional overhead relative to a simple filesystem -- the metadata is duplicated, and metadata and data blocks are checksummed. ZFS gives higher reliability and better integration between the levels, but it's *not* designed to maximize disk performance without regard to reliability.

Also, stacking it on top of an existing RAID setup is kinda missing the entire point!

--
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
On Oct 30, 2006, at 10:45 PM, David Dyer-Bennet wrote:
> Also, stacking it on top of an existing RAID setup is kinda missing
> the entire point!

Everyone keeps saying this, but I don't think it is missing the point at all. Checksumming and all the other goodies still work fine, and you can run a ZFS mirror across 2 or more RAID devices for the ultimate in reliability. My dual RAID-6 with large ECC battery-backed cache device mirrors will be much more reliable than your RAID-Z and will probably perform better, and I still get the ZFS goodness. I can lose one whole RAID device (all the disks) and up to 2 of the disks on the second RAID device, all at the same time, and still be OK, fully recoverable, and still operating.

(OK, my second RAID is not yet installed, so right now my ZFS'ed single RAID-6 is not as reliable as I would like, but the second half, i.e. the second RAID-6, will be installed before Xmas.)

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
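A minimal sketch of the layout Chad is describing -- a ZFS mirror whose two sides are LUNs exported by separate RAID arrays -- could look like this; the device names are placeholders, not his actual configuration:

# Each device here stands for a LUN presented by one of the two RAID-6 boxes
zpool create tank mirror c2t0d0 c3t0d0

# Further LUN pairs can be added later as additional mirrored top-level vdevs
zpool add tank mirror c2t1d0 c3t1d0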
To answer your question: yes, I did expect the same or better performance than standard UFS, based on all the hype and, to quote Sun, "Blazing performance: ZFS is based on a transactional object model that removes most of the traditional constraints on the order of issuing I/Os, which results in huge performance gains." We have SLAs with the disk vendor, so stacking ZFS on top of RAID is a better option for me. I've seen some tweaking that can be done with the I/O scheduling and will see where that gets me.
Jay Grogan wrote:
> To answer your question: yes, I did expect the same or better performance than
> standard UFS, based on all the hype and, to quote Sun, "Blazing performance:
> ZFS is based on a transactional object model that removes most of the
> traditional constraints on the order of issuing I/Os, which results in huge
> performance gains." We have SLAs with the disk vendor, so stacking ZFS on top
> of RAID is a better option for me. I've seen some tweaking that can be done
> with the I/O scheduling and will see where that gets me.

mkfile is a lousy benchmark. I'd suggest that you look at something more similar to the intended workload. Alternatively, look at micro-benchmarks, such as the open-source filebench.
 -- richard
Hello Jay,

Tuesday, October 31, 2006, 3:31:54 AM, you wrote:

JG> Ran 3 tests using mkfile to create a 6GB file on a UFS and a ZFS file system.
JG> Command run: mkfile -v 6gb /ufs/tmpfile
JG> Test 1 UFS mounted LUN (2m2.373s)
JG> Test 2 UFS mounted LUN with directio option (5m31.802s)
JG> Test 3 ZFS LUN (single LUN in a pool) (3m13.126s)
JG> Sunfire V120
JG> 1 Qlogic 2340
JG> Solaris 10 06/06
JG> Attached to a Hitachi 9990 (USP). The LUNs are Open-L's at 33.9 GB, there is
JG> plenty of cache on the HDS box, and the disks are in a RAID-5.
JG> I'm new to ZFS, so am I missing something? The standard UFS write
JG> bested ZFS by a minute. zpool iostat showed about 50 MB/s.

There's an open bug to address performance during sequential writing. I believe it's not solved yet, but you may want to try the latest Nevada build and see if there's a difference.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
Robert,

> I believe it's not solved yet, but you may want to try the latest
> Nevada build and see if there's a difference.

It's fixed in the upcoming Solaris 10 U3 and also in Solaris Express post build 47, I think.

- Luke
Thanks Robert, I was hoping something like that had turned up. A lot of what I will need to use ZFS for will be sequential writes at this time.
On Oct 31, 2006, at 11:09 AM, Jay Grogan wrote:
> Thanks Robert, I was hoping something like that had turned up. A lot
> of what I will need to use ZFS for will be sequential writes at this
> time.

I don't know what it is worth, but I was using iozone <http://www.iozone.org/> on my ZFS on top of Areca RAID volumes, as well as on UFS on a similar volume, and it showed, for many sorts of things, better performance under ZFS. I am not an expert on file systems and disk performance, so I cannot say that there are no faults in its methodology, but it is interesting to run and look at.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
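For anyone who wants to repeat that kind of comparison, an iozone invocation along these lines exercises sequential write and read against a given filesystem; the path and sizes are illustrative, so adjust them to the mount point being tested:

# Sequential write/rewrite (-i 0) and read/reread (-i 1) of a 6 GB file
# with 128 KB records; run once against the ZFS mount and once against UFS
iozone -i 0 -i 1 -s 6g -r 128k -f /tank/iozone.tmp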
Hello Jay,

Tuesday, October 31, 2006, 7:09:12 PM, you wrote:

JG> Thanks Robert, I was hoping something like that had turned up. A lot
JG> of what I will need to use ZFS for will be sequential writes at this
JG> time.

Even then I would first try to test with a more realistic load on ZFS, as it can turn out that ZFS performs better anyway. Despite the problems with large sequential writes, I find ZFS performs better in many more complex scenarios. So in real life ZFS could actually perform better in your environment than in that simple test.

P.S. Try echo 'txg_time/W 1' | mdb -kw and repeat the test -- it could help.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
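If anyone wants to try Robert's suggestion, something along these lines should work on builds of that era. txg_time is an undocumented tunable, the change is not persistent across reboots, and the default of 5 seconds is quoted here from memory:

# Check the current transaction group interval (printed in decimal)
echo 'txg_time/D' | mdb -k

# Shorten it to 1 second, as suggested above
echo 'txg_time/W 1' | mdb -kw

# Put it back afterwards (5 is believed to be the default)
echo 'txg_time/W 5' | mdb -kw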
Hello Luke,

Tuesday, October 31, 2006, 6:09:23 PM, you wrote:

LL> Robert,
LL>
LL> > I believe it's not solved yet, but you may want to try the latest
LL> > Nevada build and see if there's a difference.
LL>
LL> It's fixed in the upcoming Solaris 10 U3 and also in Solaris Express
LL> post build 47, I think.

Almost definitely not true. I did some simple tests today with the U3 beta on a Thumper and can still observe "jumping" writes with a sequential 'dd'.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
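The "jumping" behaviour Robert mentions is easy to see with a test along these lines (pool name and sizes are illustrative): the write column of zpool iostat alternates between near-idle samples and large bursts instead of staying flat.

# Sequential writer in the background
dd if=/dev/zero of=/tank/ddfile bs=1024k count=16384 &

# Watch per-second pool throughput while it runs
zpool iostat tank 1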
Robert,

On 10/31/06 3:10 PM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

> Even then I would first try to test with a more realistic load on ZFS, as it
> can turn out that ZFS performs better anyway. Despite the problems with
> large sequential writes, I find ZFS performs better in many more
> complex scenarios. So in real life ZFS could actually perform better
> in your environment than in that simple test.

I'd withhold all judgement about sequential I/O until you've tested the fixes in snv_50+. We've seen more than 2,000 MB/s of sequential transfer rate on 36 disks.

- Luke
Robert,

On 10/31/06 3:12 PM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

> Almost definitely not true. I did some simple tests today with the U3 beta
> on a Thumper and can still observe "jumping" writes with a sequential 'dd'.

We crossed posts. There are some firmware issues with the Hitachi disks that cause write problems like what you describe. What generation Thumper is this?

- Luke
Hello Luke,

Wednesday, November 1, 2006, 12:13:28 AM, you wrote:

LL> Robert,
LL>
LL> On 10/31/06 3:10 PM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:
LL>
>> Even then I would first try to test with a more realistic load on ZFS, as it
>> can turn out that ZFS performs better anyway. Despite the problems with
>> large sequential writes, I find ZFS performs better in many more
>> complex scenarios. So in real life ZFS could actually perform better
>> in your environment than in that simple test.
LL>
LL> I'd withhold all judgement about sequential I/O until you've tested the
LL> fixes in snv_50+.
LL>
LL> We've seen more than 2,000 MB/s of sequential transfer rate on 36 disks.

Right now with the S10U3 beta and over 40 disks I can get only about 1.6 GB/s peak.

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
Robert,

On 10/31/06 3:55 PM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

> Right now with the S10U3 beta and over 40 disks I can get only about
> 1.6 GB/s peak.

That's decent -- is that the number reported by "zpool iostat"? In that case I think 1 GB = 1024^3 bytes, while my GB measurements are roughly "billion bytes", so there's some of the diff. Also, we use four threads, one per CPU, for I/O in our DBMS, so we get closer to saturation on RAID10.

- Luke
Hello Luke,

Wednesday, November 1, 2006, 12:59:49 AM, you wrote:

LL> Robert,
LL>
LL> On 10/31/06 3:55 PM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:
LL>
>> Right now with the S10U3 beta and over 40 disks I can get only about
>> 1.6 GB/s peak.
LL>
LL> That's decent -- is that the number reported by "zpool iostat"? In that case
LL> I think 1 GB = 1024^3 bytes, while my GB measurements are roughly "billion
LL> bytes", so there's some of the diff. Also, we use four threads, one per CPU,
LL> for I/O in our DBMS, so we get closer to saturation on RAID10.

Yep, zpool iostat reports about 1.6 GB/s (peak), and 0.6-1.2 GB/s on average.

P.S. I should have put "only" in quotes :)

--
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
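For what it's worth, the per-vdev view of the same numbers makes it easier to tell whether a peak like that is spread evenly across the disks or limited by a few stragglers (pool name is a placeholder):

# Per-vdev breakdown at one-second intervals
zpool iostat -v tank 1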
Jay Grogan wrote:
> Ran 3 tests using mkfile to create a 6GB file on a UFS and a ZFS file system.
> Command run: mkfile -v 6gb /ufs/tmpfile
>
> Test 1  UFS mounted LUN                       (2m2.373s)
> Test 2  UFS mounted LUN with directio option  (5m31.802s)
> Test 3  ZFS LUN (single LUN in a pool)        (3m13.126s)
>
> Sunfire V120
> 1 Qlogic 2340
> Solaris 10 06/06
>
> Attached to a Hitachi 9990 (USP). The LUNs are Open-L's at 33.9 GB, there is
> plenty of cache on the HDS box, and the disks are in a RAID-5.
>
> I'm new to ZFS, so am I missing something? The standard UFS write bested ZFS
> by a minute. zpool iostat showed about 50 MB/s.

Hmm, something doesn't seem right. From my previous experiments back in the day, ZFS was slightly faster than UFS:
http://blogs.sun.com/erickustarz/entry/fs_perf_102_filesystem_bw

And I re-ran this on 10/31 Nevada non-debug bits:

ZFS:
# /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'

real     1:45.8
user        0.0
sys        16.5
#

UFS, write cache disabled:
# /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'

real     1:57.4
user        0.9
sys        39.3
#

UFS, write cache enabled:
# /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'

real     1:57.1
user        0.9
sys        39.4
#

The big difference, of course, being our hardware. I'm using a V210 (2-way SPARC) with a single disk -- no NVRAM.

So what is a "LUN" in your setup? And is there NVRAM in the HDS box?

What does your iostat output look like when comparing UFS vs. ZFS? I'm wondering if we're hitting the problem where we send the wrong flush-write-cache command down and we're actually flushing the NVRAM every txg, when the storage should be smart enough to ignore the flush.

eric
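One way to capture the comparison eric is asking for is to leave a device-level iostat running in another window during both the UFS and the ZFS runs. The flags below are standard Solaris iostat; the interpretation in the comments is a hedge, not a diagnosis:

# Extended stats, descriptive device names, skip idle devices, 5-second samples
iostat -xnz 5

# Columns worth comparing between the UFS and ZFS runs:
#   kw/s    write throughput per device
#   actv    outstanding commands (queue depth)
#   asvc_t  average service time -- periodic spikes during the ZFS run can hint
#           that the array is honouring a cache-flush request at every txg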
Luke Lonergan writes:

> Robert,
>
> > I believe it's not solved yet, but you may want to try the latest
> > Nevada build and see if there's a difference.
>
> It's fixed in the upcoming Solaris 10 U3 and also in Solaris Express
> post build 47, I think.
>
> - Luke

This one is not yet fixed:

6415647 Sequential writing is jumping

-r
How much memory is in the V120? UFS will recycle its own pages while creating files that are big. ZFS, working against a large heap of free memory, will cache the data (why not?). The problem is that ZFS does not know when to stop. During the subsequent memory/cache reclaim, ZFS is potentially not very efficient at keeping up with the file-creation process (just a hypothesis here). See:

6488341 ZFS should avoid growing the ARC into trouble
(http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6488341)

<disclaimer: a common mistake is to see the bug du jour everywhere, so it's just a lead for investigation.>

It's interesting that zpool iostat reports 50 MB/s, but 6 GB in 3m13s is about 31 MB/s. Is the recordsize tuned? Is compression on?

-r

eric kustarz writes:

> Jay Grogan wrote:
> > Ran 3 tests using mkfile to create a 6GB file on a UFS and a ZFS file system.
> > Command run: mkfile -v 6gb /ufs/tmpfile
> >
> > Test 1  UFS mounted LUN                       (2m2.373s)
> > Test 2  UFS mounted LUN with directio option  (5m31.802s)
> > Test 3  ZFS LUN (single LUN in a pool)        (3m13.126s)
> >
> > Sunfire V120
> > 1 Qlogic 2340
> > Solaris 10 06/06
> >
> > Attached to a Hitachi 9990 (USP). The LUNs are Open-L's at 33.9 GB, there is
> > plenty of cache on the HDS box, and the disks are in a RAID-5.
> >
> > I'm new to ZFS, so am I missing something? The standard UFS write bested ZFS
> > by a minute. zpool iostat showed about 50 MB/s.
>
> Hmm, something doesn't seem right. From my previous experiments back in
> the day, ZFS was slightly faster than UFS:
> http://blogs.sun.com/erickustarz/entry/fs_perf_102_filesystem_bw
>
> And I re-ran this on 10/31 Nevada non-debug bits:
>
> ZFS:
> # /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'
>
> real     1:45.8
> user        0.0
> sys        16.5
> #
>
> UFS, write cache disabled:
> # /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'
>
> real     1:57.4
> user        0.9
> sys        39.3
> #
>
> UFS, write cache enabled:
> # /bin/time sh -c 'lockfs -f .; mkfile 6g 6g.txt; lockfs -f .'
>
> real     1:57.1
> user        0.9
> sys        39.4
> #
>
> The big difference, of course, being our hardware. I'm using a V210 (2-way
> SPARC) with a single disk -- no NVRAM.
>
> So what is a "LUN" in your setup? And is there NVRAM in the HDS box?
>
> What does your iostat output look like when comparing UFS vs. ZFS? I'm
> wondering if we're hitting the problem where we send the wrong
> flush-write-cache command down and we're actually flushing the NVRAM every
> txg, when the storage should be smart enough to ignore the flush.
>
> eric
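Roch's last two questions can be checked directly on the dataset; "tank" below is a placeholder for the actual pool name, and the defaults (if nobody has changed them) are a 128K recordsize with compression off:

# Shows the current values and whether they were inherited or set locally
zfs get recordsize,compression tank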
The V120 has 4GB of RAM. On the HDS side we are in a RAID-5 on the LUN and not sharing any ports on the McDATA, but with so much cache we aren't close to taxing the disks. You mentioned the 50 MB/s throughput, and that's something we've been wondering about around here: what the average is for this setup, and whether 50 MB/s down one pipe is pretty normal given the 9990 we are hooked to. I did not turn compression on; I just created a pool and added the LUN to it. This is my first round with ZFS, so no tuning has been done. I did run one test on two Open-L LUNs (33.9GB) by creating a mirror, and throughput dropped to 25 MB/s in zpool iostat. The NVRAM question got my SAN guy looking, and I'll post his findings. I'm going to hook up a V490 soon with MPxIO, so I'll see how the numbers run there.
Roch,

On 11/2/06 12:51 AM, "Roch - PAE" <Roch.Bourbonnais at Sun.COM> wrote:

> This one is not yet fixed:
> 6415647 Sequential writing is jumping

Yep -- I mistook this one for another problem, with drive firmware on pre-revenue units. Since Robert has a customer-release X4500 it doesn't have the firmware problem.

- Luke
Jay Grogan wrote:
> The V120 has 4GB of RAM. On the HDS side we are in a RAID-5 on the LUN and not
> sharing any ports on the McDATA, but with so much cache we aren't close to
> taxing the disks.

Are you sure? At some point data has to get flushed from the cache to the drives themselves. In most of the arrays I looked at -- granted, this was a while ago -- the cache could only stay dirty for so long before it was flushed, no matter how much cache was in use. Also, if your workload looks sequential to the HDS box, it will just push your data right past the cache to the drives themselves.