v
2010-Jul-20 10:12 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
Hi,

For ZFS raidz1, I know that for random I/O the IOPS of a raidz1 vdev equal the IOPS of one physical disk. Since raidz1 is like RAID-5, does RAID-5 have the same performance as raidz1, i.e. random IOPS equal to one physical disk's IOPS?

Regards
Victor
--
This message posted from opensolaris.org
Roy Sigurd Karlsbakk
2010-Jul-20 10:46 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
----- Original Message -----
> Hi,
> for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
> one physical disk iops, since raidz1 is like raid5, so is raid5 has
> same performance like raidz1? ie. random iops equal to one physical
> disk's iops.

Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS, because ZFS does checksumming, maintains the ZIL, and so on; but then, traditional RAID-5 won't have the safety offered by ZFS.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
(Sig, translated from Norwegian:) In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Darren J Moffat
2010-Jul-20 11:38 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On 20/07/2010 11:46, Roy Sigurd Karlsbakk wrote:
> Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS doing checksumming, having the ZIL etc, but then, trad raid5 won't have the safety offered by ZFS

That depends on whether or not you are CPU bound; if you aren't CPU bound, then the checksumming may not matter.

Depending on what RAID5 hardware you are using (and how much available NVRAM it has), the ZFS RAIDZ could be faster.

See Jeff Bonwick's blog posting describing RAID-Z:

http://blogs.sun.com/bonwick/entry/raid_z

And Adam Leventhal's blog posting describing RAID-Z3:

http://blogs.sun.com/ahl/entry/triple_parity_raid_z

The current code base supports raidz1, raidz2, and raidz3 (triple parity).

--
Darren J Moffat
Ross Walker
2010-Jul-20 12:26 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 20, 2010, at 6:12 AM, v <victor_zhang at hotmail.com> wrote:
> Hi,
> for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to one physical disk iops, since raidz1 is like raid5, so is raid5 has same performance like raidz1? ie. random iops equal to one physical disk's iops.

On reads, no: any part of the stripe width can be read without reading the whole stripe width, giving performance equal to a RAID-0 of the non-parity disks.

On writes it could be worse than raidz1, depending on whether whole stripe widths are being written (same performance) or partial stripe widths are being written (worse performance). If it's a partial stripe width, then the remaining data needs to be read off disk, which doubles the I/Os.

-Ross
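The partial-stripe write penalty described above can be sketched numerically. The model below is hypothetical back-of-the-envelope arithmetic using the textbook read-modify-write path for small writes (the reconstruct-write variant, where the untouched data is read instead, has a similar per-unit cost blowup):

```python
def raid5_write_ios(data_disks, units_written):
    """Disk I/Os for one RAID-5 stripe write (hypothetical textbook model).

    Full-stripe write: parity is computed from the new data alone,
    so every disk is written exactly once.
    Partial-stripe write: read-modify-write -- read the old data and
    old parity, then write the new data and new parity.
    """
    if units_written >= data_disks:      # full-stripe write
        return data_disks + 1            # data writes + one parity write
    # RMW: read + write per touched data unit, plus read + write of parity
    return 2 * units_written + 2

# A 4+1 RAID-5 set:
print(raid5_write_ios(4, 4))  # 5 I/Os for 4 units -> 1.25 I/Os per unit
print(raid5_write_ios(4, 1))  # 4 I/Os for 1 unit  -> ~3x the per-unit cost
```

Either way, the fewer stripe units a write touches, the more disk I/Os it costs per byte of useful data.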
Roy Sigurd Karlsbakk
2010-Jul-20 13:07 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
----- Original Message -----
> On reads, no, any part of the stripe width can be read without reading
> the whole stripe width, giving performance equal to raid0 of
> non-parity disks.

Are you sure this is true? I know it is in theory, but some testing with bonnie++ showed me I didn't get so large a gain. Perhaps my tests were done wrong?

Vennlige hilsener / Best regards

roy
Richard Elling
2010-Jul-20 14:44 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 20, 2010, at 3:46 AM, Roy Sigurd Karlsbakk wrote:
> Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS doing checksumming, having the ZIL etc, but then, trad raid5 won't have the safety offered by ZFS

Disagree. The ZIL has nothing to do with RAIDness. Traditional RAID-5 suffers from a read-modify-write sequence if the I/O is not perfectly matched to the stripe width -- a 3x latency hit. In raidz, the writes are always full stripe, so there is only a 1x latency hit.

OTOH, for reads, some RAID-5 implementations will read only a single portion of a stripe, if the I/O is small enough to fit. In this case, the small, random read performance can approach RAID-0. raidz will always read the full block, even though the full block might not be spread across all of the disks. ZFS does this to verify the checksum of the data.

This is the classic tradeoff -- space, performance, dependability: pick two.

-- richard

--
Richard Elling
richard at nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
Ulrich Graef
2010-Jul-20 14:56 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
There is a common misconception in the comparison between mirror and raidz: you get the same performance when you use the same number of disks, but the resulting filesystem has a different size, therefore the comparison is not applicable.

Example: you have 8 disks.

Compare a zpool with one raidz vdev of 8 disks against a zpool containing 4 mirrors of 2 disks each. Then the read I/Os spread over 8 disks in each case, therefore the number of I/Os is comparable. But you have compared apples and oranges, because the net size is 7 disks in the first case and 4 disks in the second case.

A valid comparison would be a zpool with one raidz vdev containing 5 disks against the mirrored zpool containing 4 mirrors of 2 disks each, because then the size of the zpool is the same.

Regards,

Ulrich

Roy Sigurd Karlsbakk wrote:
> Are you sure this is true? I know it is in theory, but some testing with bonnie++ showed me I didn't get so large a gain. Perhaps my tests were done wrong?
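The apples-vs-oranges point above can be put in a small calculation (illustrative only; `raidz1_layout` and `mirror_layout` are hypothetical helpers, not ZFS tools):

```python
def raidz1_layout(disks):
    """Net capacity (in disks) and spindles serving reads, one raidz1 vdev."""
    return {"net": disks - 1, "spindles": disks}

def mirror_layout(pairs):
    """Same figures for a pool of 2-way mirrors."""
    return {"net": pairs, "spindles": 2 * pairs}

# Same spindle count, different net size: apples and oranges.
print(raidz1_layout(8))   # {'net': 7, 'spindles': 8}
print(mirror_layout(4))   # {'net': 4, 'spindles': 8}

# Equal net size -> the fair comparison: 5-disk raidz1 vs 4 x 2-way mirrors.
print(raidz1_layout(5))   # {'net': 4, 'spindles': 5}
```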
--
Ulrich Graef / Senior SE / Hardware Presales / Phone: +49 6103 752 359
ORACLE Deutschland B.V. & Co. KG / Amperestr. 6 / 63225 Langen
http://www.oracle.com
Bob Friesenhahn
2010-Jul-20 18:45 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Tue, 20 Jul 2010, Roy Sigurd Karlsbakk wrote:
> Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS
> because of ZFS doing checksumming, having the ZIL etc, but then,
> trad raid5 won't have the safety offered by ZFS

The biggest difference is almost surely that ZFS will always construct/read full filesystem blocks (default 128K). Traditional RAID-5 does not need to do that for small reads.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
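The read amplification Bob describes follows directly from the block size. A back-of-the-envelope sketch (assuming an uncached read that falls inside a single ZFS block; `recordsize` is a real per-dataset ZFS property, tunable with e.g. `zfs set recordsize=8k pool/db`):

```python
def read_amplification(recordsize_kib, request_kib):
    """Bytes fetched from disk per byte requested for an uncached read
    that falls within a single ZFS block (hypothetical sketch)."""
    return recordsize_kib / min(request_kib, recordsize_kib)

print(read_amplification(128, 8))  # 16.0 -> an 8 KiB read drags in 128 KiB
print(read_amplification(8, 8))    # 1.0  -> recordsize tuned to the workload
```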
Edward Ned Harvey
2010-Jul-21 14:40 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of v
>
> for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
> one physical disk iops, since raidz1 is like raid5, so is raid5 has
> same performance like raidz1? ie. random iops equal to one physical
> disk's iops.

I tested this extensively about 6 months ago. Please see http://www.nedharvey.com for more details. I disagree with the assumptions you've made above, and I'll say this instead:

Look at http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf

Go down to the 2nd section, "Compared to a single disk", and look at "single-disk", "raidz-5disks", and "raid5-5disks-hardware".

You'll see that both raidz and raid5 are significantly faster than a single disk in all types of operations. In all cases, raidz is approximately equal to, or significantly faster than, hardware raid5.

Furthermore, I later went on to test performance using nonvolatile devices (such as SSD) for a ZIL dedicated log device, and in those situations the performance of ZFS with a dedicated log device beat hardware writeback caching easily.

So, put simply: ZFS raid is faster than the fastest hardware raid. Because ZFS has knowledge of the filesystem and blocks, while hardware raid only has knowledge of the blocks, ZFS is able to be more intelligent in the techniques it uses for acceleration.
Robert Milkowski
2010-Jul-21 15:22 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On 21/07/2010 15:40, Edward Ned Harvey wrote:
> You'll see that both raidz and raid5 are significantly faster than a single
> disk in all types of operations. In all cases, raidz is approximately equal
> to, or significantly faster than hardware raid5.

I had a quick look at your results a moment ago. The problem is that you used a server with 4GB of RAM plus a raid card with 256MB of cache, and your file size for iozone was set to 4GB -- so, random or not, you probably had a relatively good cache hit ratio for random reads. And even then, random reads from 8 threads gave you only about 40% more IOPS for a RAID-Z made out of 5 disks than for a single drive. The poor result for HW-R5 is surprising, though; it might be that the stripe size was not matched to the ZFS recordsize and the iozone block size in this case.

The issue with raid-z and random reads is that as the cache hit ratio goes down to 0, the IOPS approach the IOPS of a single drive. For a little bit more information, see http://blogs.sun.com/roch/entry/when_to_and_not_to

--
Robert Milkowski
http://milek.blogspot.com
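Robert's point about the cache hit ratio can be captured in a toy model (hypothetical; it assumes every ARC miss costs a full-block read that occupies the entire raidz vdev, so miss throughput is capped at a single disk's IOPS):

```python
def raidz_random_read_iops(single_disk_iops, cache_hit_ratio):
    """Delivered random-read IOPS for a raidz vdev under this toy model:
    cache hits are free, every miss ties up the whole vdev for one
    disk-sized I/O. As the hit ratio drops to 0, delivered IOPS fall
    to a single drive's."""
    miss = 1.0 - cache_hit_ratio
    if miss <= 0:
        return float("inf")  # everything served from ARC
    return single_disk_iops / miss

print(raidz_random_read_iops(150, 0.0))  # 150.0 -> equal to one drive
print(raidz_random_read_iops(150, 0.5))  # 300.0 -> the cache hides half
```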
Edward Ned Harvey
2010-Jul-22 02:25 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Milkowski
>
> I had a quick look at your results a moment ago.
> The problem is that you used a server with 4GB of RAM + a raid card with
> a 256MB of cache.
> Then your filesize for iozone was set to 4GB - so random or not you
> probably had a relatively good cache hit ratio for random reads.

Look again in the raw_results. I ran it with 4G, and also with 12G. There was no significant difference between the two, so I only compiled the 4G results into a spreadsheet PDF.

> even then a random read from 8 threads gave you only about 40% more IOPS
> for a RAID-Z made out of 5 disks than a single drive. The poor
> result for HW-R5 is surprising though but it might be that a stripe size
> was not matched to ZFS recordsize and iozone block size in this case.

I think what you're saying is "With 5 disks performing well, you should expect 4x higher iops than a single disk," and "the measured result was only 40% higher, which is a poor result."

I agree. I guess the 128k recordsize used in iozone is probably large enough that it frequently causes blocks to span disks? I don't know.

> The issue with raid-z and random reads is that as cache hit ratio goes
> down to 0 the IOPS approaches IOPS of a single drive. For a little bit
> more information see http://blogs.sun.com/roch/entry/when_to_and_not_to

I don't think that's correct, unless you're using a single thread. As long as multiple threads are issuing random reads on raidz, and those reads are small enough that each one is entirely written on a single disk, then you should be able to get n-1 disks operating simultaneously, to achieve (n-1)x the performance of a single disk.

Even if blocks are large enough to span disks, you should be able to get (n-1)x the performance of a single disk for large sequential operations.
Robert Milkowski
2010-Jul-22 07:27 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On 22/07/2010 03:25, Edward Ned Harvey wrote:
> Look again in the raw_results. I ran it with 4G, and also with 12G. There
> was no significant difference between the two, so I only compiled the 4G
> results into a spreadsheet PDF.

The only tests with a 12GB file size in the raw files are a mirror and a single-disk configuration. There are no results for raid-z there.

> I think what you're saying is "With 5 disks performing well, you should
> expect 4x higher iops than a single disk," and "the measured result was only
> 40% higher, which is a poor result."
>
> I agree. I guess the 128k recordsize used in iozone is probably large
> enough that it frequently causes blocks to span disks? I don't know.

Probably -- but it would also depend on how you configured hw-r5 (mainly its stripe size). The other thing is that you might have had some bottleneck somewhere else, as your results for N-way mirrors aren't that good either.

> I don't think that's correct, unless you're using a single thread.
> As long as multiple threads are issuing random reads on raidz, and those
> reads are small enough that each one is entirely written on a single disk,
> then you should be able to get n-1 disks operating simultaneously, to
> achieve (n-1)x the performance of a single disk.
>
> Even if blocks are large enough to span disks, you should be able to get
> (n-1)x the performance of a single disk for large sequential operations.

While that is true to some degree for hw raid-5, raid-z doesn't work that way. The issue is that each zfs filesystem block is basically spread across n-1 devices. So every time you want to read back a single fs block, you need to wait for all n-1 devices to provide you with a part of it -- and keep in mind that in zfs you can't get a partial block even if that's what you are asking for, as zfs has to check the checksum of the entire fs block.

Now multiple readers actually make it worse for raid-z (assuming a very poor cache hit ratio), because each read from each reader involves all disk drives; the others basically can't read anything until it is done. It gets really bad for random reads.

With HW raid-5, if your stripe size matches the block you are reading back, then for random reads it is probable that while reader-X1 is reading from disk-Y1, reader-X2 is reading from disk-Y2, so you should end up with all disk drives (-1) contributing to better overall IOPS.

Read Roch's blog entry carefully for more information.

btw: even in your results, 6 disks in raid-z provided over 3x fewer IOPS than a zfs raid-10 configuration for random reads. That is a big difference if one needs performance.

--
Robert Milkowski
http://milek.blogspot.com
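The contrast Robert draws can be summarized as a rough cold-cache ceiling (a hypothetical model of the thread's argument, not measured data):

```python
def random_read_iops(disk_iops, ndisks, raidz=True):
    """Cold-cache random-read IOPS ceiling (hypothetical model).

    raidz: each block spans the data disks, so one read occupies the
    whole vdev -> the vdev delivers roughly one disk's IOPS.
    RAID-5 with stripe unit >= read size: independent readers land on
    different spindles, approaching a RAID-0 of the non-parity disks.
    """
    if raidz:
        return disk_iops
    return disk_iops * (ndisks - 1)

print(random_read_iops(150, 6, raidz=True))   # 150 -> one drive's worth
print(random_read_iops(150, 6, raidz=False))  # 750 -> readers in parallel
```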
Edward Ned Harvey
2010-Jul-23 13:39 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: Robert Milkowski [mailto:milek at task.gda.pl]
>
> [In raidz] The issue is that each zfs filesystem block is basically
> spread across n-1 devices.
> So every time you want to read back a single fs block you need to wait
> for all n-1 devices to provide you with a part of it - and keep in mind
> in zfs you can't get a partial block even if that's what you are asking
> for as zfs has to check checksum of entire fs block.

Can anyone else confirm or deny the correctness of this statement?

If you read a small file from a raidz volume, do you have to wait for every single disk to return a small chunk of the blocksize? I know this is true for large files which require more than one block, obviously, but does even a small file get spread out across multiple disks?

This may be the way it's currently implemented, but it's not a mathematical requirement. It is possible, if desired, to implement raid parity and still allow small files to be written entirely on a single disk, without losing redundancy -- thus providing the redundancy and the large-file performance (both of which are already present in raidz), while also optimizing small-file random operations, which may not already be optimized in raidz.
Arne Jansen
2010-Jul-23 14:06 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
Edward Ned Harvey wrote:
> Can anyone else confirm or deny the correctness of this statement?
>
> If you read a small file from a raidz volume, do you have to wait for every
> single disk to return a small chunk of the blocksize? I know this is true
> for large files which require more than one block, obviously, but does even
> a small file get spread out across multiple disks?

As I understand it, that's the whole point of raidz: each block is its own stripe. If necessary, the block gets broken down into 512-byte chunks to spread it as wide as possible, and each block gets its own parity added. So if the array is too wide for the block to be spread to all disks, you also lose space, because the stripe is not full and parity gets added to that small stripe. That means that if you only write 512-byte blocks, each write puts 3 blocks on disk, so the net capacity goes down to one third, regardless of how many disks you have in your raid group.
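Arne's allocation arithmetic can be sketched as follows. This is an approximation of the commonly described raidz allocator, not the authoritative ZFS source; note that by this model a 512-byte block costs 2 sectors on raidz1 (half net capacity) and 3 on raidz2 (one third):

```python
import math

def raidz_alloc_sectors(block_bytes, ndisks, nparity, sector=512):
    """Sectors allocated for one ZFS block on a raidz vdev
    (hypothetical sketch of the allocator as commonly described)."""
    data = math.ceil(block_bytes / sector)
    rows = math.ceil(data / (ndisks - nparity))  # stripe rows occupied
    total = data + rows * nparity                # parity sector(s) per row
    mult = nparity + 1
    return mult * math.ceil(total / mult)        # pad to avoid unusable gaps

# 512-byte blocks on a 5-wide raidz1: 2 sectors on disk -> 50% efficiency
print(raidz_alloc_sectors(512, 5, 1))        # 2
# 512-byte blocks on a 6-wide raidz2: 3 sectors -> one third net capacity
print(raidz_alloc_sectors(512, 6, 2))        # 3
# a full 128 KiB block on 5-wide raidz1: 320 sectors -> the expected 4/5
print(raidz_alloc_sectors(128 * 1024, 5, 1)) # 320
```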
Edward Ned Harvey
2010-Jul-24 02:14 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: Arne Jansen [mailto:sensille at gmx.net]
>
> As I understand it that's the whole point of raidz. Each block is its
> own stripe.

Nope, that doesn't count as confirmation. It is at least theoretically possible to implement raidz using techniques that would (a) unintelligently stripe all blocks (even small ones) across multiple disks, thus hurting performance on small operations, or (b) implement raidz such that striping of blocks behaves differently for small operations (plus parity). So the confirmation I'm looking for would be somebody who knows the actual source code, and the actual architecture that was chosen to implement raidz in this case.
Richard Elling
2010-Jul-24 04:54 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 23, 2010, at 7:14 PM, Edward Ned Harvey <shill at nedharvey.com> wrote:
> Nope, that doesn't count as confirmation. It is at least theoretically
> possible to implement raidz using techniques that would (a) unintelligently
> stripe all blocks (even small ones) across multiple disks, thus hurting
> performance on small operations, or

I think you will find that for writes, especially small writes (<32k or so), the method used by raidz performs much better than raid-5.

> (b) implement raidz such that striping
> of blocks behaves differently for small operations (plus parity). So the
> confirmation I'm looking for would be somebody who knows the actual source
> code, and the actual architecture that was chosen to implement raidz in this
> case.

http://src.OpenSolaris.org

-- richard
Ross Walker
2010-Jul-25 21:44 UTC
[zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 23, 2010, at 10:14 PM, Edward Ned Harvey <shill at nedharvey.com> wrote:
> So the confirmation I'm looking for would be somebody who knows the actual
> source code, and the actual architecture that was chosen to implement raidz
> in this case.

Maybe this helps?

http://blogs.sun.com/ahl/entry/what_is_raid_z

-Ross