michael T sedwick
2007-Jun-20 01:56 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Given a 1.6TB ZFS RAID-Z pool consisting of 6 disks, and a system that does an
extreme amount of small (<20K) random reads (more than twice as many reads as
writes):

1) What performance gains, if any, does RAID-Z offer over other RAID or large
filesystem configurations?

2) What hindrance, if any, is RAID-Z to this configuration, given the complete
randomness and small size of these accesses?

Would there be a better means of configuring a ZFS environment for this type
of activity?

thanks;
---michael
Richard Elling
2007-Jun-20 02:34 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
michael T sedwick wrote:
> Given a 1.6TB ZFS RAID-Z pool consisting of 6 disks, and a system that
> does an extreme amount of small (<20K) random reads (more than twice as
> many reads as writes):
>
> 1) What performance gains, if any, does RAID-Z offer over other RAID or
> large filesystem configurations?

For magnetic disk drives, RAID-Z performance for small, random reads will
approximate the performance of a single disk, regardless of the number of
disks in the set.  The writes will not be random, so it should perform
decently for writes.

> 2) What hindrance, if any, is RAID-Z to this configuration, given the
> complete randomness and size of these accesses?

ZFS must read the entire RAID-Z stripe to verify the checksum.

> Would there be a better means of configuring a ZFS environment for this
> type of activity?

In general, mirrors with dynamic stripes will offer better performance and
RAS than RAID-Z.
 -- richard
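As a rough sketch of the two layouts being contrasted here (the pool name
tank and the device names c1t0d0 through c1t5d0 are placeholders, not the
original poster's actual setup), the pools could be created along these
lines:

    # single 6-disk RAID-Z top-level vdev
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

    # three mirrored pairs, dynamically striped across three top-level vdevs
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0

Both forms are standard zpool syntax; only the names are illustrative.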
Bart Smaalders
2007-Jun-20 02:38 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
michael T sedwick wrote:
> Given a 1.6TB ZFS RAID-Z pool consisting of 6 disks, and a system that
> does an extreme amount of small (<20K) random reads (more than twice as
> many reads as writes):
>
> 1) What performance gains, if any, does RAID-Z offer over other RAID or
> large filesystem configurations?
>
> 2) What hindrance, if any, is RAID-Z to this configuration, given the
> complete randomness and size of these accesses?
>
> Would there be a better means of configuring a ZFS environment for this
> type of activity?
>
> thanks;

A 6 disk raidz set is not optimal for random reads, since each disk in the
raidz set needs to be accessed to retrieve each item.  Note that if the
reads are single threaded, this doesn't apply.  However, if multiple reads
are outstanding at the same time, configuring the disks as 2 sets of 3 disk
raidz vdevs or 3 pairs of mirrored disks will result in 2x and 3x (approx)
total parallel random read throughput.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com      http://blogs.sun.com/barts
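The intermediate layout Bart mentions, two 3-disk raidz vdevs, would look
roughly like this (again with placeholder pool and device names):

    # two 3-disk raidz top-level vdevs in one pool
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 raidz c1t3d0 c1t4d0 c1t5d0

Each top-level vdev can service a different read at the same time, which is
where the roughly 2x figure comes from.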
Ian Collins
2007-Jun-20 02:59 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Bart Smaalders wrote:
> michael T sedwick wrote:
>> Given a 1.6TB ZFS RAID-Z pool consisting of 6 disks, and a system that
>> does an extreme amount of small (<20K) random reads (more than twice
>> as many reads as writes):
>> [...]
>
> A 6 disk raidz set is not optimal for random reads, since each disk in
> the raidz set needs to be accessed to retrieve each item.  Note that if
> the reads are single threaded, this doesn't apply.  However, if
> multiple reads are outstanding at the same time, configuring the disks
> as 2 sets of 3 disk raidz vdevs or 3 pairs of mirrored disks will
> result in 2x and 3x (approx) total parallel random read throughput.

I'm not sure why, but when I was testing various configurations with
bonnie++, 3 pairs of mirrors did give about 3x the random read performance
of a 6 disk raidz, but with 4 pairs, the random read performance dropped
by 50%:

3x2
  Block read:  220464
  Random read: 1520.1

4x2
  Block read:  295747
  Random read: 765.3

Ian
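For anyone wanting to reproduce numbers of this general shape, a bonnie++
invocation along these lines exercises both block and random reads; the
mount point, dataset size and user below are assumptions, not Ian's actual
command:

    # dataset size should well exceed RAM so reads are not served from the ARC
    bonnie++ -d /tank/bench -s 16384 -u nobody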
Bart Smaalders
2007-Jun-20 03:11 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Ian Collins wrote:
> Bart Smaalders wrote:
>> A 6 disk raidz set is not optimal for random reads, since each disk in
>> the raidz set needs to be accessed to retrieve each item.  [...]
>
> I'm not sure why, but when I was testing various configurations with
> bonnie++, 3 pairs of mirrors did give about 3x the random read
> performance of a 6 disk raidz, but with 4 pairs, the random read
> performance dropped by 50%:
>
> 3x2
>   Block read:  220464
>   Random read: 1520.1
>
> 4x2
>   Block read:  295747
>   Random read: 765.3

Interesting....  I wonder if the blocks being read were striped across two
mirror pairs; this would result in having to read 2 sets of mirror pairs,
which would produce the reported results...

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com      http://blogs.sun.com/barts
Rob Logan
2007-Jun-20 04:03 UTC
[zfs-discuss] marvell88sx error in command 0x2f: status 0x51
With no visible ill effects, `dmesg` reports lots of:

  kern.warning] WARNING: marvell88sx1: port 3: error in command 0x2f: status 0x51

Seen in snv_62 and opensol-b66.  Perhaps this is
http://bugs.opensolaris.org/view_bug.do?bug_id=6539787
Can someone post the relevant part of the headers, even if the code is
closed?

Rob
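If the immediate goal is just to see how often the warning fires, the
standard Solaris log locations are enough; nothing below is specific to the
marvell88sx driver or to this bug:

    dmesg | grep marvell88sx
    grep "error in command" /var/adm/messages | tail -20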
Matthew Ahrens
2007-Jun-20 05:25 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Bart Smaalders wrote:
> Ian Collins wrote:
>> Bart Smaalders wrote:
>>> A 6 disk raidz set is not optimal for random reads, since each disk
>>> in the raidz set needs to be accessed to retrieve each item.  Note
>>> that if the reads are single threaded, this doesn't apply.  However,
>>> if multiple reads are outstanding at the same time, configuring the
>>> disks as 2 sets of 3 disk raidz vdevs or 3 pairs of mirrored disks
>>> will result in 2x and 3x (approx) total parallel random read
>>> throughput.

Actually, with 6 disks as 3 mirrored pairs, you should get around 6x the
random read iops of a 6-disk raidz[2], because each side of the mirror can
fulfill different read requests.  We use the checksum to verify
correctness, so we don't need to read the same data from both sides of the
mirror.

>> I'm not sure why, but when I was testing various configurations with
>> bonnie++, 3 pairs of mirrors did give about 3x the random read
>> performance of a 6 disk raidz, but with 4 pairs, the random read
>> performance dropped by 50%:
>
> Interesting....  I wonder if the blocks being read were striped across
> two mirror pairs; this would result in having to read 2 sets of mirror
> pairs, which would produce the reported results...

Each block is entirely[*] on one top-level vdev (i.e., a mirrored pair in
this case), so that would not happen.  The observed performance
degradation remains a mystery.

--matt

[*] assuming you have enough contiguous free space.  On nearly-full pools,
performance can suffer due to (among other things) "gang blocks", which
essentially break large blocks into several smaller blocks if there isn't
enough contiguous free space for the large block.
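A quick way to rule out the nearly-full case Matt describes is to look at
pool capacity; a minimal sketch, with tank as an assumed pool name:

    zpool list tank    # the CAP column shows the percentage of the pool in use

Keeping the pool comfortably below the commonly cited 80-90% range reduces
the chance of gang-block allocation.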
michael T sedwick
2007-Jun-20 05:48 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
OK... is all this 3x / 6x potential performance boost still going to hold
true in a single-controller scenario?  Hardware is x4100's (Solaris 10)
with a 6-disk raidz on external 3320's.  I seem to remember (wait...
checking notes...) correct: the ZFS filesystem is < 50% capacity.  This
info may also help me track down why I was seeing 'sched' running a lot...

---michael

Matthew Ahrens wrote:
> Actually, with 6 disks as 3 mirrored pairs, you should get around 6x
> the random read iops of a 6-disk raidz[2], because each side of the
> mirror can fulfill different read requests.  We use the checksum to
> verify correctness, so we don't need to read the same data from both
> sides of the mirror.
> [...]
> Each block is entirely[*] on one top-level vdev (i.e., a mirrored pair
> in this case), so that would not happen.  The observed performance
> degradation remains a mystery.
>
> [*] assuming you have enough contiguous free space.  On nearly-full
> pools, performance can suffer due to (among other things) "gang
> blocks", which essentially break large blocks into several smaller
> blocks if there isn't enough contiguous free space for the large block.
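Whether the single controller is the limit should show up in per-device
service times and queueing; a sketch using stock Solaris tools (the
intervals and interpretation are generic, not specific to the x4100 or the
3320 shelf):

    iostat -xnz 5    # per-device r/s, w/s, service times and %b (busy)
    mpstat 5         # per-CPU sys vs. idle time; ZFS work done by kernel
                     # threads is charged to 'sched' (pid 0), not to a
                     # user process

Heavy 'sched' time is therefore not unusual on a busy ZFS box, but the
iostat service times will show whether the disks or the controller path
are saturated.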
Mario Goebbels
2007-Jun-20 09:58 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
> A 6 disk raidz set is not optimal for random reads, since each disk in
> the raidz set needs to be accessed to retrieve each item.

I don't understand: if the file is contained within a single stripe, why
would it need to access the other disks if the checksum of the stripe is
OK?  Also, why wouldn't it be able to concurrently access different disks
for multiple reads?

-mg
Ian Collins
2007-Jun-20 10:41 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Mario Goebbels wrote:
>> A 6 disk raidz set is not optimal for random reads, since each disk in
>> the raidz set needs to be accessed to retrieve each item.
>
> I don't understand: if the file is contained within a single stripe,
> why would it need to access the other disks if the checksum of the
> stripe is OK?  Also, why wouldn't it be able to concurrently access
> different disks for multiple reads?

The item is striped across all the drives, so you have to wait for the
slowest drive.

Ian
Darren Dunham
2007-Jun-20 15:19 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
>> A 6 disk raidz set is not optimal for random reads, since each disk in
>> the raidz set needs to be accessed to retrieve each item.
>
> I don't understand: if the file is contained within a single stripe,
> why would it need to access the other disks if the checksum of the
> stripe is OK?

Because you have to read the entire stripe (which probably spans all the
disks) to verify the checksum.

With a traditional RAID-5 setup, there is no checksum, so there is no need
to read the entire stripe for a small read.  It may need only a single
disk to satisfy the read.

> Also, why wouldn't it be able to concurrently access different disks
> for multiple reads?

In a random read situation, you'll likely have to read one stripe (on all
the disks), then seek and read a separate stripe (on all the disks).  That
limits single-threaded random read throughput.

--
Darren Dunham                                          ddunham at taos.com
Senior Technical Consultant         TAOS           http://www.taos.com/
Got some Dr Pepper?                          San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
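As a back-of-the-envelope illustration (the per-drive figure of roughly
150 random IOPS is a generic ballpark for drives of that era, not a
measurement from this thread):

    6-disk raidz:    every small read touches all 6 spindles
                     -> roughly 150 random reads/s for the whole pool
    3 mirror pairs:  each read is served by one side of one pair
                     -> up to roughly 6 x 150 = 900 random reads/s,
                        given enough concurrent readers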
Mario Goebbels
2007-Jun-21 12:36 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
> Because you have to read the entire stripe (which probably spans all
> the disks) to verify the checksum.

Then I have a wrong idea of what a stripe is.  I always thought it's the
interleave block size.

-mg
Roch Bourbonnais
2007-Jun-21 16:27 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
On 20 Jun 2007, at 04:59, Ian Collins wrote:
> I'm not sure why, but when I was testing various configurations with
> bonnie++, 3 pairs of mirrors did give about 3x the random read
> performance of a 6 disk raidz, but with 4 pairs, the random read
> performance dropped by 50%:
>
> 3x2
>   Block read:  220464
>   Random read: 1520.1
>
> 4x2
>   Block read:  295747
>   Random read: 765.3

Did you recreate the pool from scratch, or did you add a pair of disks to
the existing mirror?

If starting from scratch, I'm stumped.  But for the latter, the problem
might lie in the data population: the newly added mirror might have gotten
a larger share of the added data, and restitution did not target all disks
evenly.

-r
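If a pair was added later, the uneven data population Roch suspects would
be visible in the per-vdev capacity figures; a minimal sketch, with tank
as an assumed pool name:

    zpool iostat -v tank    # reports used/available capacity and I/O per
                            # top-level vdev, so a lopsided mirror stands out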
Richard Elling
2007-Jun-21 17:40 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Mario Goebbels wrote:
>> Because you have to read the entire stripe (which probably spans all
>> the disks) to verify the checksum.
>
> Then I have a wrong idea of what a stripe is.  I always thought it's
> the interleave block size.

Nope.  A stripe generally refers to the logical block as spread across
physical devices.  For most RAID implementations ("hardware", "firmware",
or software), the interleave size is the stripe width divided by the
number of devices.  In ZFS, dynamic striping doesn't have this
restriction, which is how we can dynamically add physical devices to
existing stripes.  Jeff Bonwick describes this in the context of RAID-Z at
http://blogs.sun.com/bonwick/entry/raid_z
 -- richard
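A small worked example of the distinction, using made-up numbers rather
than anything from the configurations discussed above:

    Traditional RAID-5, 4 data disks + 1 parity, 128K stripe width:
        interleave (chunk) size = 128K / 4 = 32K per disk
        a 20K read fits inside one chunk and can be satisfied by one disk

    ZFS RAID-Z, 5 data disks + 1 parity, one 128K ZFS block:
        the block is spread ~128K / 5 = ~26K per data disk, plus parity,
        so reading and checksumming that block touches every disk in the vdev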
Ian Collins
2007-Jun-21 21:58 UTC
[zfs-discuss] Z-Raid performance with Random reads/writes
Roch Bourbonnais wrote:
> On 20 Jun 2007, at 04:59, Ian Collins wrote:
>> I'm not sure why, but when I was testing various configurations with
>> bonnie++, 3 pairs of mirrors did give about 3x the random read
>> performance of a 6 disk raidz, but with 4 pairs, the random read
>> performance dropped by 50%:
>>
>> 3x2
>>   Block read:  220464
>>   Random read: 1520.1
>>
>> 4x2
>>   Block read:  295747
>>   Random read: 765.3
>
> Did you recreate the pool from scratch, or did you add a pair of disks
> to the existing mirror?

From scratch.  Each test was run on a new pool with one filesystem.

Ian