Stephan Budach
2010-Dec-11  15:48 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
Hi,
On Friday I received two of my new fc raids, which I intend to use as
my new zpool devices. These devices are from CiDesign and their
type/model is iR16FC4ER. These are fc raids that also allow JBOD
operation, which is what I chose. So I configured 16 raid groups on each
system and configured the raids to attach them to their fc channel one
by one.
On my Sol11Expr host I created a zpool of mirror vdevs by selecting
one disk from each raid for every mirror. This way I got a zpool that
looks like this:
root@solaris11c:~# zpool status newObelixData
   pool: newObelixData
  state: ONLINE
  scan: resilvered 1K in 0h0m with 0 errors on Sat Dec 11 15:25:35 2010
config:
         NAME                        STATE     READ WRITE CKSUM
         newObelixData               ONLINE       0     0     0
           mirror-0                  ONLINE       0     0     0
             c9t2100001378AC02C7d0   ONLINE       0     0     0
             c9t2100001378AC0355d0   ONLINE       0     0     0
           mirror-1                  ONLINE       0     0     0
             c9t2100001378AC02C7d1   ONLINE       0     0     0
             c9t2100001378AC0355d1   ONLINE       0     0     0
           mirror-2                  ONLINE       0     0     0
             c9t2100001378AC02C7d2   ONLINE       0     0     0
             c9t2100001378AC0355d2   ONLINE       0     0     0
           mirror-3                  ONLINE       0     0     0
             c9t2100001378AC02C7d3   ONLINE       0     0     0
             c9t2100001378AC0355d3   ONLINE       0     0     0
           mirror-4                  ONLINE       0     0     0
             c9t2100001378AC02C7d4   ONLINE       0     0     0
             c9t2100001378AC0355d4   ONLINE       0     0     0
           mirror-5                  ONLINE       0     0     0
             c9t2100001378AC02C7d5   ONLINE       0     0     0
             c9t2100001378AC0355d5   ONLINE       0     0     0
           mirror-6                  ONLINE       0     0     0
             c9t2100001378AC02C7d6   ONLINE       0     0     0
             c9t2100001378AC0355d6   ONLINE       0     0     0
           mirror-7                  ONLINE       0     0     0
             c9t2100001378AC02C7d7   ONLINE       0     0     0
             c9t2100001378AC0355d7   ONLINE       0     0     0
           mirror-8                  ONLINE       0     0     0
             c9t2100001378AC02C7d8   ONLINE       0     0     0
             c9t2100001378AC0355d8   ONLINE       0     0     0
           mirror-9                  ONLINE       0     0     0
             c9t2100001378AC02C7d9   ONLINE       0     0     0
             c9t2100001378AC0355d9   ONLINE       0     0     0
           mirror-10                 ONLINE       0     0     0
             c9t2100001378AC02C7d10  ONLINE       0     0     0
             c9t2100001378AC0355d10  ONLINE       0     0     0
           mirror-11                 ONLINE       0     0     0
             c9t2100001378AC02C7d11  ONLINE       0     0     0
             c9t2100001378AC0355d11  ONLINE       0     0     0
           mirror-12                 ONLINE       0     0     0
             c9t2100001378AC02C7d12  ONLINE       0     0     0
             c9t2100001378AC0355d12  ONLINE       0     0     0
           mirror-13                 ONLINE       0     0     0
             c9t2100001378AC02C7d13  ONLINE       0     0     0
             c9t2100001378AC0355d13  ONLINE       0     0     0
           mirror-14                 ONLINE       0     0     0
             c9t2100001378AC02C7d14  ONLINE       0     0     0
             c9t2100001378AC0355d14  ONLINE       0     0     0
           mirror-15                 ONLINE       0     0     0
             c9t2100001378AC02C7d15  ONLINE       0     0     0
             c9t2100001378AC0355d15  ONLINE       0     0     0
errors: No known data errors
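A layout like the one above, with LUN n of one raid mirrored against LUN n of the other, could be produced by a single `zpool create` call. The following is only a sketch of how such a command line might be assembled; the two target prefixes are copied from the status output, while the function names are made up for illustration and nothing is actually executed.

```python
# Sketch: assemble a `zpool create` command that pairs LUN n on one raid
# with LUN n on the other. The target prefixes come from the status
# output above; the helper names are hypothetical.
RAID_A = "c9t2100001378AC02C7"
RAID_B = "c9t2100001378AC0355"


def mirror_pairs(lun_count):
    """Return one (disk_on_raid_a, disk_on_raid_b) tuple per LUN."""
    return [(f"{RAID_A}d{n}", f"{RAID_B}d{n}") for n in range(lun_count)]


def zpool_create_command(pool, lun_count):
    """Build the command line as a string; it is not run here."""
    parts = ["zpool", "create", pool]
    for disk_a, disk_b in mirror_pairs(lun_count):
        parts += ["mirror", disk_a, disk_b]
    return " ".join(parts)


if __name__ == "__main__":
    print(zpool_create_command("newObelixData", 16))
```

Pairing across the two enclosures means each mirror should survive the loss of either raid chassis, which appears to be the intent of the layout shown.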
At first I disabled all write cache and read ahead options for each raid
group on the raids, since I wanted to give ZFS as much control over
the drives as possible, but the performance was quite poor. I am
running this zpool on a Sun Fire X4170M2 with 32 GB of RAM, so I ran
bonnie++ with -s 63356 -n 128 and got these results:
Sequential Output:
char: 51819
block: 50602
rewrite: 28090
Sequential Input:
char: 62562
block: 60979
Random seeks: 510 <- this seems really low to me, isn't it?
Sequential Create:
create: 27529
read: 172287
delete: 30522
Random Create:
create: 25531
read: 244977
delete: 29423
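For orientation, and assuming these throughput figures are in the K/sec units that bonnie++ normally reports for its sequential tests (1 K taken as 1024 bytes), they can be converted to MiB/s with a few lines. This is a rough sketch only; the dictionary labels are chosen here for illustration.

```python
# Assumption: the sequential figures are K/sec as bonnie++ usually prints
# them, with 1 K = 1024 bytes. Labels are illustrative only.
results_k_per_sec = {
    "output char": 51819,
    "output block": 50602,
    "output rewrite": 28090,
    "input char": 62562,
    "input block": 60979,
}


def k_per_sec_to_mib(k_per_sec):
    """Convert a K/sec figure to MiB/s."""
    return k_per_sec / 1024


for name, value in results_k_per_sec.items():
    print(f"{name:15s} {k_per_sec_to_mib(value):6.1f} MiB/s")
```

Under that assumption, block writes come out at roughly 50 MiB/s and block reads at roughly 60 MiB/s for the whole pool.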
Since I was curious what would happen if I enabled WriteCache and
ReadAhead on the raid groups, I turned them on for all 32 devices and
re-ran bonnie++. To my great dismay, this time zfs had a lot of random
trouble with the drives: it removed drives arbitrarily from the pool
because they exceeded the error thresholds. On one run this happened to
4 drives from one fc raid; on the next run 3 drives from the other raid
were removed from the pool.
I know that I had better disable all "optimizations" on the raid side,
but the performance seems just too bad with these settings. Maybe
running 16 mirrors in a zpool is not a good idea, but that seems more
than unlikely to me.
Is there anything else I can check?
Cheers,
budy
Bob Friesenhahn
2010-Dec-13  01:42 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On Sat, 11 Dec 2010, Stephan Budach wrote:
>
> At first I disabled all write cache and read ahead options for each raid
> group on the raids, since I wanted to provide ZFS as much control over the
> drives as possible, but the performance was quite worse. I am running this
> zpool on a Sun Fire X4170M2 with 32 GB of RAM so I ran bonnie++ with -s 63356
> -n 128 and got these results:
>
> Sequential Output
> char: 51819
> block: 50602
> rewrite: 28090

I am not very familiar with bonnie++ output. Does 51819 mean
51MB/second? If so, that is perhaps 1 disk's worth of performance.

> Random seeks: 510 <- this seems really low to me, isn't it?

It does seem a bit low. Everything depends on if the "random seek" was
satisfied from ARC cache or from the underlying disk. You should be able
to obtain at least the number of physical seeks available from 1/2 your
total disks. For example, with 16 pair and if each disk could do 100
seeks per second, then you should expect at least 8*100 random seeks per
second. With zfs mirroring and doing only read-seeks, you should expect
to get up to 75% of the seek capability of all 16 disks combined.

> Since I was curious, what would happen, if I'd enable WriteCache and
> ReadAhead on the raid groups, I turned them on for all 32 devices and re-ran
> bonnie++. To my great dismay, this time zfs had a lot of random troubles with
> the drives, where zfs would remove drives arbitrarily from the pool since
> they exceeded the error thresholds. On one run, this only happened to 4 drives
> from one fc raid on the next run 3 drives from the other raid got removed
> from the pool.

Ungood. Note that with this many disks, you should be able to swamp your
fiber channel link and that the fiber channel should be the sequential
I/O bottleneck. It may also be that your RAID array firmware/CPUs become
severely overloaded.

> I know, that I'd better disable all "optimizations" on the raid side, but the
> performance seems just too bad with these settings. Maybe running 16 mirrors
> in a zpool is not a good idea - but that seems more than unlikely to me.

16 mirrors in a zpool is a very good idea. Just keep in mind that this
is a lot of I/O power and you might swamp your FC link and adaptor card.

> Is there anything else I can check?

Check the output of iostat -xn 30 while bonnie++ is running. This may
reveal an issue.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
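Bob's rule of thumb for random read seeks on a pool of two-way mirrors (at least half of the disks' worth, up to about 75% of all disks) can be written down as a small calculation. This is a sketch of that estimate only, not a ZFS performance model; the function name is made up here, and 100 seeks per second per disk is Bob's illustrative figure rather than a measured value. Note that his worked numbers (8*100, "all 16 disks") correspond to 16 disks in total, while the pool shown in the first post lists 32 devices, so both cases are printed.

```python
# Sketch of the rule of thumb quoted above. The per-disk seek rate is an
# illustrative assumption, not a measurement.
def expected_read_seeks(total_disks, seeks_per_disk):
    """Return (lower, upper) random read seeks/s for 2-way mirrors.

    lower: seeks available from half of the disks
    upper: 75% of the combined seek capability of all disks
    """
    lower = (total_disks // 2) * seeks_per_disk
    upper = total_disks * seeks_per_disk * 3 // 4
    return lower, upper


for disks in (16, 32):
    low, high = expected_read_seeks(disks, 100)
    print(f"{disks} disks: {low} to {high} seeks/s")
```

Either way, the measured 510 random seeks per second falls below the lower bound of the estimate, which fits Bob's remark that the figure looks low.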
Ian Collins
2010-Dec-13  01:54 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 12/12/10 04:48 AM, Stephan Budach wrote:
> Hi,
>
> on friday I received two of my new fc raids, that I intended to use
> as my new zpool devices. These devices are from CiDesign and their
> type/model is iR16FC4ER. These are fc raids, that also allow JBOD
> operation, which is what I chose. So I configured 16 raid groups on
> each system and configured the raids to attach them to their fc
> channel one by one.
>
> On my Sol11Expr host I have created a zpool of mirror vdevs, by
> selecting 1 disk from either raid. This way I got a zpool that looks
> like this:
> At first I disabled all write cache and read ahead options for each
> raid group on the raids, since I wanted to provide ZFS as much control
> over the drives as possible, but the performance was quite worse. I am
> running this zpool on a Sun Fire X4170M2 with 32 GB of RAM so I ran
> bonnie++ with -s 63356 -n 128 and got these results:
>
> Sequential Output
> char: 51819
> block: 50602
> rewrite: 28090
>
> Sequential Input:
> char: 62562
> block 60979
>
> Random seeks: 510 <- this seems really low to me, isn't it?
>
> Sequential Create:
> create: 27529
> read: 172287
> delete: 30522
>
> Random Create:
> create: 25531
> read: 244977
> delete 29423
>

The closest I have by way of comparison is an old thumper with a stripe
of 9 mirrors:

Sequential Output:
char: 206479
block: 601102
rewrite: 218089
Sequential Input:
char: 138945
block: 702598
Random seeks: 1970

Getting on for an order of magnitude better on I/O.

> Is there anything else I can check?
>

iostat, as recommended elsewhere.

--
Ian.
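To put a number on the "order of magnitude" comparison, the two sets of bonnie++ results quoted in this thread can be divided against each other. This is a simple sketch using only the figures posted above; the dictionary labels and function name are chosen here for illustration.

```python
# Figures copied from the two bonnie++ runs quoted in this thread.
fc_mirrors = {"out block": 50602, "out rewrite": 28090,
              "in block": 60979, "random seeks": 510}
thumper = {"out block": 601102, "out rewrite": 218089,
           "in block": 702598, "random seeks": 1970}


def speedup(baseline, other):
    """Return how many times larger each figure in `other` is."""
    return {key: other[key] / baseline[key] for key in baseline}


ratios = speedup(fc_mirrors, thumper)
for key, ratio in ratios.items():
    print(f"{key:13s} {ratio:5.1f}x")
```

On these figures, sequential block throughput differs by more than a factor of ten, while random seeks differ by a bit less than a factor of four.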
Stephan Budach
2010-Dec-13  09:56 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
Bob, Ian, thanks for your input.

It may be that the fw on the raid really got overloaded, and that may
have had to do with the way the GUI works. I am now testing the same
configuration on another host, where I can risk some lockups when
running bonnie++.

I am able to set some options on the drive level, namely write cache and
read ahead, as well as on the virtual drive level. Unfortunately the
options on the virtual drive level have the same names, and I thought
that setting these options on the drive level when configuring a JBOD
raid group would also set them on the virtual disk level, but that
didn't happen. ;)

So, odds are quite good that I overloaded the raid controller with lots
of virtual disks that had their cache set to write-through with read
ahead on. ATM, I have all options disabled on the drive level and on the
raid group level as well as on the virtual drive level.

My current run of bonnie is of course not that satisfactory, and I
wanted to ask you whether it is safe to turn on at least the drive level
options, namely the write cache and the read ahead?

Thanks,
budy
Bob Friesenhahn
2010-Dec-14  02:30 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On Mon, 13 Dec 2010, Stephan Budach wrote:
>
> My current run of bonnie is of course not that satisfactory and I wanted to
> ask you, if it's safe to turn on at least the drive level options, namely the
> write cache and the read ahead?

Enabling the write cache is fine as long as it is non-volatile or is
flushed to disk when zfs requests it. Zfs will request a
transaction-group flush on all disks before proceeding with the next
batch of writes. The read ahead might not be all that valuable in
practice (and might cause a severe penalty) because it assumes a
particular mode and timing of access which might not match how your
system is actually used. Most usage scenarios are something other than
what bonnie++ does.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Stephan Budach
2010-Dec-14  06:43 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 14.12.2010 at 03:30, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Mon, 13 Dec 2010, Stephan Budach wrote:
>>
>> My current run of bonnie is of course not that satisfactory and I wanted
>> to ask you, if it's safe to turn on at least the drive level options,
>> namely the write cache and the read ahead?
>
> Enabling the write cache is fine as long as it is non-volatile or is
> flushed to disk when zfs requests it. Zfs will request a
> transaction-group flush on all disks before proceeding with the next
> batch of writes. The read ahead might not be all that valuable in
> practice (and might cause a severe penalty) because it assumes a
> particular mode and timing of access which might not match how your
> system is actually used. Most usage scenarios are something other than
> what bonnie++ does.

I know that bonnie++ does not generate the workload I will see on my
server, but it reliably causes ZFS to kick drives out of the pool, which
shouldn't happen, of course.

Actually, I suspect that the Qsan controller fw, which is what is built
into these raids, has some issues when it has to deal with high random
I/O.

I will now try my good old Infortrend systems and see if I can reproduce
this issue with them as well.

Cheers,
Budy
Stephan Budach
2010-Dec-14  16:32 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 14.12.10 07:43, Stephan Budach wrote:
> On 14.12.2010 at 03:30, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
>
>> On Mon, 13 Dec 2010, Stephan Budach wrote:
>>> My current run of bonnie is of course not that satisfactory and I
>>> wanted to ask you, if it's safe to turn on at least the drive level
>>> options, namely the write cache and the read ahead?
>> Enabling the write cache is fine as long as it is non-volatile or is
>> flushed to disk when zfs requests it. Zfs will request a
>> transaction-group flush on all disks before proceeding with the next
>> batch of writes. The read ahead might not be all that valuable in
>> practice (and might cause a severe penalty) because it assumes a
>> particular mode and timing of access which might not match how your
>> system is actually used. Most usage scenarios are something other than
>> what bonnie++ does.
> I know that bonnie++ does not generate the workload I will see on my
> server, but it reliably causes ZFS to kick out drives from the pool,
> which shouldn't happen, of course.
>
> Actually, I am expecting the Qsan controller fw, which is what is built
> into these raids, has some issues, when it has to deal with high random
> I/O.
>
> I will try now my good old Infortrend systems and see, if I can
> reproduce this issue with them as well.

I just wanted to wrap this up. The current firmware 1.0.8x for the
CiDesign iR16FC4ER has a severe bug which caused ZFS to kick out random
disks and degrade the zpool. So I tried the older firmware 1.07, which
doesn't have these issues and with which the 2x16 JBODs are running very
well.

Since this is an FC-to-SATA2 raid, I also had to tune the throttle
parameter in qlc.conf, which led to a great performance boost; values of
1 and 2 both did a great job.

Now that this is solved, I can go ahead and transfer my data from my
2xRAID6 zpool onto these new devices.

Cheers,
budy