Stephan Budach
2010-Dec-11 15:48 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
Hi,

on Friday I received two of my new FC RAIDs, which I intend to use as my new zpool devices. The devices are from CiDesign, type/model iR16FC4ER. They are FC RAIDs that also allow JBOD operation, which is what I chose. So I configured 16 RAID groups on each system and set the arrays up to present them on their FC channel one by one.

On my Sol11Expr host I created a zpool of mirror vdevs by selecting one disk from each array. This way I got a zpool that looks like this:

root at solaris11c:~# zpool status newObelixData
  pool: newObelixData
 state: ONLINE
  scan: resilvered 1K in 0h0m with 0 errors on Sat Dec 11 15:25:35 2010
config:

        NAME                       STATE     READ WRITE CKSUM
        newObelixData              ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c9t2100001378AC02C7d0  ONLINE       0     0     0
            c9t2100001378AC0355d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c9t2100001378AC02C7d1  ONLINE       0     0     0
            c9t2100001378AC0355d1  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c9t2100001378AC02C7d2  ONLINE       0     0     0
            c9t2100001378AC0355d2  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c9t2100001378AC02C7d3  ONLINE       0     0     0
            c9t2100001378AC0355d3  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c9t2100001378AC02C7d4  ONLINE       0     0     0
            c9t2100001378AC0355d4  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c9t2100001378AC02C7d5  ONLINE       0     0     0
            c9t2100001378AC0355d5  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c9t2100001378AC02C7d6  ONLINE       0     0     0
            c9t2100001378AC0355d6  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c9t2100001378AC02C7d7  ONLINE       0     0     0
            c9t2100001378AC0355d7  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c9t2100001378AC02C7d8  ONLINE       0     0     0
            c9t2100001378AC0355d8  ONLINE       0     0     0
          mirror-9                 ONLINE       0     0     0
            c9t2100001378AC02C7d9  ONLINE       0     0     0
            c9t2100001378AC0355d9  ONLINE       0     0     0
          mirror-10                ONLINE       0     0     0
            c9t2100001378AC02C7d10 ONLINE       0     0     0
            c9t2100001378AC0355d10 ONLINE       0     0     0
          mirror-11                ONLINE       0     0     0
            c9t2100001378AC02C7d11 ONLINE       0     0     0
            c9t2100001378AC0355d11 ONLINE       0     0     0
          mirror-12                ONLINE       0     0     0
            c9t2100001378AC02C7d12 ONLINE       0     0     0
            c9t2100001378AC0355d12 ONLINE       0     0     0
          mirror-13                ONLINE       0     0     0
            c9t2100001378AC02C7d13 ONLINE       0     0     0
            c9t2100001378AC0355d13 ONLINE       0     0     0
          mirror-14                ONLINE       0     0     0
            c9t2100001378AC02C7d14 ONLINE       0     0     0
            c9t2100001378AC0355d14 ONLINE       0     0     0
          mirror-15                ONLINE       0     0     0
            c9t2100001378AC02C7d15 ONLINE       0     0     0
            c9t2100001378AC0355d15 ONLINE       0     0     0

errors: No known data errors

At first I disabled all write cache and read-ahead options for each RAID group on the arrays, since I wanted to give ZFS as much control over the drives as possible, but the performance was quite poor. I am running this zpool on a Sun Fire X4170 M2 with 32 GB of RAM, so I ran bonnie++ with -s 63356 -n 128 and got these results:

Sequential Output
  char:    51819
  block:   50602
  rewrite: 28090

Sequential Input:
  char:  62562
  block: 60979

Random seeks: 510  <- this seems really low to me, doesn't it?

Sequential Create:
  create: 27529
  read:   172287
  delete: 30522

Random Create:
  create: 25531
  read:   244977
  delete: 29423

Since I was curious what would happen if I enabled write cache and read-ahead on the RAID groups, I turned them on for all 32 devices and re-ran bonnie++. To my great dismay, ZFS now had a lot of random trouble with the drives: it would remove drives seemingly arbitrarily from the pool because they exceeded the error thresholds. On one run this happened to 4 drives from one FC RAID; on the next run 3 drives from the other array got removed from the pool.

I know that I'd better disable all "optimizations" on the RAID side, but the performance seems just too bad with these settings. Maybe running 16 mirrors in one zpool is not a good idea - but that seems more than unlikely to me.
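For reference, the pool was built along these lines and bonnie++ invoked roughly as shown. The zpool create ordering is reconstructed from the status output above, and the -d and -u arguments to bonnie++ are assumptions (the default mountpoint would be /newObelixData, and bonnie++ refuses to run as root without -u):

    zpool create newObelixData \
        mirror c9t2100001378AC02C7d0  c9t2100001378AC0355d0 \
        mirror c9t2100001378AC02C7d1  c9t2100001378AC0355d1 \
        mirror c9t2100001378AC02C7d2  c9t2100001378AC0355d2 \
        mirror c9t2100001378AC02C7d3  c9t2100001378AC0355d3 \
        mirror c9t2100001378AC02C7d4  c9t2100001378AC0355d4 \
        mirror c9t2100001378AC02C7d5  c9t2100001378AC0355d5 \
        mirror c9t2100001378AC02C7d6  c9t2100001378AC0355d6 \
        mirror c9t2100001378AC02C7d7  c9t2100001378AC0355d7 \
        mirror c9t2100001378AC02C7d8  c9t2100001378AC0355d8 \
        mirror c9t2100001378AC02C7d9  c9t2100001378AC0355d9 \
        mirror c9t2100001378AC02C7d10 c9t2100001378AC0355d10 \
        mirror c9t2100001378AC02C7d11 c9t2100001378AC0355d11 \
        mirror c9t2100001378AC02C7d12 c9t2100001378AC0355d12 \
        mirror c9t2100001378AC02C7d13 c9t2100001378AC0355d13 \
        mirror c9t2100001378AC02C7d14 c9t2100001378AC0355d14 \
        mirror c9t2100001378AC02C7d15 c9t2100001378AC0355d15

    # 63356 MB working set, -n 128 = 128*1024 files for the create/delete phases
    bonnie++ -d /newObelixData -s 63356 -n 128 -u root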
Is there anything else I can check?

Cheers,
budy
Bob Friesenhahn
2010-Dec-13 01:42 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On Sat, 11 Dec 2010, Stephan Budach wrote:
>
> At first I disabled all write cache and read-ahead options for each RAID
> group on the arrays, since I wanted to give ZFS as much control over the
> drives as possible, but the performance was quite poor. I am running this
> zpool on a Sun Fire X4170 M2 with 32 GB of RAM, so I ran bonnie++ with
> -s 63356 -n 128 and got these results:
>
> Sequential Output
>   char:    51819
>   block:   50602
>   rewrite: 28090

I am not very familiar with bonnie++ output. Does 51819 mean 51 MB/second? If so, that is perhaps one disk's worth of performance.

> Random seeks: 510  <- this seems really low to me, doesn't it?

It does seem a bit low. Everything depends on whether the "random seek" was satisfied from the ARC cache or from the underlying disk. You should be able to obtain at least the number of physical seeks available from half your total disks. For example, with 16 pairs, and if each disk can do 100 seeks per second, you should expect at least 16*100 random seeks per second. With ZFS mirroring and doing only read-seeks, you should expect to get up to 75% of the seek capability of all 32 disks combined.

> Since I was curious what would happen if I enabled write cache and
> read-ahead on the RAID groups, I turned them on for all 32 devices and
> re-ran bonnie++. To my great dismay, ZFS now had a lot of random trouble
> with the drives: it would remove drives seemingly arbitrarily from the pool
> because they exceeded the error thresholds. On one run this happened to 4
> drives from one FC RAID; on the next run 3 drives from the other array got
> removed from the pool.

Ungood. Note that with this many disks you should be able to swamp your Fibre Channel link, so the Fibre Channel should be the sequential I/O bottleneck. It may also be that your RAID array firmware/CPUs become severely overloaded.

> I know that I'd better disable all "optimizations" on the RAID side, but
> the performance seems just too bad with these settings. Maybe running 16
> mirrors in one zpool is not a good idea - but that seems more than
> unlikely to me.

16 mirrors in a zpool is a very good idea. Just keep in mind that this is a lot of I/O power and you might swamp your FC link and adapter card.

> Is there anything else I can check?

Check the output of iostat -xn 30 while bonnie++ is running. This may reveal an issue.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
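To make that arithmetic concrete, a back-of-the-envelope sketch; the ~100 random IOPS per spindle is an assumed ballpark for SATA disks, not a measured figure, and the iostat columns worth watching are asvc_t and %b:

    # 16 mirrors = 32 spindles, assumed ~100 random IOPS each
    echo "conservative floor (half the spindles): $((16 * 100)) seeks/s"
    echo "mirrored-read upper bound (75% of all): $((32 * 100 * 75 / 100)) seeks/s"

    # while bonnie++ runs: consistently high asvc_t or %b on individual
    # c9t...d* devices points at the arrays, while all devices pegged
    # together points more at the shared FC link or HBA
    iostat -xn 30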
Ian Collins
2010-Dec-13 01:54 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 12/12/10 04:48 AM, Stephan Budach wrote:
> Hi,
>
> on Friday I received two of my new FC RAIDs, which I intend to use as my
> new zpool devices. The devices are from CiDesign, type/model iR16FC4ER.
> They are FC RAIDs that also allow JBOD operation, which is what I chose.
> So I configured 16 RAID groups on each system and set the arrays up to
> present them on their FC channel one by one.
>
> On my Sol11Expr host I created a zpool of mirror vdevs by selecting one
> disk from each array. This way I got a zpool that looks like this:
>
> At first I disabled all write cache and read-ahead options for each RAID
> group on the arrays, since I wanted to give ZFS as much control over the
> drives as possible, but the performance was quite poor. I am running this
> zpool on a Sun Fire X4170 M2 with 32 GB of RAM, so I ran bonnie++ with
> -s 63356 -n 128 and got these results:
>
> Sequential Output
>   char:    51819
>   block:   50602
>   rewrite: 28090
>
> Sequential Input:
>   char:  62562
>   block: 60979
>
> Random seeks: 510  <- this seems really low to me, doesn't it?
>
> Sequential Create:
>   create: 27529
>   read:   172287
>   delete: 30522
>
> Random Create:
>   create: 25531
>   read:   244977
>   delete: 29423
>

The closest I have by way of comparison is an old thumper with a stripe of 9 mirrors:

Sequential Output
  char:    206479
  block:   601102
  rewrite: 218089

Sequential Input:
  char:  138945
  block: 702598

Random seeks: 1970

Getting on for an order of magnitude better on I/O.

> Is there anything else I can check?
>
iostat was recommended elsewhere.

--
Ian.
Stephan Budach
2010-Dec-13 09:56 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
Bob, Ian,

thanks for your input. It may be that the firmware on the RAID really got overloaded, and that may have had to do with the way the GUI works. I am now testing the same configuration on another host, where I can risk some lockups when running bonnie++.

I am able to set some options at the drive level, namely write cache and read ahead, as well as at the virtual drive level. Unfortunately the options at the virtual drive level carry the same names, and I had thought that setting them at the drive level when configuring a JBOD RAID group would also set them at the virtual disk level, but that didn't happen. ;)

So, odds are quite good that I overloaded the RAID controller with lots of virtual disks that had their cache set to write-through and read-ahead turned on.

ATM I have all options disabled at the drive level, the RAID group level and the virtual drive level. My current run of bonnie++ is of course not that satisfactory, and I wanted to ask you if it's safe to turn on at least the drive-level options, namely the write cache and the read ahead?

Thanks,
budy
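Independent of the RAID GUI, one way to see what the host-visible LUNs themselves report for their write cache is format's expert mode. This is only a sketch, and whether the Qsan firmware exposes or honours that mode page at all is an open question, so treat it as a sanity check:

    # run as root; expert mode adds a cache sub-menu for SCSI/FC LUNs
    format -e
    #   (select a device, e.g. c9t2100001378AC02C7d0, then:)
    #   format> cache
    #   cache> write_cache
    #   write_cache> display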
Bob Friesenhahn
2010-Dec-14 02:30 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On Mon, 13 Dec 2010, Stephan Budach wrote:
>
> My current run of bonnie++ is of course not that satisfactory, and I wanted
> to ask you if it's safe to turn on at least the drive-level options, namely
> the write cache and the read ahead?

Enabling the write cache is fine as long as it is non-volatile or is flushed to disk when ZFS requests it. ZFS will request a transaction-group flush on all disks before proceeding with the next batch of writes. The read-ahead might not be all that valuable in practice (and might cause a severe penalty) because it assumes a particular mode and timing of access which might not match how your system is actually used. Most usage scenarios are something other than what bonnie++ does.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
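If there is any doubt that those cache flushes are actually being issued, a quick host-side sanity check (a sketch; needs root) is to confirm that the global tunable which suppresses them is still at its default of 0:

    # zfs_nocacheflush = 1 would tell ZFS to skip the SYNCHRONIZE CACHE
    # commands entirely; on this host it should print 0
    echo "zfs_nocacheflush/D" | mdb -k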
Stephan Budach
2010-Dec-14 06:43 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 14.12.2010 at 03:30, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Mon, 13 Dec 2010, Stephan Budach wrote:
>>
>> My current run of bonnie++ is of course not that satisfactory, and I wanted
>> to ask you if it's safe to turn on at least the drive-level options, namely
>> the write cache and the read ahead?
>
> Enabling the write cache is fine as long as it is non-volatile or is flushed
> to disk when ZFS requests it. ZFS will request a transaction-group flush on
> all disks before proceeding with the next batch of writes. The read-ahead
> might not be all that valuable in practice (and might cause a severe
> penalty) because it assumes a particular mode and timing of access which
> might not match how your system is actually used. Most usage scenarios are
> something other than what bonnie++ does.

I know that bonnie++ does not generate the workload I will see on my server, but it reliably causes ZFS to kick drives out of the pool, which shouldn't happen, of course.

Actually, I suspect that the Qsan controller firmware, which is what is built into these RAIDs, has some issues when it has to deal with heavy random I/O.

I will now try my good old Infortrend systems and see if I can reproduce this issue with them as well.

Cheers,
budy
Stephan Budach
2010-Dec-14 16:32 UTC
[zfs-discuss] What performance to expect from mirror vdevs?
On 14.12.10 at 07:43, Stephan Budach wrote:
> On 14.12.2010 at 03:30, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
>
>> On Mon, 13 Dec 2010, Stephan Budach wrote:
>>> My current run of bonnie++ is of course not that satisfactory, and I
>>> wanted to ask you if it's safe to turn on at least the drive-level
>>> options, namely the write cache and the read ahead?
>> Enabling the write cache is fine as long as it is non-volatile or is
>> flushed to disk when ZFS requests it. ZFS will request a transaction-group
>> flush on all disks before proceeding with the next batch of writes. The
>> read-ahead might not be all that valuable in practice (and might cause a
>> severe penalty) because it assumes a particular mode and timing of access
>> which might not match how your system is actually used. Most usage
>> scenarios are something other than what bonnie++ does.
> I know that bonnie++ does not generate the workload I will see on my server,
> but it reliably causes ZFS to kick drives out of the pool, which shouldn't
> happen, of course.
>
> Actually, I suspect that the Qsan controller firmware, which is what is
> built into these RAIDs, has some issues when it has to deal with heavy
> random I/O.
>
> I will now try my good old Infortrend systems and see if I can reproduce
> this issue with them as well.

I just wanted to wrap this up. The current firmware 1.0.8x for the CiDesign iR16FC4ER has a severe bug which caused ZFS to kick out random disks and degrade the zpool. So I tried the older firmware 1.07, which doesn't have these issues and with which the 2x16 JBODs are running very well.

Since this is an FC-to-SATA2 RAID, I also had to tune the throttle parameter in qlc.conf, which gave a great performance boost - both 1 and 2 did a great job.

Now that this is solved, I can go ahead and transfer my data from my 2xRAID6 zpool onto these new devices.

Cheers,
budy
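Budy does not name the exact setting he changed; on Sun's qlc driver the knob usually meant by "throttle" is the execution-throttle entry in /kernel/drv/qlc.conf, so the following is only an illustrative sketch, and the value 32 is a guess rather than the one he used:

    # /kernel/drv/qlc.conf (excerpt) - hypothetical value for illustration
    execution-throttle=32;

    # then re-read the driver configuration, or reboot if the HBA is in use
    update_drv -f qlc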