Dave Pooser
2010-Feb-14 23:12 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
I'm trying to set up an OpenSolaris 2009.06 server as a Fibre Channel storage device, and I'm seeing painfully slow performance while copying large (6-50GB) files -- like 3-5 MB/second over 4Gb FC. However, if instead of creating a volume and exporting it via FC I create a standard filesystem and export it over CIFS, I see speeds more in line with the connection speed -- 54MB/second over gigabit Ethernet. (The problem is that for the application in question I really need speeds closer to 100-120MB/second, hence the FC connection.)

Any suggestions on how to troubleshoot this would be appreciated; I'm more of a Mac/Linux guy and my previous ZFS/OpenSolaris experience has been that things "just work", so my ZFS troubleshooting skills are a bit underdeveloped.

** My hardware:
Intel S5500BC mainboard
Intel E5506 Xeon 2.13GHz
8GB RAM
3x LSI 3018 PCIe SATA controllers (latest IT firmware)
8x 2TB Hitachi 7200RPM SATA drives (2 connected to each LSI and 2 to motherboard SATA ports)
2x 60GB Imation M-class SSD (boot mirror)
Qlogic 2440 PCIe Fibre Channel HBA

** How I got here (pardon my wrapping):

locadmin@storage1:~# zpool create -f bigpool raidz2 c7t1d0 c8t0d0 c8t1d0 c9t0d0 c9t1d0 c10d1 c11d0 c11d1

== Without the -f I got:
invalid vdev specification
use '-f' to override the following errors:
raidz contains devices of different sizes
even though all eight drives are Hitachi 2TB, same model, purchased at the same time -- dunno if that's related to my problem, though it doesn't seem to impact sharing a filesystem over CIFS ==

locadmin@storage1:~# zfs create -b 128k -V 10700g bigpool/uberdisk
locadmin@storage1:~# sbdadm create-lu /dev/zvol/rdsk/bigpool/uberdisk
locadmin@storage1:~# stmfadm add-view 600144f0383cc50000004b786ac50001

One other thing I notice is that when I run "iostat -xndz 1" during a slooooow fibre channel copy I see heavy %b on three of the drives and much less on the rest:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  190.0    0.0 1802.3  0.0  0.8    0.0    4.4   1  80 c10d1
    0.0  190.0    0.0 1800.3  0.0  0.7    0.0    3.5   1  63 c11d0
    0.0  190.0    0.0 1800.8  0.0  0.8    0.1    4.4   1  79 c11d1
    0.0  190.0    0.0 1800.8  0.0  0.1    0.0    0.5   0   6 c7t1d0
    0.0  190.0    0.0 1800.8  0.0  0.1    0.0    0.5   0   6 c8t0d0
    0.0  190.0    0.0 1801.3  0.0  0.1    0.0    0.5   0   6 c8t1d0
    0.0  190.0    0.0 1801.3  0.0  0.1    0.0    0.5   0   6 c9t0d0
    0.0  190.0    0.0 1802.8  0.0  0.1    0.0    0.5   0   6 c9t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  164.9    0.0 1549.0  0.0  0.8    0.0    4.7   0  75 c10d1
    0.0  164.9    0.0 1551.0  0.0  0.7    0.0    4.1   0  65 c11d0
    0.0  164.9    0.0 1551.0  0.0  1.0    0.0    5.9   1  83 c11d1
    0.0  164.9    0.0 1552.0  0.0  0.1    0.0    0.5   0   5 c7t1d0
    0.0  164.9    0.0 1552.0  0.0  0.1    0.0    0.5   0   5 c8t0d0
    0.0  164.9    0.0 1550.0  0.0  0.1    0.0    0.5   0   5 c8t1d0
    0.0  164.9    0.0 1550.0  0.0  0.1    0.0    0.5   0   5 c9t0d0
    0.0  164.9    0.0 1549.0  0.0  0.1    0.0    0.5   0   5 c9t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  174.0    0.0 1697.4  0.0  0.8    0.0    4.5   1  76 c10d1
    0.0  175.0    0.0 1696.9  0.0  0.6    0.0    3.5   1  58 c11d0
    0.0  174.0    0.0 1696.4  0.0  0.9    0.1    5.4   1  78 c11d1
    0.0  174.0    0.0 1697.4  0.0  0.1    0.0    0.5   0   5 c7t1d0
    0.0  174.0    0.0 1697.9  0.0  0.1    0.0    0.5   0   5 c8t0d0
    0.0  175.0    0.0 1698.4  0.0  0.1    0.0    0.5   0   5 c8t1d0
    0.0  175.0    0.0 1698.4  0.0  0.1    0.0    0.5   0   5 c9t0d0
    0.0  175.0    0.0 1697.4  0.0  0.1    0.0    0.5   0   5 c9t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  170.0    0.0 1618.4  0.0  0.8    0.1    4.7   1  77 c10d1
    0.0  169.0    0.0 1617.4  0.0  0.6    0.0    3.3   1  53 c11d0
    0.0  170.0    0.0 1617.4  0.0  0.9    0.1    5.5   1  79 c11d1
    0.0  170.0    0.0 1617.9  0.0  0.1    0.0    0.5   0   5 c7t1d0
    0.0  170.0    0.0 1617.9  0.0  0.1    0.0    0.5   0   5 c8t0d0
    0.0  169.0    0.0 1619.4  0.0  0.1    0.0    0.5   0   5 c8t1d0
    0.0  169.0    0.0 1619.4  0.0  0.1    0.0    0.5   0   5 c9t0d0
    0.0  169.0    0.0 1618.4  0.0  0.1    0.0    0.5   0   5 c9t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  191.0    0.0 1816.2  0.0  0.9    0.0    4.6   1  85 c10d1
    0.0  190.0    0.0 1793.7  0.0  0.5    0.0    2.7   1  49 c11d0
    0.0  189.0    0.0 1772.7  0.0  0.8    0.1    4.4   1  76 c11d1
    0.0  191.0    0.0 1815.7  0.0  0.1    0.0    0.5   0   6 c7t1d0
    0.0  191.0    0.0 1815.7  0.0  0.1    0.0    0.5   0   6 c8t0d0
    0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c8t1d0
    0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c9t0d0
    0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c9t1d0

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
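The imbalance in the iostat samples above can also be picked out mechanically. The following is a minimal sketch, not a tool from this thread: it assumes the 11-column row layout produced by "iostat -xn" as pasted above, and the function names and the "three times the median" threshold are illustrative choices only.

```python
from statistics import median

# First interval from the iostat output above, copied verbatim.
SAMPLE = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 190.0 0.0 1802.3 0.0 0.8 0.0 4.4 1 80 c10d1
0.0 190.0 0.0 1800.3 0.0 0.7 0.0 3.5 1 63 c11d0
0.0 190.0 0.0 1800.8 0.0 0.8 0.1 4.4 1 79 c11d1
0.0 190.0 0.0 1800.8 0.0 0.1 0.0 0.5 0 6 c7t1d0
0.0 190.0 0.0 1800.8 0.0 0.1 0.0 0.5 0 6 c8t0d0
0.0 190.0 0.0 1801.3 0.0 0.1 0.0 0.5 0 6 c8t1d0
0.0 190.0 0.0 1801.3 0.0 0.1 0.0 0.5 0 6 c9t0d0
0.0 190.0 0.0 1802.8 0.0 0.1 0.0 0.5 0 6 c9t1d0
"""


def parse_iostat_rows(text):
    """Parse iostat -xn data rows into dicts; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row is ten numeric columns followed by the device name.
        if len(fields) != 11:
            continue
        try:
            nums = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # column-header line such as "r/s w/s ..."
        rows.append({"device": fields[10], "asvc_t": nums[7], "busy": nums[9]})
    return rows


def busy_outliers(rows, factor=3.0):
    """Return devices whose %b exceeds `factor` times the median %b."""
    mid = median(r["busy"] for r in rows)
    return sorted(r["device"] for r in rows if r["busy"] > factor * mid)


if __name__ == "__main__":
    print(busy_outliers(parse_iostat_rows(SAMPLE)))
```

On the first interval this flags c10d1, c11d0 and c11d1: every drive receives the same write rate, but those three take several times longer per write (asvc_t) and sit at 63-80 %b while the rest idle at 6.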
Nigel Smith
2010-Feb-15 00:25 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
Hi Dave

So which hard drives are connected to which controllers? And what device drivers are those controllers using?

The output from 'format', 'cfgadm' and 'prtconf -D' may help us to understand.

Strange that you say that there are two hard drives per controller, but three drives are showing high %b.

And strange that you have c7, c8, c9, c10, c11, which looks like FIVE controllers!

Regards
Nigel Smith
-- 
This message posted from opensolaris.org
Dave Pooser
2010-Feb-15 01:15 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> So which hard drives are connected to which controllers?
> And what device drivers are those controllers using?

       0. c7t0d0 <DEFAULT cyl 7764 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,3140@0/sd@0,0
       1. c7t1d0 <ATA-Hitachi HDS72202-A20N-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1000,3140@0/sd@1,0
       2. c8t0d0 <ATA-Hitachi HDS72202-A20N-1.82TB>
          /pci@0,0/pci8086,340e@7/pci1000,3140@0/sd@0,0
       3. c8t1d0 <ATA-Hitachi HDS72202-A20N-1.82TB>
          /pci@0,0/pci8086,340e@7/pci1000,3140@0/sd@1,0
       4. c9t0d0 <ATA-Hitachi HDS72202-A20N-1.82TB>
          /pci@0,0/pci8086,3410@9/pci1000,3140@0/sd@0,0
       5. c9t1d0 <ATA-Hitachi HDS72202-A20N-1.82TB>
          /pci@0,0/pci8086,3410@9/pci1000,3140@0/sd@1,0
       6. c10d0 <DEFAULT cyl 7764 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
       7. c10d1 <Hitachi- JK1131YAGP8N3-0001-1.82TB>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0
       8. c11d0 <Hitachi- JK1131YAGZE4Z-0001-1.82TB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
       9. c11d1 <Hitachi- JK1131YAGGMT9-0001-1.82TB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0

> Strange that you say that there are two hard drives
> per controllers, but three drives are showing
> high %b.
>
> And strange that you have c7,c8,c9,c10,c11
> which looks like FIVE controllers!

c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard has 6 SATA ports which are presented as two controllers (presumably c10 and c11), one for ports 0-3 and one for ports 4 and 5; both currently use the PCI-IDE drivers.

And as you say, it's odd that there are three drives on c10 and c11, since they should have only two of the raidz2 drives; I need to go double-check my cabling.
The way it's *supposed* to be configured is:

c7: two RAIDZ2 drives and one of the boot mirror drives
c8: two RAIDZ2 drives
c9: two RAIDZ2 drives
c10: one RAIDZ2 drive and one of the boot mirror drives
c11: one RAIDZ2 drive

(The theory here is that since this server is going to spend its life being shipped places in the back of a truck, I want to make sure that no single controller failure can either render it unbootable or destroy the RAIDZ2.)

That said, I think that this is probably *a* tuning problem but not *the* tuning problem, since I was getting acceptable performance over CIFS and miserable performance over FC. Richard Elling suggested I try the latest dev release to see if I'm encountering a bug that forces synchronous writes, so I'm off to straighten out my controller distribution, check to see if I have write caching turned off on the motherboard ports, install the b132 build, and possibly grab some dinner while I'm about it. I'll report back to the list with any progress or lack thereof.

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
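The design goal stated here (no single controller failure may destroy the RAIDZ2) amounts to a simple count: raidz2 survives the loss of two drives, so no failure domain should carry more than two pool members. The sketch below checks the pool as it was actually created with "zpool create" earlier in the thread. It is illustrative only; the helper names are hypothetical, and treating c10 and c11 as one "onboard" failure domain is an assumption based on both being motherboard ports.

```python
import re
from collections import Counter

RAIDZ2_PARITY = 2  # raidz2 tolerates the loss of any two drives

# Pool members exactly as passed to "zpool create" in the first message.
POOL = ["c7t1d0", "c8t0d0", "c8t1d0", "c9t0d0", "c9t1d0",
        "c10d1", "c11d0", "c11d1"]

# Assumption: c10 and c11 are both the motherboard chipset, so they
# fail together. This grouping is not confirmed anywhere in the thread.
ONBOARD = {"c10": "onboard", "c11": "onboard"}


def drives_per_domain(devices, domain_of=None):
    """Count pool drives per failure domain (controller by default)."""
    domain_of = domain_of or {}
    counts = Counter()
    for dev in devices:
        ctrl = re.match(r"c\d+", dev).group(0)  # "c7t1d0" -> "c7"
        counts[domain_of.get(ctrl, ctrl)] += 1
    return counts


def overloaded_domains(devices, parity=RAIDZ2_PARITY, domain_of=None):
    """Return domains holding more drives than the pool can lose."""
    counts = drives_per_domain(devices, domain_of)
    return {d: n for d, n in counts.items() if n > parity}


if __name__ == "__main__":
    print(overloaded_domains(POOL))                     # per logical controller
    print(overloaded_domains(POOL, domain_of=ONBOARD))  # per physical board
```

Counted per logical controller, nothing exceeds two drives. Counted with the motherboard ports as one domain, the as-built pool puts three drives on it, which matches the three busy drives Nigel pointed out.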
Bob Friesenhahn
2010-Feb-15 02:42 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
On Sun, 14 Feb 2010, Dave Pooser wrote:
>
> c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard has
> 6 SATA ports which are presented as two controllers (presumably c10 and c11)
> one for ports 0-3 and one for ports 4 and 5; both currently use the PCI-IDE
> drivers.

One should expect that the IDE interface will be less performant than the SATA interface. For example, it seems likely that IDE does not support NCQ, so only one write could be scheduled at a time, while SATA can burst multiple writes into the drive cache at a time. This would explain why the IDE drives seem to be 100% busy while the SATA drives are almost idle. This would cause issues for synchronous writes. Absent careful engineering, raidz2 usually only has one bottleneck at a time.

See "http://en.wikipedia.org/wiki/Integrated_Drive_Electronics#IDE_and_ATA-1".

> That said, I think that this is probably *a* tuning problem but not *the*
> tuning problem, since I was getting acceptable performance over CIFS and
> miserable performance over FC. Richard Elling suggested I try the latest dev
> release to see if I'm encountering a bug that forces synchronous writes, so

A difference in the way synchronous writes are handled could certainly make a huge difference. It is useful to do asynchronous and synchronous write benchmarks on the local system before getting the higher level protocols involved.

As far as the warning about different sized devices goes, I am wondering if there is a limit to the maximum size of an IDE-based device and so some devices are claimed larger than others.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
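A local asynchronous-versus-synchronous comparison of the kind suggested above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a benchmark anyone in the thread ran: calling fsync after every block only approximates what a target that forces synchronous writes does, the 128 KiB block size merely mirrors the "-b 128k" used for the volume, and the temporary directory stands in for a directory on the pool under test.

```python
import os
import tempfile
import time


def write_throughput(path, total_bytes, block_size=128 * 1024, sync_each=False):
    """Write total_bytes to path and return throughput in MB/s.

    With sync_each=True every block is flushed to stable storage before
    the next one is issued, approximating synchronous writes.
    """
    block = os.urandom(block_size)  # random data, so compression cannot cheat
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            chunk = block[: total_bytes - written]
            written += f.write(chunk)
            if sync_each:
                os.fsync(f.fileno())
        os.fsync(f.fileno())  # make the asynchronous run pay for its flush too
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical location: replace with a directory on the pool being tested.
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "bench.bin")
        size = 16 * 1024 * 1024
        print("async: %.1f MB/s" % write_throughput(target, size))
        print("sync:  %.1f MB/s" % write_throughput(target, size, sync_each=True))
```

A large gap between the two figures on the local pool would point at synchronous write handling rather than at the FC or CIFS layer; the absolute numbers from so small a file should not be read as representative.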
Thomas Burgess
2010-Feb-15 02:49 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard has
> 6 SATA ports which are presented as two controllers (presumably c10 and
> c11) one for ports 0-3 and one for ports 4 and 5; both currently use the
> PCI-IDE drivers.

on my motherboard, i can make the onboard sata ports show up as IDE or SATA, you may look into that. It would probably be something like AHCI mode.
Tim Cook
2010-Feb-15 03:01 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
On Sun, Feb 14, 2010 at 8:49 PM, Thomas Burgess <wonslung at gmail.com> wrote:

>> c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard
>> has 6 SATA ports which are presented as two controllers (presumably c10
>> and c11) one for ports 0-3 and one for ports 4 and 5; both currently use
>> the PCI-IDE drivers.
>
> on my motherboard, i can make the onboard sata ports show up as IDE or
> SATA, you may look into that. It would probably be something like AHCI
> mode.

Those are actual IDE ports. That's why they show up as the same controller, different disk. He might have a tough time turning IDE ports into SATA in the BIOS ;)

--Tim
Dave Pooser
2010-Feb-15 03:45 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> on my motherboard, i can make the onboard sata ports show up as IDE or SATA,
> you may look into that. It would probably be something like AHCI mode.

Yeah, I changed the motherboard setting from "enhanced" to AHCI and now those ports show up as SATA.

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Thomas Burgess
2010-Feb-15 04:10 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
oh, so i WAS right? awesome

On Sun, Feb 14, 2010 at 10:45 PM, Dave Pooser <dave.zfs at alfordmedia.com> wrote:

> > on my motherboard, i can make the onboard sata ports show up as IDE or
> > SATA, you may look into that. It would probably be something like AHCI
> > mode.
>
> Yeah, I changed the motherboard setting from "enhanced" to AHCI and now
> those ports show up as SATA.
> --
> Dave Pooser, ACSA
> Manager of Information Services
> Alford Media http://www.alfordmedia.com
Dave Pooser
2010-Feb-15 04:49 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> I'm off to straighten out my controller distribution, check to see if I have
> write caching turned off on the motherboard ports, install the b132 build,
> and possibly grab some dinner while I'm about it. I'll report back to the
> list with any progress or lack thereof.

OK, the issue seems to be resolved now -- I'm seeing write speeds in excess of 160MB/s. What I did to fix things:

1) Redistributed drives across controllers to match my actual configuration -- thanks to Nigel for pointing that one out

2) Set my motherboard controller to AHCI mode -- thanks to Richard and Thomas for suggesting that. Once I made that change I no longer saw the "raidz contains devices of different sizes" error, so it looks like Bob was right about the source of that error

3) Upgraded to OpenSolaris 2010.03 preview b132, which appears to correct a problem in 2009.06 where iSCSI (and apparently FC) forced all writes to be synchronous -- thanks to Richard for that pointer.

Five hours from tearing my hair out to toasting a success -- this list is a great resource!

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Richard Jahnel
2010-Feb-23 03:34 UTC
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
You might try this and see what you get. Changing to a file-backed fibre target resulted in a 3x performance boost for me.

locadmin@storage1:~# touch /bigpool/uberdisk/vol1
locadmin@storage1:~# sbdadm create-lu -s 10700G /bigpool/uberdisk/vol1
locadmin@storage1:~# stmfadm add-view 600144f0383cc50000004b786ac50001

Let us know how that turns out if you decide to try it.
-- 
This message posted from opensolaris.org