Ray Van Dolson
2010-Aug-24 23:05 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
I posted a thread on this once long ago[1] -- but we're still fighting
with this problem and I wanted to throw it out here again.

All of our hardware is from Silicon Mechanics (SuperMicro chassis and
motherboards). Up until now, all of the hardware has had a single
24-disk expander / backplane -- but we recently got one of the new
SC847-based models with 24 disks up front and 12 in the back -- a
dual-backplane setup.

We're using two SSDs in the front backplane as mirrored ZIL/OS (I
don't think we have the 4K alignment set up correctly) and two drives
in the back as L2ARC. The rest of the disks are 1TB SATA disks which
make up a single large zpool via three 8-disk RAIDZ2s. As you can see,
we don't have the server maxed out on drives...

In any case, this new server gets between 400 and 600 of these timeout
errors an hour:

  Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci15d9,1@0 (mpt0):
  Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
  Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci15d9,1@0 (mpt0):
  Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
  Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340f@8/pci15d9,1@0/sd@8,0 (sd0):
  Aug 21 03:10:17 dev-zfs1        Error for Command: write(10)   Error Level: Retryable
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Requested Block: 21230708   Error Block: 21230708
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Vendor: ATA   Serial Number: CVEM002600EW
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
  Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
  Aug 21 03:10:21 dev-zfs1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci15d9,1@0 (mpt0):

iostat -xnMCez shows that the first of the two ZIL drives receives
about twice the number of "errors" as the second drive. There are no
other errors on any other drives -- including the L2ARC SSDs -- and
the asvc_t times seem reasonably low and don't indicate a bad drive to
my eyes...

The timeouts above exact a rather large performance penalty on the
system, both in I/O and in general usage from an SSH console: obvious
pauses and glitches when accessing the filesystem.

The problem _follows_ the ZIL and isn't tied to hardware. IOW, if I
switch to using the L2ARC drives as ZIL, those drives suddenly exhibit
the timeout problems...

If we connect the SSD drives directly to the LSI controller instead of
hanging them off the hot-swap backplane, the timeouts go away. If we
use SSDs attached to the SATA controllers as ZIL, there are also no
performance issues or timeout errors. So the problem only occurs with
SSDs acting as ZIL attached to the backplane.

This is leading me to believe we have a driver issue of some sort --
the mpt subsystem being unable to cope with the longer command path of
multiple backplanes. Someone alluded to this in [1] as well, and it
makes sense to me.

One quick fix, it seems to me, would be upping the SCSI timeout
values. How do you do this with the mpt driver?
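(The only knob I've found documented so far is the sd driver's global
command timeout, settable from /etc/system -- an untested sketch
below, and I honestly don't know whether it has any effect on the mpt
retry behavior we're seeing:

  * /etc/system -- raise the sd target command timeout from the
  * default of 60 seconds to 120 (takes effect after a reboot)
  set sd:sd_io_time = 0x78

Corrections welcome if that's the wrong knob entirely.)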
We haven't yet been able to try OpenSolaris or Nexenta on one of these
systems to see if the problem goes away in later releases of the
kernel or driver, but I'm curious if anyone out there has any bright
ideas as to what we might be running into here and what's involved in
fixing it. We've swapped out backplanes and drives, and the problem
happens on every single Silicon Mechanics system we have, so at this
point I'm really doubting it's a hardware issue :)

Hardware details are as follows:

Silicon Mechanics Storform iServ R518
(based on the SuperMicro SC847E16-R1400 chassis)
SuperMicro X8DT3 motherboard w/ onboard LSI1068 controller:
  - One LSI port goes to the front backplane (where the bulk of the
    SATA drives are, plus the two SSDs used as ZIL/OS).
  - The other LSI port goes to the rear backplane (where the two L2ARC
    drives are, along with a couple of SATA drives).

We've got 6GB of RAM and two quad-core Xeons in the box as well. The
SSDs themselves are all Intel X25-E's (32GB) with firmware 8860, and
the LSI 1068 is a SAS1068E B3 with firmware 011c0200 (1.28.02.00).
We're running Solaris 10U8, mostly up to date, with MPT HBA driver
v1.92.

Thoughts, theories and conjectures would be much appreciated... Sun
these days wants us to reproduce the problem on Sun hardware before
they'll give us much support... Silicon Mechanics has been helpful,
but it seems they don't have a large enough inventory on hand to
replicate our hardware setup. :(

Ray

[1] http://markmail.org/message/gfz2cui2iua4dxpy
Andrew Gabriel
2010-Aug-24 23:46 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
Ray Van Dolson wrote:

> I posted a thread on this once long ago[1] -- but we're still
> fighting with this problem and I wanted to throw it out here again.
[...]
> In any case, this new server gets between 400 and 600 of these
> timeout errors an hour:
[...]
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
[...]
> The timeouts above exact a rather large performance penalty on the
> system, both in I/O and in general usage from an SSH console: obvious
> pauses and glitches when accessing the filesystem.

This isn't a timeout. "Unit Attention" is the drive saying back to the
computer that it's been reset and has forgotten any negotiation which
happened with the controller. It's a couple of decades since I was
working on SCSI at this level, but IIRC, a drive will return a "Unit
Attention" error to the first command issued to it after a
reset/powerup, except for a Test Unit Ready command. As it says, this
might be caused by a power on, reset, or bus reset.

> The problem _follows_ the ZIL and isn't tied to hardware. IOW, if I
> switch to using the L2ARC drives as ZIL, those drives suddenly
> exhibit the timeout problems...

A possibility is that the problem is related to the nature of the load
a ZIL drive attracts. One scenario could be that you are crashing the
drive firmware, causing it to reset and reinitialize itself, and
therefore to return "Unit Attention" to the next command.
(I don't know if X25-E's can behave this way.)

I would try to correct the 4K alignment on the ZIL at least -- that
does significantly affect the work the drive has to do internally (as
well as its performance), although I've no idea if that's related to
the issue you're seeing.

> If we connect the SSD drives directly to the LSI controller instead
> of hanging them off the hot-swap backplane, the timeouts go away.

Again, this may be related to some combination of the load type and
physical characteristics.

> If we use SSDs attached to the SATA controllers as ZIL, there are
> also no performance issues or timeout errors.

Why not do this then? It also avoids using the SATA tunneling protocol
across the SAS links and port expanders.

> So the problem only occurs with SSDs acting as ZIL attached to the
> backplane.
>
> This is leading me to believe we have a driver issue of some sort --
> the mpt subsystem being unable to cope with the longer command path
> of multiple backplanes. Someone alluded to this in [1] as well, and
> it makes sense to me.
>
> One quick fix, it seems to me, would be upping the SCSI timeout
> values.

The error you included isn't a timeout.

> The SSDs themselves are all Intel X25-E's (32GB) with firmware 8860,
> and the LSI 1068 is a SAS1068E B3 with firmware 011c0200 (1.28.02.00).

I'm not intimately familiar with the firmware versions, but if you're
having problems, making sure you have the latest firmware is probably
a good thing to do.

--
Andrew Gabriel
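P.S. In case it helps, a minimal sketch of the sort of realignment I
mean, assuming you give the log its own EFI-labelled slice (the pool
and device names here are hypothetical, and repartitioning destroys
the data on the drive):

  # Put an EFI label on the SSD and start the slog slice at sector 256
  # (256 x 512 B = 128 KiB, a multiple of 4 KiB):
  format -e c1t8d0    # partition -> modify; slice 0 starting sector 256
  # Then attach the aligned slices as the mirrored log:
  zpool add tank log mirror c1t8d0s0 c1t9d0s0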
Ray Van Dolson
2010-Aug-25 00:27 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
On Tue, Aug 24, 2010 at 04:46:23PM -0700, Andrew Gabriel wrote:

> Ray Van Dolson wrote:
> > In any case, this new server gets between 400 and 600 of these
> > timeout errors an hour:
[...]
> This isn't a timeout. "Unit Attention" is the drive saying back to
> the computer that it's been reset and has forgotten any negotiation
> which happened with the controller. It's a couple of decades since I
> was working on SCSI at this level, but IIRC, a drive will return a
> "Unit Attention" error to the first command issued to it after a
> reset/powerup, except for a Test Unit Ready command. As it says, this
> might be caused by a power on, reset, or bus reset.

Interesting. Thanks for the insight.

> > The problem _follows_ the ZIL and isn't tied to hardware. IOW, if I
> > switch to using the L2ARC drives as ZIL, those drives suddenly
> > exhibit the timeout problems...
> A possibility is that the problem is related to the nature of the
> load a ZIL drive attracts. One scenario could be that you are
> crashing the drive firmware, causing it to reset and reinitialize
> itself, and therefore to return "Unit Attention" to the next command.
> (I don't know if X25-E's can behave this way.)
>
> I would try to correct the 4K alignment on the ZIL at least -- that
> does significantly affect the work the drive has to do internally (as
> well as its performance), although I've no idea if that's related to
> the issue you're seeing.

Will definitely give this a go -- certainly can't hurt.

> > If we connect the SSD drives directly to the LSI controller instead
> > of hanging them off the hot-swap backplane, the timeouts go away.
>
> Again, this may be related to some combination of the load type and
> physical characteristics.
>
> > If we use SSDs attached to the SATA controllers as ZIL, there are
> > also no performance issues or timeout errors.
>
> Why not do this then? It also avoids using the SATA tunneling
> protocol across the SAS links and port expanders.

We may -- however, the main reason we'd gone with the port expander
was for convenient hot-swappability. Though I guess SATA is
technically hot-swappable, it's not as convenient :)

> > So the problem only occurs with SSDs acting as ZIL attached to the
> > backplane.
[...]
> > One quick fix, it seems to me, would be upping the SCSI timeout
> > values.
>
> The error you included isn't a timeout.
>
> > The SSDs themselves are all Intel X25-E's (32GB) with firmware
> > 8860, and the LSI 1068 is a SAS1068E B3 with firmware 011c0200
> > (1.28.02.00).
>
> I'm not intimately familiar with the firmware versions, but if you're
> having problems, making sure you have the latest firmware is probably
> a good thing to do.

Appreciate the response, Gabriel. We also plan to compare Solaris 10U8
and OpenSolaris / Nexenta at some point, when this hardware is freed
up...

Ray
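P.S. For the archives, the switch to onboard SATA itself looks
straightforward -- something like the following, assuming a pool named
"tank", hypothetical onboard-SATA device names, and a pool version new
enough to support log device removal (>= 19):

  zpool status tank              # note the name of the current log vdev
  zpool remove tank mirror-1     # remove the old slog mirror by that name
  zpool add tank log mirror c2t0d0s0 c2t1d0s0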
Andreas Grüninger
2010-Aug-25 18:47 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
Ray,

Supermicro does not support the use of SSDs behind an expander. You
must put the SSD in the head or use an interposer card; see here:

http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_sata_protocol_bridge/lsiss9252/index.html

Supermicro offers an interposer card too: AOC-SMP-LSISS9252.

Andreas
--
This message posted from opensolaris.org
Ray Van Dolson
2010-Aug-25 19:23 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
On Wed, Aug 25, 2010 at 11:47:38AM -0700, Andreas Grüninger wrote:

> Supermicro does not support the use of SSDs behind an expander. You
> must put the SSD in the head or use an interposer card; see here:
> http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_sata_protocol_bridge/lsiss9252/index.html
>
> Supermicro offers an interposer card too: AOC-SMP-LSISS9252.

Hmm, interesting. FAQ #3 on this page[1] seems to indicate otherwise
-- at least in the case of the Intel X25-E (SSDSA2SH064G1GC) with
firmware 8860 (which we are running).

Ray

[1] http://www.supermicro.com/support/faqs/results.cfm?id=95
Andreas Grüninger
2010-Aug-25 20:47 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
This was the information I got from the distributor, but this FAQ is
newer. Anyway, you still have the problems.

When we installed the Intel X25-Es, we also had problems with
timeouts. We replaced the original Sun StorageTek SAS HBA (LSI-based
1068E, newest firmware) with an original Sun StorageTek SAS RAID HBA
(Sun OEM version of the Adaptec 5085). No timeouts since this
replacement.

Andreas
--
This message posted from opensolaris.org
Ray Van Dolson
2010-Sep-21 15:58 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
Just wanted to post a quick follow-up to this. The original thread is
here[1] -- not quoted, for brevity.

Andrew Gabriel suggested[2] that this could possibly be a
workload-triggered issue. We wanted to rule out a driver problem, so
we tested various configurations under Solaris 10U9 and OpenSolaris
with correct 4K block alignment.

The "Unit Attention" errors appeared under all operating environments
for any X25-E (we haven't tested other brands) when used as ZIL and
attached to one of the LSI port expanders used in Silicon Mechanics
hardware. As soon as we move the drives to the onboard SATA controller
or attach them directly to the LSI controller (bypassing the
expander), the issues go away.

Perhaps tweaking the firmware on the port expander would have resolved
the issue, but we're not able to test that scenario currently.

Of note, a heavy workload wasn't required to trigger the problem. We
ran bonnie++ hard on the system -- which appeared to tax the ZIL quite
a bit -- but got no errors. However, as soon as we set up an NFS
VMware datastore and loaded a couple of VMs on it, the Unit Attention
errors began popping up -- even when the VMs weren't particularly
busy.

In any case, we'll probably stop chasing our tails on this issue and
will begin mounting all drives used for ZIL internally, directly
attached to the onboard SATA controllers.

Thanks,
Ray

[1] http://mail.opensolaris.org/pipermail/zfs-discuss/2010-August/044362.html
[2] http://mail.opensolaris.org/pipermail/zfs-discuss/2010-August/044364.html
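P.S. For anyone trying to reproduce this: the shape of the two loads
was roughly the following (parameters are illustrative, not our exact
command lines, and the pool/dataset names are hypothetical):

  # Local synthetic load -- taxed the ZIL but did NOT trigger the errors:
  bonnie++ -d /tank/bench -s 16384 -n 128 -u root

  # NFS-shared VMware datastore -- triggered the errors almost at once:
  zfs create tank/vmware
  zfs set sharenfs=rw tank/vmware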
Richard Elling
2010-Sep-21 16:15 UTC
[zfs-discuss] SCSI write retry errors on ZIL SSD drives...
Other SATA HDDs and SSDs are similarly affected; this can vary by
firmware rev. The prescription seems to be consistent with my
experience.
 -- richard

On Sep 21, 2010, at 8:58 AM, Ray Van Dolson wrote:

> Just wanted to post a quick follow-up to this. The original thread is
> here[1] -- not quoted, for brevity.
[...]
> In any case, we'll probably stop chasing our tails on this issue and
> will begin mounting all drives used for ZIL internally, directly
> attached to the onboard SATA controllers.

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com