Weber, Markus
2012-Apr-05 08:48 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
Even though it's not directly ZFS related, I've seen some similar discussion on this list, and maybe someone has "the final" answer to this problem, as most of the tips and "these things could help" I have found so far have not fully solved it.

We are struggling with the behaviour of the combination of an LSI 3081E-R and SATA disks behind an expander.

One disk behind the expander is known to be bad. Reading from that disk with dd causes I/O to the other (good) disks to fail sooner (Solaris) or later (Linux), but fail it will, and it makes the system unusable.

Under Linux, after some time (maybe when certain things come together) a few LogInfo(0x31123000) events briefly interrupt I/O to the other disks, but then more and more of them show up, making any kind of I/O to disks behind the expander impossible.

Under Solaris it doesn't even take that long: after reading once or twice from the bad disk, I/O to the other disks usually fails immediately (and it looks like the HBA/bus(?) sometimes re-initializes completely).

The error code 0x31123000 decodes as SAS (3) + PL (1) + PL_LOGINFO_CODE_ABORT (0x12) + PL_LOGINFO_SUB_CODE_BREAK_ON_STUCK_LINK (0x3000) - I guess this relates to the HBA -- expander link(s) not being up/re-established at that moment, and the HBA perhaps not waiting long enough (but how long is long enough without being too long?). See the PPS at the end of this mail for a small decode sketch.

I'm trying to understand a bit better:

- why and what is triggering this (e.g. because mpt sends a bus/target reset)

- what exactly is going on (e.g. a reset is sent, the HBA -- expander link takes too long or sees problems and then gets caught in a reset loop)

- whether this is "as designed" (e.g. SATA disks behind expanders are always toxic)

- whether the problem can be pinned on a single component (say the HBA's FW or the expander FW) or on a combination of things (say the WD drives acting strangely and confusing the expander)

- any ideas to pinpoint the problem and get a clearer picture of the issue.

I did some quick, preliminary tests which make me think it's most likely a "fatal" LSI 3081 -- expander problem, but I could be wrong:

- Moving the bad disk away from the expander to another port on the same HBA: when reading from the bad disk (no longer behind the expander), I/O to the other disks (behind the expander) seems not to be affected at all.

- Replacing the 3081 with a 9211, keeping the bad disk behind the expander: when reading from the bad disk, I/O to the other disks stalls briefly but resumes quickly, and no errors have been seen for the "good" disks so far (at least under Solaris 11 booted from a LiveCD) - still not perfect, but better.

I do have an Oracle case on this, but - even though I learned a few things - with no real result (it's not Oracle HW). WD was kind enough to quickly provide the latest FW for the drives, but nothing more so far, and LSI is ... well, they take their time, and their first reply was "no, we are not aware of any issues like this" (strange, as there are quite a few postings about this out there).

Many thanks for sharing your experience or ideas on this.

Markus


PS:

LSI 3081E-R (SAS1068E B3), running 01.33.00.00 / 6.36.00.00; expander backplane is SC216-E1 (SASX36 A1 7015) and WD3000BLFS FW 04.04V06.

Solaris 10 & 11, OpenSolaris 134, OpenIndiana 151a, CentOS 6.2 with 3.04.19 MPT, OpenSuSE 11.1 with 4.00.43.00suse MPT and/or latest LSI drivers 4.28.00.00 ...
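PPS: For anyone who wants to decode these LogInfo words themselves, here is roughly how the fields split out. This is just a minimal Python sketch on my side; the bit layout (type / originator / code / sub-code) follows the usual MPI SAS log-info conventions, so please double-check it against the actual mpt headers before trusting it.

  def decode_loginfo(loginfo):
      # Assumed layout: bits 31-28 type (0x3 = SAS), bits 27-24 originator
      # (0x1 = PL, the protocol layer), bits 23-16 code, bits 15-0 sub-code.
      return {
          "type":       (loginfo >> 28) & 0xF,
          "originator": (loginfo >> 24) & 0xF,
          "code":       (loginfo >> 16) & 0xFF,
          "sub_code":   loginfo & 0xFFFF,
      }

  print(decode_loginfo(0x31123000))
  # -> {'type': 3, 'originator': 1, 'code': 18, 'sub_code': 12288}
  #    i.e. SAS / PL / 0x12 (ABORT) / 0x3000 (BREAK_ON_STUCK_LINK)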
Hung-Sheng Tsaio (Lao Tsao) Ph.D.
2012-Apr-05 09:51 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
Hi,

I don't have an answer, but I just want to mention that the ZFS appliance moved away from SATA to SAS behind expanders for a reason. IIRC the SATA-based ZFS appliance had one big problem: a single bad SATA HDD could take the whole array down.

Regards

On 4/5/2012 4:48 AM, Weber, Markus wrote:
> We are struggling with the behaviour of the combination of an LSI 3081E-R and SATA
> disks behind an expander.
>
> One disk behind the expander is known to be bad. Reading from that disk with dd
> causes I/O to the other (good) disks to fail sooner (Solaris) or later (Linux),
> but fail it will, and it makes the system unusable.
<snip>
Paul Kraus
2012-Apr-05 12:12 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
On Thu, Apr 5, 2012 at 4:48 AM, Weber, Markus <fvd at de.kpn-eurorings.net> wrote:
> We are struggling with the behaviour of the combination of an LSI 3081E-R and SATA
> disks behind an expander.
>
> One disk behind the expander is known to be bad. Reading from that disk with dd
> causes I/O to the other (good) disks to fail sooner (Solaris) or later (Linux),
> but fail it will, and it makes the system unusable.
<snip>

We have five J4400 loaded with SATA drives connected to two dual-port 1068E-based controllers - a fully supported configuration as of when we bought it. We also have two J4400 loaded with SATA drives behind a single dual-port 1068E-based controller, and three instances of a single J4400 behind a dual-port 1068E controller. In all cases, by "dual port" I mean dual external SAS connectors, each with 4 channels, so a total of 8 channels per controller. All J4400 are dual attached, and we are running Solaris 10U9 with multipathing enabled.

I have not seen any odd issues with the five-J4400 configuration since we went into production. In pre-production testing we found a bug in the MPT driver that would cause a dead drive to go undetected for _hours_ while ZFS blindly trusted the FMD layer and kept issuing I/O requests, waiting for responses that were never coming back. This was fixed in an IDR (which we are running) and has since been fully integrated into 10U10.

I have seen odd behavior of the single-J4400 configurations when a drive fails. I have not been able to really qualify the problem - just very slow I/O and no logs pointing at anything other than the single failed drive. Sometimes reseating the failed drive will make it come back to life, sometimes only for a short while, sometimes (apparently) permanently.

I have not seen any odd behavior due to the J4400 in the two-J4400 configuration (we have had other issues with that system, but they were not related to the J4400).

No data has been lost due to any of the failures or outages. Thank you, ZFS.

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, Troy Civic Theatre Company
-> Technical Advisor, RPI Players
Weber, Markus
2012-Apr-05 14:04 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
Paul wrote:
> I have not seen any odd issues with the five-J4400 configuration
> since we went into production.

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp - interposer cards, and thus handling the SATA drives more or less like SAS ones?

Markus
Paul Kraus
2012-Apr-05 14:22 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
On Thu, Apr 5, 2012 at 10:04 AM, Weber, Markus <fvd at de.kpn-eurorings.net> wrote:
> I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp -
> interposer cards, and thus handling the SATA drives more or less like SAS ones?

I believe so, and Oracle has pulled the SATA configurations from what you can buy today. I'm not sure it is really valid to say that a SATA drive with an interposer is like a SAS drive; they do take a different path through the MPT code, so there are differences.

--
Paul Kraus
Hung-Sheng Tsaio (Lao Tsao) Ph.D.
2012-Apr-05 14:32 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
On 4/5/2012 10:22 AM, Paul Kraus wrote:
> On Thu, Apr 5, 2012 at 10:04 AM, Weber, Markus <fvd at de.kpn-eurorings.net> wrote:
>> I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp -
>> interposer cards, and thus handling the SATA drives more or less like SAS ones?

The J4400 is the first-generation SAS-1 JBOD using an LSI-based SAS HBA; you can read about it in the 7xxx unified storage documentation. The current generation is SAS-2 based, and Oracle now only sells it as part of the ZFS appliance, where it is all SAS-based 7200 rpm drives - please see the ZFS appliance docs. Just google "SAS vs SATA" and you will find the differences.

Regards
Rocky Shek
2012-Apr-05 21:12 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
The J4400 is using LSI/SiliconStor SS1320 interposer cards to handle the SATA HDs.

From past experience, we have had better luck with Hitachi SATA than with WD SATA HDs. Sun was using Hitachi HDs, too. If you ask me to choose, a 7200 RPM SAS HD is still the best choice. Most of our customers use pure SAS HDs in our JBODs.

Rocky

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Weber, Markus
Sent: Thursday, April 05, 2012 7:05 AM
To: Paul Kraus; ZFS Discussions
Subject: Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp - interposer cards, and thus handling the SATA drives more or less like SAS ones?

Markus
<snip>
Jim Klimov
2012-Apr-07 23:15 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
> I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp -
> interposer cards, and thus handling the SATA drives more or less like SAS ones?

Out of curiosity, are there any third-party vendors of server/storage chassis (Supermicro et al.) who make SATA backplanes with the SAS interposers soldered on? Would that make sense, or be cheaper/more reliable than having the extra junk between the disk and backplane connectors? (If I correctly understand what the talk is about ;)

ZFS was very attractive at first because of the claim that "it returns Inexpensive into raId" and can do miracles with SATA disks. Reality has shown many of us that many SATA implementations existing in the wild should be avoided... so we're back to good vendors' higher-end expensive SATAs, or better yet SAS drives. Not inexpensive anymore again :(

Thanks,
//Jim
Richard Elling
2012-Apr-08 02:06 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
On Apr 7, 2012, at 4:15 PM, Jim Klimov wrote:

> Out of curiosity, are there any third-party vendors of server/storage
> chassis (Supermicro et al.) who make SATA backplanes with the SAS
> interposers soldered on?

None AFAIK.

> Would that make sense, or be cheaper/more reliable than having the extra
> junk between the disk and backplane connectors? (If I correctly
> understand what the talk is about ;)

It would not be more reliable or cheaper.

> ZFS was very attractive at first because of the claim that "it returns
> Inexpensive into raId" and can do miracles with SATA disks. Reality has
> shown many of us that many SATA implementations existing in the wild
> should be avoided... so we're back to good vendors' higher-end expensive
> SATAs, or better yet SAS drives. Not inexpensive anymore again :(

You can't get past the age-old idiom: you get what you pay for. In some cases today, depending on the vendor, the cost of SATA + interposer is the same as SAS.
 -- richard

--
DTrace Conference, April 3, 2012, http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
Jim Klimov
2012-Apr-08 11:40 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?
2012-04-08 6:06, Richard Elling wrote:
> You can't get past the age-old idiom: you get what you pay for.

True... but it can be somewhat countered with the Dr.-House-age idiom: people lie, even if they don't mean to ;) Rhetoric follows ;)

Hardware breaks sooner or later, due to poor design, Brownian motion of the ICs' atoms, or a flock of space death rays. So in the extreme case, software built for reliability should assume that nothing is as it seems or as it is reported by the hardware. In this light, the 10^-14 or 10^-16 BER, or the fact that most of the time expensive disks complete flush requests while cheaper ones likely don't, is just a change in the non-zero factors in the probability of having an actual error and data loss (see the P.S. for some rough numbers). Even hashes have a minuscule non-zero chance of collision.

This is reminiscent of the approach in building dedicated network segments for higher-security tasks: a networked device (especially one connected to "the outside") is assumed to have been hacked into. If not yet, then some zero-day exploit will come along and it will be hacked into.

It is understandable that X%-reliable systems can probably be built more easily if you start with more reliable components, but those are not infinitely better than "unreliable" ones. So, is there really a fundamental requirement to avoid cheap hardware, and are there no good ways to work around its inherently higher instability and lack of dependability? Or is it just a harder goal (indefinitely far away on the project roadmap)?

Just a thought...
//Jim
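P.S. To put some rough numbers on the BER point, here is a small back-of-the-envelope sketch in Python. The 3 TB and 12 TB drive sizes are just my own examples (not figures from this thread), and it assumes independent bit errors at the quoted rates, which real drives of course don't strictly follow:

  import math

  def p_ure(drive_bytes, ber):
      # Probability of at least one unrecoverable read error when reading
      # the whole drive once, assuming independent bit errors at rate `ber`.
      bits = drive_bytes * 8
      return -math.expm1(bits * math.log1p(-ber))

  for ber in (1e-14, 1e-16):
      for tb in (3, 12):
          print("%2d TB, BER %g: P(>=1 URE per full read) ~ %.2f%%"
                % (tb, ber, 100 * p_ure(tb * 10**12, ber)))

  # Roughly 21% (3 TB) and 62% (12 TB) at 10^-14, versus about 0.24% and
  # 0.96% at 10^-16 - the two rates are worlds apart in practice.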