During the diagnostics of my SAN failure last week we thought we had seen a backplane failure due to high error counts with 'lsiutil'. However, even with a new backplane and ruling out failed cards (MPXIO or singular) or bad cables I'm still seeing my error count with LSIUTIL increment. I've got no disks attached to the array right now so I've also ruled those out.

Even with nothing connected but the HBA to the backplane expander, a simple restart of the SAN into an OpenIndiana LiveCD or other distribution (NexentaStor) increments the counter. I've been as careful as I can be to clear the counter between changes to parts to try and eliminate a potentially bad cable/card/etc. You can see phy 8-15 throw errors regardless of MPXIO or single-card config, or which expander port I use on the backplane.

According to my VAR, something in the mptsas code changed "recently" (not sure what that means in time terms) and they do not see the problems with 6GB backplanes and adapters.

[Attachment: SAS Diags.txt]

Attached is a log I took through NexentaStor 3.1.1 with my disks still attached. The disks themselves don't seem to be throwing errors, so that's good.

Has anyone seen anything like this? I have not tried to boot into an older version of Solaris or NexentaStor yet, but booting into Scientific Linux 6.1 yields about the same results with lsiutil. Nothing from fmadm, /var/adm/messages or otherwise indicates these data errors outside of lsiutil.
Richard Elling
2011-Dec-04 03:03 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On Dec 1, 2011, at 5:08 PM, Ryan Wehler wrote:

> During the diagnostics of my SAN failure last week we thought we had seen a backplane failure due to high error counts with 'lsiutil'. However, even with a new backplane and ruling out failed cards (MPXIO or singular) or bad cables I'm still seeing my error count with LSIUTIL increment. I've got no disks attached to the array right now so I've also ruled those out.

The link error counters are on the receiving side. To see the complete picture, you need to look at link errors on both ends of each link (more below).

> Even with nothing connected but the HBA to the backplane expander, a simple restart of the SAN into an OpenIndiana LiveCD or other distribution (NexentaStor) increments the counter.

A few counters can tick up when the system is reset at boot. These can be ignored. What you are looking for is a consistent increase of the counters under load. In some cases I have seen millions of errors per minute on a very unhappy system.

> I've been as careful as I can be to clear the counter between changes to parts to try and eliminate a potentially bad cable/card/etc. You can see phy 8-15 throw errors regardless of MPXIO or single-card config, or which expander port I use on the backplane.

The info you attached doesn't show the topology (lsiutil command 16), so it is difficult to say why this occurs.

> According to my VAR, something in the mptsas code changed "recently" (not sure what that means in time terms) and they do not see the problems with 6GB backplanes and adapters.

These counters are in the physical interfaces, far away from any OS.

> Attached is a log I took through NexentaStor 3.1.1 with my disks still attached. The disks themselves don't seem to be throwing errors, so that's good.

To see errors from the disk's perspective, you need to look at the disk's logs. I use sg3_utils for this (sg_logs -a /dev/rdsk/...).

> Has anyone seen anything like this? I have not tried to boot into an older version of Solaris or NexentaStor yet, but booting into Scientific Linux 6.1 yields about the same results with lsiutil.

Yes. Root cause is always hardware.

> Nothing from fmadm, /var/adm/messages or otherwise indicates these data errors outside of lsiutil.

Those errors are counters as part of the SAS link state machine. The symptoms will show as poor performance or occasional command resets at the OS level.

-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
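To follow the sg_logs suggestion across a whole shelf, a loop along the following lines can be used. This is only a sketch: it assumes sg3_utils is installed and that the disks appear under Solaris-style /dev/rdsk names; the device glob and the grep pattern are illustrative and may need adjusting for your system.

  #!/bin/sh
  # Dump the drive-side error pages that sg_logs reports, one disk at a time.
  # Adjust the glob for your platform's device naming.
  for dev in /dev/rdsk/c*t*d*s2; do
      echo "=== $dev ==="
      # -a prints all supported log pages (read/write error counters,
      # protocol-specific port page with the SAS phy counters, etc.);
      # the grep keeps only the counter lines of interest.
      sg_logs -a "$dev" 2>/dev/null | egrep -i 'error|invalid dword|disparity|synch'
  done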
Hi Richard,
Thanks for getting back to me.

On Dec 3, 2011, at 9:03 PM, Richard Elling wrote:

> A few counters can tick up when the system is reset at boot. These can be ignored.
>
> What you are looking for is a consistent increase of the counters under load. In some cases I have seen millions of errors per minute on a very unhappy system.

But we're talking about 600,000 to 2,000,000 errors on a simple reset at boot. Per my VAR, their 6GB hardware shows significantly fewer errors (in the 10s to 100s of errors, not hundreds of thousands to millions).

> The info you attached doesn't show the topology (lsiutil command 16), so it is difficult to say why this occurs.

Attached is the output of option 16 on each card.

[Attachment: LSI1068.rtf]

> To see errors from the disk's perspective, you need to look at the disk's logs.
> I use sg3_utils for this (sg_logs -a /dev/rdsk/...).

I'd paste some of this, but the output would be pretty big. :)  I'll look more into this. Though my "errors corrected without substantial delay" count stands out as pretty high, even on a new disk I just received. Is there anything specific I should be looking at?
Richard Elling
2011-Dec-04 04:31 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On Dec 3, 2011, at 7:36 PM, Ryan Wehler wrote:

>> What you are looking for is a consistent increase of the counters under load. In some cases I have seen millions of errors per minute on a very unhappy system.
>
> But we're talking about 600,000 to 2,000,000 errors on a simple reset at boot. Per my VAR, their 6GB hardware shows significantly fewer errors (in the 10s to 100s of errors, not hundreds of thousands to millions).

For high-quality hardware, I see 4 to 8. If I see > 1,000, then I start replacing hardware.

>> The info you attached doesn't show the topology (lsiutil command 16), so it is difficult to say why this occurs.
>
> Attached is the output of option 16 on each card.
>
> <LSI1068.rtf>

This shows that the handle 0009 phys 12 to 15 are the other HBA (initiator). It is unusual to see millions of errors there.

Also, the number of errors is not symmetrical. From the HBA (Adapter phy 1) you see on the order of a thousand errors. From the expander (handle 0009) you see millions of errors on phys 12 to 15, which are connected to the HBA.

Also interesting is that one of the phys, adapter phy 0, shows no errors, but we see errors on the others. This is unusual because there are 4 links in the cable.

Still smells like hardware to me.
-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
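The symmetry argument above (compare the counters at both ends of each link) is easier to check when the two sides are tabulated together. A rough awk sketch over a captured lsiutil dump in the format quoted later in this thread; the file name is a placeholder and the label matching is an assumption about that format.

  #!/bin/sh
  # Pull per-phy Invalid DWord counts out of a saved lsiutil text dump so the
  # HBA side ("Adapter Phy N") and expander side ("Expander (Handle ...) Phy N")
  # can be compared line by line. lsiutil-dump.txt is a placeholder name.
  awk '
    /^Adapter Phy|^Expander/ { phy = $0; sub(/:.*/, "", phy) }   # remember current phy
    /Invalid DWord Count/    { printf "%-32s %s\n", phy, $NF }   # phy name + count
  ' lsiutil-dump.txt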
On Dec 3, 2011, at 10:31 PM, Richard Elling wrote:

>> But we're talking about 600,000 to 2,000,000 errors on a simple reset at boot. Per my VAR, their 6GB hardware shows significantly fewer errors (in the 10s to 100s of errors, not hundreds of thousands to millions).
>
> For high-quality hardware, I see 4 to 8. If I see > 1,000, then I start replacing hardware.

And how do you define "high quality hardware"? Obviously these aren't crummy SATA adapters and low-cost drives. The chassis and backplane are on Nexenta's HSL. While the cards are not explicitly listed, the underlying chip (LSI 1068) is on another card (3081E-R) that is on the HSL.

> This shows that the handle 0009 phys 12 to 15 are the other HBA (initiator).
>
> It is unusual to see millions of errors there.
>
> Also, the number of errors is not symmetrical. From the HBA (Adapter phy 1) you see on the order of a thousand errors. From the expander (handle 0009) you see millions of errors on phys 12 to 15, which are connected to the HBA.
>
> Also interesting is that one of the phys, adapter phy 0, shows no errors, but we see errors on the others. This is unusual because there are 4 links in the cable.
>
> Still smells like hardware to me.

I'm not quite extrapolating this data the way you are. I see handle 0009, which looks to be the expander. Card #1 is hooked to phy 8-11 and Card #2 is hooked to phy 12-15 (ports 0 and 1 on the expander).

As far as symmetrical errors go, yeah, the whole thing is screwy. The one thing that stands out, which I did not notice before for some reason, is that the "right" card (the one that normally handles phy 12-15) carries 1M+ errors on the expander phys in the output from my initial inquiry regardless of which cable ("right" or "left") is used. Perhaps that is an indicator of hardware malfunction. The "left" card (usually responsible for phy 8-11) throws something on the order of 600K+ (under 1M) with either cable (phy 8-11 or 12-15). Those numbers are uncomfortably high too, though.

Basically, the output in my SAS Diags.txt was flipping between single use of each card with each of the two cables I had available to me. If I were to show the output now with both cards enabled, phy 8-15 on the expander all show "link up".

The other mystery, as you mentioned, is why Adapter phy 0 is error free while the other 3 phys are not. It's also persistent across cables used AND cards used.
Richard Elling
2011-Dec-04 05:18 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On Dec 3, 2011, at 9:02 PM, Ryan Wehler wrote:

> And how do you define "high quality hardware"? Obviously these aren't crummy SATA adapters and low-cost drives. The chassis and backplane are on Nexenta's HSL. While the cards are not explicitly listed, the underlying chip (LSI 1068) is on another card (3081E-R) that is on the HSL.

I recently tested an HP DL380 G7 with D2600 and D2700 JBOD chassis. Zero errors.

Currently, the test process for HSL records any errors, but as long as the root cause can be explained, the devices can pass certification.

> As far as symmetrical errors go, yeah, the whole thing is screwy. The "right" card (the one that normally handles phy 12-15) carries 1M+ errors on the expander phys regardless of which cable is used. Perhaps that is an indicator of hardware malfunction. The "left" card (usually responsible for phy 8-11) throws something on the order of 600K+ (under 1M) with either cable. Those numbers are uncomfortably high too, though.

Agree.

> Basically, the output in my SAS Diags.txt was flipping between single use of each card with each of the two cables I had available to me. If I were to show the output now with both cards enabled, phy 8-15 on the expander all show "link up".

Are the cables of the same make/model? Unfortunately, it is not uncommon to see bad cables :-(  I had one just last week :-(

> The other mystery, as you mentioned, is why Adapter phy 0 is error free while the other 3 phys are not. It's also persistent across cables used AND cards used.

A mystery?
-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
On Dec 3, 2011, at 11:18 PM, Richard Elling wrote:

> I recently tested an HP DL380 G7 with D2600 and D2700 JBOD chassis. Zero errors.

I'm assuming these had some sort of LSI cards in them, since that's the primary focus here. Do you happen to know the models and what expander chip was used on the backplane(s)?

> Currently, the test process for HSL records any errors, but as long as the root cause can be explained, the devices can pass certification.

Well... since we can't even come to a reasonable justification for why these errors exist, with no "true" indicator of bad hardware, something like this could pass the HSL if the VAR can justify it? I'm not saying that's what happened; I'm just trying to understand the process.

> Are the cables of the same make/model? Unfortunately, it is not uncommon to see bad cables :-(  I had one just last week :-(

The cables are identical. My VAR put this all together about 2 years ago. I don't have any other cables to test, but the present fix is "upgrade to SAS3 (6GB) backplane/cards/cables".
Richard Elling
2011-Dec-04 05:45 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On Dec 3, 2011, at 9:32 PM, Ryan Wehler wrote:

> I'm assuming these had some sort of LSI cards in them, since that's the primary focus here. Do you happen to know the models and what expander chip was used on the backplane(s)?

LSI 2008 chipset (HP SC08Ge HBA). Expanders are HP-branded; I'll speculate they are LSI SAS2x28.

Note: there is also firmware on the HBAs and expanders. But I do not expect firmware to change the link error counts. I suspect that is more of a physical issue.

> Well... since we can't even come to a reasonable justification for why these errors exist, with no "true" indicator of bad hardware, something like this could pass the HSL if the VAR can justify it? I'm not saying that's what happened; I'm just trying to understand the process.

A certification does not mean that any specific implementation operates without errors. A failed part, noisy environment, or other influences will affect any specific implementation.
-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
On Dec 3, 2011, at 11:45 PM, Richard Elling wrote:

> LSI 2008 chipset (HP SC08Ge HBA). Expanders are HP-branded; I'll speculate they are LSI SAS2x28.
>
> Note: there is also firmware on the HBAs and expanders. But I do not expect firmware to change the link error counts. I suspect that is more of a physical issue.

In an effort to solve this problem I did update my 3442E-R HBAs from a 2009 firmware to "Phase 21", which came out earlier this year from LSI. The replacement backplane I got from my VAR when they thought that was the issue moved the backplane firmware from 7015 to 7017, per lsiutil's output. You're right that it must be a physical issue, but it just seems highly unlikely that BOTH HBAs failed and BOTH SAS cables failed (we'll take the expander out of the equation since it was replaced).

> A certification does not mean that any specific implementation operates without errors. A failed part, noisy environment, or other influences will affect any specific implementation.

Would it not be more prudent to re-run the tests after a failure was fixed and try to eliminate environmental variables? If you were to look up the reason it made it onto the HSL, it should be "It just works!", not "it works, but this is why we're seeing errors". That leads to doubt when there are caveats and when trying to diagnose like/same hardware in the future.
Richard Elling
2011-Dec-04 20:23 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On Dec 4, 2011, at 8:50 AM, Ryan Wehler wrote:

>> A certification does not mean that any specific implementation operates without errors. A failed part, noisy environment, or other influences will affect any specific implementation.
>
> Would it not be more prudent to re-run the tests after a failure was fixed and try to eliminate environmental variables? If you were to look up the reason it made it onto the HSL, it should be "It just works!", not "it works, but this is why we're seeing errors". That leads to doubt when there are caveats and when trying to diagnose like/same hardware in the future.

Perhaps I wasn't clear. When we root cause an error reported during certification, it is to absolve the device under test. For example, if we run a test against a disk and see errors on the wire caused by a backplane or cable, then we must absolve the disk of the errors. If the disk is the root cause of the error reports, then it fails certification.

Do not confuse certification with "it runs forever with no problems in all cases".
-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9
James C. McPherson
2011-Dec-04 22:11 UTC
[zfs-discuss] LSI 3GB HBA SAS Errors (and other misc)
On 5/12/11 02:50 AM, Ryan Wehler wrote:
...
> In an effort to solve this problem I did update my 3442E-R HBAs from a 2009 firmware to "Phase 21", which came out earlier this year from LSI. The replacement backplane I got from my VAR when they thought that was the issue moved the backplane firmware from 7015 to 7017, per lsiutil's output. You're right that it must be a physical issue, but it just seems highly unlikely that BOTH HBAs failed and BOTH SAS cables failed (we'll take the expander out of the equation since it was replaced).

You need to look at the data available, rather than making assumptions. When I was part of CPRE (now PTS?) in Sun we referred to swapping hardware without investigation as practicing "swaptronics". Every escalation we got where this had happened took longer to resolve as a result.

So yes, it certainly could be a hardware problem twice in a row. You'd want to examine the serial numbers and other identifying data, such as manufacturing date codes, to see how likely that is. In the past I've seen cases where replacement disks turned out to be duds across several different batches and different factories. The true root cause was traced to a chip that was supplied to the manufacturer by a third party.

Personally, I'd start looking at the cables first - in my experience they seem to incur more physical stress through the connect/disconnect operations than HBAs.

James C. McPherson
--
Oracle
http://www.jmcp.homeunix.com/blog
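The serial-number/date-code check James suggests can be done from the host with the same sg3_utils package mentioned earlier in the thread, at least for the disks. A sketch only: the device glob is Solaris-style and illustrative, and the VPD lookup assumes the drives report a unit serial number page.

  #!/bin/sh
  # Collect vendor/model/firmware and unit serial number for each disk, to spot
  # replacement parts that may have come from the same suspect batch.
  for dev in /dev/rdsk/c*t*d*s2; do
      echo "=== $dev ==="
      sg_inq "$dev" | egrep -i 'vendor|product|revision'
      sg_vpd --page=sn "$dev"     # Unit Serial Number VPD page
  done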
Well, if we want to get into theories on faulty hardware batches and such, we can. I think the likelihood is slim, but not impossible I suppose.

I did the best I could diagnostics-wise, given I have no spare parts that have never been a part of this SAN. As I said, I still think the likelihood of two failed HBAs or failed cables just doesn't add up. The errors thrown between cards are pretty consistent between cable swaps too, so nothing is really indicative of A bad cable, let alone two.

My vendor has more hardware on its way to me early this coming week, so I'll be able to report back once I have new HBAs and cables too.

On Dec 4, 2011, at 4:11 PM, James C. McPherson wrote:

> You need to look at the data available, rather than making assumptions. When I was part of CPRE (now PTS?) in Sun we referred to swapping hardware without investigation as practicing "swaptronics". Every escalation we got where this had happened took longer to resolve as a result.
>
> Personally, I'd start looking at the cables first - in my experience they seem to incur more physical stress through the connect/disconnect operations than HBAs.
Here's LSIUTIL after swapping to a 6GB backplane and dual 9211-8i cards on a fresh boot. Much better. :)

Adapter Phy 0:  Link Up, No Errors
Adapter Phy 1:  Link Up, No Errors
Adapter Phy 2:  Link Up, No Errors
Adapter Phy 3:  Link Up, No Errors
Adapter Phy 4:  Link Down, No Errors
Adapter Phy 5:  Link Down, No Errors
Adapter Phy 6:  Link Down, No Errors
Adapter Phy 7:  Link Down, No Errors

Expander (Handle 0009) Phy 0:  Link Down, No Errors
Expander (Handle 0009) Phy 1:  Link Down, No Errors
Expander (Handle 0009) Phy 2:  Link Down, No Errors
Expander (Handle 0009) Phy 3:  Link Down, No Errors
Expander (Handle 0009) Phy 4:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  7
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 5:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  6
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 6:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  5
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 7:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  7
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 8:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  7
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 9:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  4
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 10:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  6
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 11:  Link Up
  Invalid DWord Count            8
  Running Disparity Error Count  6
  Loss of DWord Synch Count      2
  Phy Reset Problem Count        0
Expander (Handle 0009) Phy 12:  Link Down, No Errors
Expander (Handle 0009) Phy 13:  Link Down, No Errors
Expander (Handle 0009) Phy 14:  Link Down, No Errors
Expander (Handle 0009) Phy 15:  Link Down, No Errors
Expander (Handle 0009) Phy 16:  Link Down, No Errors
Expander (Handle 0009) Phy 17:  Link Down, No Errors
Expander (Handle 0009) Phy 18:  Link Up, No Errors
Expander (Handle 0009) Phy 19:  Link Up, No Errors
Expander (Handle 0009) Phy 20:  Link Down, No Errors
Expander (Handle 0009) Phy 21:  Link Down, No Errors
Expander (Handle 0009) Phy 22:  Link Down, No Errors
Expander (Handle 0009) Phy 23:  Link Down, No Errors
Expander (Handle 0009) Phy 24:  Link Down, No Errors
Expander (Handle 0009) Phy 25:  Link Down, No Errors
Expander (Handle 0009) Phy 26:  Link Down, No Errors
Expander (Handle 0009) Phy 27:  Link Down, No Errors
Expander (Handle 0009) Phy 28:  Link Up, No Errors
Expander (Handle 0009) Phy 29:  Link Up, No Errors
Expander (Handle 0009) Phy 30:  Link Up, No Errors
Expander (Handle 0009) Phy 31:  Link Up, No Errors
Expander (Handle 0009) Phy 32:  Link Up, No Errors
Expander (Handle 0009) Phy 33:  Link Up, No Errors
Expander (Handle 0009) Phy 34:  Link Down, No Errors
Expander (Handle 0009) Phy 35:  Link Down, No Errors
Expander (Handle 0009) Phy 36:  Link Up, No Errors
Expander (Handle 0009) Phy 37:  Link Down, No Errors
Whoops. Make that 9211-4i cards. :) Still promising.
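Richard's earlier rule of thumb (a handful of errors at boot is normal; more than about 1,000 means start swapping hardware) can be applied to output in the format above with a short awk pass. A sketch only; the threshold and file name are illustrative.

  #!/bin/sh
  # Sum the four per-phy counters from a saved lsiutil listing (format as above)
  # and flag any phy whose total exceeds a threshold. phy-counters.txt is a
  # placeholder for wherever the listing was captured.
  awk -v limit=1000 '
    function report() { if (phy != "" && total > limit) printf "%-32s %d\n", phy, total }
    /^Adapter Phy|^Expander/ { report(); phy = $0; sub(/:.*/, "", phy); total = 0; next }
    /Count/                  { total += $NF }
    END                      { report() }
  ' phy-counters.txt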
2011-12-05 5:15, Ryan Wehler wrote:

> Well, if we want to get into theories on faulty hardware batches and such, we can. I think the likelihood is slim, but not impossible I suppose.
>
> I did the best I could diagnostics-wise, given I have no spare parts that have never been a part of this SAN. As I said, I still think the likelihood of two failed HBAs or failed cables just doesn't add up. The errors thrown between cards are pretty consistent between cable swaps too, so nothing is really indicative of A bad cable, let alone two.

Well, speculation-wise, if these were nearly identical items serving for the same time in identical conditions (same enclosure), they could fail together just because they were subjected to the same shocks, power surges, or, perhaps more likely, the same aging of components (e.g. drying up of capacitors, oxidation of soldered connections, diffusion of atoms in the microchips - whatever).

Regarding soldered connections - there was a true story some 10 years ago about Fujitsu desktop drives dying at nearly the same age after leaving the factory (a few months old), which was tracked to higher-than-usual acidity of the soldering lead or its additives. Overall, the electrical links just stopped working after a while due to oxidation into the bulk of the metal blobs :)

Still, congratulations that the replacement hardware did solve the problem! ;)

//Jim