m.roth at 5-cent.us
2015-Jul-10 16:47 UTC
[CentOS] OT, hardware: HP smart array drive issue
Hi. Anyone working with these things? I've got a drive in "predictive failure" in a RAID5. Now here's the thing: there was an issue yesterday when I got in, and I wound up power cycling the RAID. On the first boot of the attached server, it said the controller had a failure and a drive had failed, and it wouldn't continue booting. When I gave it the three-finger salute, this time on the way up, during POST, it noted the controller issue... but the thing came up, looking like it did a couple of days ago.

Trying to prevent this from happening again, I've decided to replace the drive that's in predictive failure. The array has a hot spare. I tried to remove the drive using hpacucli, but it refuses with "operation not permitted", and there doesn't *seem* to be a "mark as failed" command. *Do* I just yank the drive?

mark
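(For anyone following along, this is roughly how I've been poking at it — the controller slot and the port:box:bay drive address below are just examples; substitute whatever your own `show config` output reports:)

```shell
# Dump the full controller configuration, which flags any drives in
# "Predictive Failure" state (controller slot numbers are examples).
hpacucli ctrl all show config

# Show the status of every physical drive on the controller in slot 0.
hpacucli ctrl slot=0 pd all show status

# Full detail on one suspect drive; the address 1I:1:3 is a placeholder,
# use the real address from the output above.
hpacucli ctrl slot=0 pd 1I:1:3 show detail

# Blink the drive's locate LED so the right disk gets pulled from the cage.
hpacucli ctrl slot=0 pd 1I:1:3 modify led=on
```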
On Jul 10, 2015, at 10:47, m.roth at 5-cent.us wrote:

> Trying to prevent this from happening again, I've decided to replace the
> drive that's in predictive failure. The array has a hot spare. I tried to
> remove, using hpacucli, it refuses "operation not permitted", and there
> doesn't *seem* to be a "mark as failed" command. *Do* I just yank the
> drive?

Hi Mark,

I've never had any problem just pulling and replacing drives on HP hardware with the hardware RAID controllers (even the icky cheap one that came out around the DL360/380 Gen 8 timeframe, that isn't really hardware RAID and needs closed drivers in Linux). That said, I also *test it*, long before putting anything important on them...

From past experience with HP stuff, it usually won't move the data over to the hot spare (especially if it's a "Global" hot spare and not specific to that array) until an actual failure occurs. "Predictive failure" isn't considered a failure in HP's world. I don't think there is any setting to tell the controller to move to the hot spare if there's a "predictive failure".

I've also had disks that triggered a "predictive failure" under heavy load that were simply popped out and back in, and the controller rebuilt them, and the drive never did it again for *years*. The "predictive failure" error rate is pretty low.

That last one is more a question of policy than anything. How much do you trust it? At one employer the game was to pop out and back in any drive that showed "predictive failure" on HP systems (Dell stuff we handled differently at the time; it was less prone to false alarms, so to speak), and if they did it again "soonish", we'd call for the replacement disk. That's how often the HP controllers did it. In a rather large farm of HP stuff, I popped and replaced an HP drive a week, whenever I happened by the data center.

As for the question of whether you should be able to do it safely or not...
if a hardware RAID controller won't let me yank a physical drive out and shove another one in and rebuild itself back to whatever level of redundancy was defined by me as "nominal" for that system, I don't want it anyway. Look at it this way: if the disk had a catastrophic electronics failure while installed in the array, the array should handle it. Yanking it out is technically nicer than some of the failure modes that can affect the busses on the backplane with shorted electronics. (GRIN)

Just sharing my thoughts... your call. :-) YMMV.

We had a service contract at that place, and a new disk was always just a phone call away at no additional cost. Even with that level of service, we always did the "re-seat it once" thing. We'd log it, and if anyone else saw that same disk flashing the next time they were at the data center (we just looked at the logged ones before doing the "re-seat"), they'd make the phone call and the service company would drop a drive off a few hours later.

--
Nate Duehr
denverpilot at me.com
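P.S. If you want to see what the controller thinks the spare covers before yanking anything, something like this has worked on the Smart Array generations I've dealt with (controller slot, array letter, and drive addresses are examples — adjust to your own config):

```shell
# Show each array with its member drives and assigned spares; a "Global"
# spare shows up under every array it covers.
hpacucli ctrl slot=0 array all show detail

# Assign a drive as a dedicated spare to array A (address is an example).
hpacucli ctrl slot=0 array A add spares=1I:1:4

# After pulling and re-seating (or replacing) a drive, watch the rebuild
# progress here until the drive goes from "Rebuilding" back to "OK".
hpacucli ctrl slot=0 pd all show status
```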