thr3ads.net - CentOS - [CentOS] Dell PERC H800 commandline RAID monitoring tools [Mar 2011]

Dr. Ed Morbius

2011-Mar-07 20:43 UTC

[CentOS] Dell PERC H800 commandline RAID monitoring tools

We're looking for tools to be used in monitoring the PERC H800 arrays on
a set of database servers running CentOS 5.5.

We've installed most of the OMSA (Dell monitoring) suite.

Our current alerting is happening through SNMP, though it's a bit hit or
miss (we apparently missed a couple of earlier predictive failure alerts
on one drive).

OMSA conflicts with mega-cli, though we may find that the latter is the
more useful package.  Both are pretty byzantine, the Dell stuff simply
doesn't have docs (in particular: docs on how to interpret the omconfig
log output).

Ideally we'd like something which could be run as a Nagios plugin or
cron job providing information on RAID status and/or possible disk
errors.  Probably both, actually.

Thanks in advance.

-- 
Dr. Ed Morbius, Chief Scientist /            |
  Robot Wrangler / Staff Psychologist        | When you seek unlimited power
Krell Power Systems Unlimited                |                  Go to Krell!

Eero Volotinen

2011-Mar-07 20:57 UTC

2011/3/7 Dr. Ed Morbius <dredmorbius at gmail.com>:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.
>
> We've installed most of the OMSA (Dell monitoring) suite.
>
> Our current alerting is happening through SNMP, though it's a bit hit or
> miss (we apparently missed a couple of earlier predictive failure alerts
> on one drive).
>
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package.  Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).
>
> Ideally we'd like something which could be run as a Nagios plugin or
> cron job providing information on RAID status and/or possible disk
> errors.  Probably both, actually.

If your system supports omreport (it comes with OMSA), then this is a good solution:
http://folk.uio.no/trondham/software/check_openmanage.html
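
For a homegrown Nagios plugin or cron job, a minimal Python sketch along these lines may be a starting point. It assumes that `omreport storage pdisk controller=0` prints one `Key : Value` line per attribute and that each disk record starts at an `ID` line. The field names, the severity mapping, and the SAMPLE text are illustrative assumptions, not captured output, and this is not how check_openmanage works internally.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Illustrative text only; assumed to resemble omreport's "Key : Value" layout.
SAMPLE = """\
ID                        : 0:0:1
Status                    : Ok
State                     : Online
Failure Predicted         : No
ID                        : 0:0:2
Status                    : Non-Critical
State                     : Online
Failure Predicted         : Yes
"""


def parse_pdisks(text):
    """Split 'Key : Value' lines into one dict per physical disk.

    Assumes every disk record begins with an 'ID' line.
    """
    disks = []
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # headers, blank lines
        key, value = key.strip(), value.strip()
        if key == "ID":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


def evaluate(disks):
    """Return (exit_code, message) using an assumed severity mapping."""
    if not disks:
        return UNKNOWN, "no physical disks parsed"
    worst, notes = OK, []
    for d in disks:
        status = d.get("Status", "Unknown")
        predicted = d.get("Failure Predicted", "No")
        if status not in ("Ok", "Non-Critical"):
            level = CRITICAL
        elif status == "Non-Critical" or predicted == "Yes":
            level = WARNING
        else:
            level = OK
        if level != OK:
            notes.append("disk %s: status=%s, failure predicted=%s"
                         % (d.get("ID", "?"), status, predicted))
        worst = max(worst, level)
    return worst, "; ".join(notes) or "all %d disks ok" % len(disks)
```

As a plugin, one would feed the command's output to `parse_pdisks`, print the message from `evaluate`, and exit with the returned code; from cron, a non-zero code could trigger a mail instead.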

--
Eero

Blake Hudson

2011-Mar-07 22:04 UTC

-------- Original Message  --------
Subject: [CentOS] Dell PERC H800 commandline RAID monitoring tools
From: Dr. Ed Morbius <dredmorbius at gmail.com>
To: CentOS User list <centos at centos.org>
Date: Monday, March 07, 2011 2:43:03 PM
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.

If you purchased the server with an add-in DRAC, the DRAC can provide
email alerts if an array becomes degraded (or just about any other
hardware fault). This isn't necessarily a replacement for your current
monitoring, but it can be used to supplement or complement it.

--Blake

Dr. Ed Morbius

2011-Mar-07 22:28 UTC

on 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorbius at gmail.com) wrote:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.

Pardoning the self-reply, but one issue we've had is reconciling the
omconfig log report with the Dell Server Administrator syslog messages.

omconfig reported a predictive drive failure, but we (and three Dell
storage/support techs) had trouble identifying which actual device was
being reported as bad.


From 'omconfig storage controller action=exportlog controller=0' output:

    03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42:  96=Predictive failure: PD 00(e0x08/s2)
    03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)

In /var/log/messages (timestamp/hostname trimmed):

    Server Administrator: Storage Service EventID: 2243  The Patrol Read has stopped.:  Controller 0 (PERC H800 Adapter)
    Server Administrator: Storage Service EventID: 2049  Physical disk removed:  Physical Disk 0:0:2 Controller 0, Connector 0

The Server Administrator reports of a slot 2 failure correspond to the
drive which was physically replaced.

The OMSA omconfig report is throwing us a bunch of crud about some
device, but Dell variously identified it as slot 0 and slot 9.  We're
now getting from them that "/s2" identifies slot 2.
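
Taking the "/s2 is slot 2" reading at face value, the two log formats can be lined up mechanically. The Python sketch below is only that: it assumes `e0x08` is a hexadecimal enclosure ID, that `/sN` is the slot number (as reported from Dell above), and that the syslog triple `0:0:2` is connector:enclosure:slot. None of this comes from documentation; the regular expressions and function names are hypothetical.

```python
import re

# Controller export-log form, e.g. "PD 00(e0x08/s2)".
# Assumption: e<hex> is the enclosure ID and /s<N> is the slot number.
EXPORTLOG_PD = re.compile(
    r"PD\s+(?P<pd>[0-9a-fA-F]+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)

# Server Administrator syslog form, e.g. "Physical Disk 0:0:2 Controller 0".
# Assumption: the triple is connector:enclosure:slot.
SYSLOG_PD = re.compile(
    r"Physical Disk (?P<conn>\d+):(?P<encl>\d+):(?P<slot>\d+) "
    r"Controller (?P<ctrl>\d+)"
)


def exportlog_slot(line):
    """Return the slot number from an export-log line, or None if absent."""
    m = EXPORTLOG_PD.search(line)
    return int(m.group("slot")) if m else None


def syslog_slot(line):
    """Return the slot number from a syslog line, or None if absent."""
    m = SYSLOG_PD.search(line)
    return int(m.group("slot")) if m else None


# The two "removed" lines quoted in this message.
export_line = ("03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: "
               "112=Removed: PD 00(e0x08/s2)")
syslog_line = ("Server Administrator: Storage Service EventID: 2049  "
               "Physical disk removed:  Physical Disk 0:0:2 "
               "Controller 0, Connector 0")

same_slot = exportlog_slot(export_line) == syslog_slot(syslog_line)
```

Under those assumptions both lines resolve to slot 2, which matches the drive that was physically replaced.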


Dell said point blank "you're not going to have any luck with that" as
far as documentation of the OMSA log report format and how to parse it
goes.  Does anyone have a clue as to WTF it's actually trying to
say, or what this tool is based on (I'm suspecting mega-cli, on a
general hunch but nothing much stronger)?

"Enterprise support" .... indeed.

-- 
Dr. Ed Morbius, Chief Scientist /            |
  Robot Wrangler / Staff Psychologist        | When you seek unlimited power
Krell Power Systems Unlimited                |                  Go to Krell!

Ross Walker

2011-Mar-08 14:47 UTC

On Mar 7, 2011, at 3:43 PM, "Dr. Ed Morbius" <dredmorbius at gmail.com> wrote:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.
>
> We've installed most of the OMSA (Dell monitoring) suite.
>
> Our current alerting is happening through SNMP, though it's a bit hit or
> miss (we apparently missed a couple of earlier predictive failure alerts
> on one drive).
>
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package.  Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).
>
> Ideally we'd like something which could be run as a Nagios plugin or
> cron job providing information on RAID status and/or possible disk
> errors.  Probably both, actually.

I can't speak to Nagios, but I have OMSA set up to send traps and, for
critical errors, to also send emails, and it works well for us.

If you link the shared lib (forget the paths) and install megacli with --nodeps
you can have both installed.

-Ross

Dominik Zyla

2011-Mar-10 08:10 UTC

On Mon, Mar 07, 2011 at 12:43:03PM -0800, Dr. Ed Morbius wrote:
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package.  Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).

We're using megacli wrapped by perl to provide information about PERC
events. It works quite well so far.
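
The wrapper idea can be sketched roughly as follows, in Python rather than Perl, so this is not the script described above. It assumes the text comes from something like `MegaCli -PDList -aALL` and that each disk record starts with an `Enclosure Device ID:` line followed by `Slot Number:` and per-disk error counters; the SAMPLE text is made up for illustration and the function names are hypothetical.

```python
# Illustrative text only; assumed to resemble a MegaCli physical disk listing.
SAMPLE = """\
Enclosure Device ID: 8
Slot Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Online

Enclosure Device ID: 8
Slot Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 3
Firmware state: Online
"""

# Counters treated as warning signs when non-zero (an assumed choice).
COUNTERS = ("Media Error Count", "Other Error Count",
            "Predictive Failure Count")


def parse_pdlist(text):
    """Split 'Key: Value' lines into one dict per physical disk."""
    disks = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or free-form lines
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            disks.append({})  # assumed start of a new disk record
        if disks:
            disks[-1][key] = value
    return disks


def suspect_disks(disks):
    """Return (slot, {counter: value}) for disks with any non-zero counter."""
    flagged = []
    for d in disks:
        nonzero = {}
        for name in COUNTERS:
            value = d.get(name, "0")
            if value.isdigit() and int(value) > 0:
                nonzero[name] = int(value)
        if nonzero:
            flagged.append((d.get("Slot Number", "?"), nonzero))
    return flagged
```

A cron job could mail the result of `suspect_disks` whenever it is non-empty, which would surface predictive failure counts even if an SNMP trap is missed.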

-- 
Dominik Zyla


Kai Schaetzl

2011-Mar-10 17:47 UTC

Dominik Zyla wrote on Thu, 10 Mar 2011 09:10:37 +0100:
> We're using megacli wrapped by perl to provide information about Perc
> events. It works quite well as far.
Do you have a megacli rpm that works with the CentOS-provided drivers,
which are MPT 3.something? I googled about this some time ago and there's
an rpm mentioned here and there that contains only the megacli utility,
but it's not downloadable anymore from anywhere. I got hold of a package
that contains version 4, but that doesn't work with the CentOS
drivers. LSI themselves provide only the complete MegaRAID driver/package
for download, and it's not clear whether the single megacli utility is
included or whether installing it may overwrite the built-in driver.

Kai
