Dr. Ed Morbius
2011-Mar-07 20:43 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
We're looking for tools to be used in monitoring the PERC H800 arrays on a set of database servers running CentOS 5.5.

We've installed most of the OMSA (Dell monitoring) suite.

Our current alerting is happening through SNMP, though it's a bit hit or miss (we apparently missed a couple of earlier predictive failure alerts on one drive).

OMSA conflicts with mega-cli, though we may find that the latter is the more useful package. Both are pretty byzantine, the Dell stuff simply doesn't have docs (in particular: docs on how to interpret the omconfig log output).

Ideally we'd like something which could be run as a Nagios plugin or cron job providing information on RAID status and/or possible disk errors. Probably both, actually.

Thanks in advance.

--
Dr. Ed Morbius, Chief Scientist / Robot Wrangler / Staff Psychologist
Krell Power Systems Unlimited
When you seek unlimited power, Go to Krell!
Eero Volotinen
2011-Mar-07 20:57 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
2011/3/7 Dr. Ed Morbius <dredmorbius at gmail.com>:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.
>
> We've installed most of the OMSA (Dell monitoring) suite.
>
> Our current alerting is happening through SNMP, though it's a bit hit or
> miss (we apparently missed a couple of earlier predictive failure alerts
> on one drive).
>
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package. Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).
>
> Ideally we'd like something which could be run as a Nagios plugin or
> cron job providing information on RAID status and/or possible disk
> errors. Probably both, actually.

If your system supports omreport (it comes with OMSA), then this is a good solution: http://folk.uio.no/trondham/software/check_openmanage.html

-- Eero
Blake Hudson
2011-Mar-07 22:04 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
-------- Original Message --------
Subject: [CentOS] Dell PERC H800 commandline RAID monitoring tools
From: Dr. Ed Morbius <dredmorbius at gmail.com>
To: CentOS User list <centos at centos.org>
Date: Monday, March 07, 2011 2:43:03 PM

> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.

If you purchased the server with an add-in DRAC, the DRAC can provide email alerts if an array becomes degraded (or on just about any other hardware fault). This isn't necessarily a replacement for your current monitoring, but it can be used to supplement or complement it.

--Blake
Dr. Ed Morbius
2011-Mar-07 22:28 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
on 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorbius at gmail.com) wrote:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.

Pardon the self-reply, but one issue we've had is reconciling the omconfig log report with the Dell Server Administrator syslog messages. omconfig reported a predictive drive failure, but we (and three Dell storage/support techs) had trouble identifying which actual device was being reported as bad.

From 'omconfig storage controller action=exportlog controller=0' output:

    03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42: 96=Predictive failure: PD 00(e0x08/s2)
    03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)

In /var/log/messages (timestamp/hostname trimmed):

    Server Administrator: Storage Service EventID: 2243 The Patrol Read has stopped.: Controller 0 (PERC H800 Adapter)
    Server Administrator: Storage Service EventID: 2049 Physical disk removed: Physical Disk 0:0:2 Controller 0, Connector 0

The Server Administrator reports of a slot 2 failure correspond to the drive which was physically replaced.

The OMSA omconfig report is throwing us a bunch of crud about some device, but Dell variously identified it as slot 0 and slot 9. We're now getting from them that "/s2" identifies slot 2.

When asked for documentation of the OMSA log report format and how to parse it, Dell said point blank "you're not going to have any luck with that".

Does anyone have a clue as to WTF it's actually trying to say, or what this tool is based on? (I'm suspecting mega-cli, on a general hunch but not much stronger.)

"Enterprise support" .... indeed.

--
Dr. Ed Morbius, Chief Scientist / Robot Wrangler / Staff Psychologist
Krell Power Systems Unlimited
When you seek unlimited power, Go to Krell!
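For cross-checking the exported controller log against syslog, here is a minimal Python sketch that pulls the event code and the enclosure/slot pair out of lines shaped like the two exportlog samples quoted above. The pattern is inferred from those two lines only, since the format is undocumented. The field names (`seq`, `code`, `pd`, `enclosure`, `slot`) are labels chosen for illustration, and reading `e0x08` as an enclosure ID is an assumption; only the "/s2 means slot 2" reading comes from Dell's answer above.

```python
import re

# Pattern inferred from the two sample lines in the post; the group names
# are illustrative labels, not Dell terminology.
EVENT_RE = re.compile(
    r"EVT#(?P<seq>\d+)-(?P<stamp>\d\d/\d\d/\d\d \d\d:\d\d:\d\d): "
    r"(?P<code>\d+)=(?P<text>[^:]+): "
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<enclosure>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)


def parse_event(line):
    """Return a dict for a physical-disk event line, or None if it does not match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "seq": int(m.group("seq")),
        "stamp": m.group("stamp"),
        "code": int(m.group("code")),
        "text": m.group("text"),
        "pd": m.group("pd"),
        # Assumption: the e0x.. token is a hexadecimal enclosure ID.
        "enclosure": int(m.group("enclosure"), 16),
        # Per Dell's answer in the thread, /sN identifies slot N.
        "slot": int(m.group("slot")),
    }


# The two lines quoted in the post.
SAMPLE = [
    "03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42: 96=Predictive failure: PD 00(e0x08/s2)",
    "03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)",
]

events = [parse_event(line) for line in SAMPLE]
```

With the slot number extracted this way, the "Predictive failure" event can be matched by hand against the "Physical Disk 0:0:2" entry in /var/log/messages, which is the drive that was physically replaced.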
Ross Walker
2011-Mar-08 14:47 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
On Mar 7, 2011, at 3:43 PM, "Dr. Ed Morbius" <dredmorbius at gmail.com> wrote:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.
>
> We've installed most of the OMSA (Dell monitoring) suite.
>
> Our current alerting is happening through SNMP, though it's a bit hit or
> miss (we apparently missed a couple of earlier predictive failure alerts
> on one drive).
>
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package. Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).
>
> Ideally we'd like something which could be run as a Nagios plugin or
> cron job providing information on RAID status and/or possible disk
> errors. Probably both, actually.

I can't speak about Nagios, but I have OMSA set up to send traps and, for critical errors, to also send emails, and it works well for us.

If you link the shared lib (I forget the paths) and install megacli with --nodeps, you can have both installed.

-Ross
Dominik Zyla
2011-Mar-10 08:10 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
On Mon, Mar 07, 2011 at 12:43:03PM -0800, Dr. Ed Morbius wrote:
> OMSA conflicts with mega-cli, though we may find that the latter is the
> more useful package. Both are pretty byzantine, the Dell stuff simply
> doesn't have docs (in particular: docs on how to interpret the omconfig
> log output).

We're using megacli wrapped by Perl to provide information about PERC events. It works quite well so far.

--
Dominik Zyla
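The wrapper itself was not posted. As an illustration of the general idea only (in Python rather than Perl, and not Dominik's script), here is a minimal sketch of a Nagios-style check that reads physical-drive listing text and maps it to the standard plugin exit codes. It assumes the text comes from something like `MegaCli -PDList -aALL` and that each drive record starts with an "Enclosure Device ID" line followed by "Slot Number", "Predictive Failure Count" and "Firmware state" fields. The sample text is made up for the example, so check the labels against real output from your controller before relying on it.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches "Label : value" lines; labels are taken as-is from the input text.
FIELD_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_pdlist(text):
    """Split listing text into one dict per drive, keyed by field label."""
    drives, current = [], None
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2)
        # Assumption: every drive record begins with this label.
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def to_int(value):
    """Convert a counter field to int, treating missing or odd values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def check_drives(drives):
    """Return (exit_code, message) in the usual Nagios plugin style."""
    if not drives:
        return UNKNOWN, "UNKNOWN: no physical drives found in input"
    worst, notes = OK, []
    for d in drives:
        name = "e%s/s%s" % (d.get("Enclosure Device ID", "?"), d.get("Slot Number", "?"))
        state = d.get("Firmware state", "unknown")
        predictive = to_int(d.get("Predictive Failure Count"))
        # Anything not Online is treated as critical here; this would also
        # flag hot spares, so adjust to taste.
        if not state.startswith("Online"):
            worst = max(worst, CRITICAL)
            notes.append("%s state=%s" % (name, state))
        elif predictive > 0:
            worst = max(worst, WARNING)
            notes.append("%s predictive failures=%d" % (name, predictive))
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[worst]
    detail = "; ".join(notes) if notes else "%d drives online" % len(drives)
    return worst, "%s: %s" % (label, detail)


# Hypothetical sample input, written for this example only.
SAMPLE = """\
Enclosure Device ID: 8
Slot Number: 1
Predictive Failure Count: 0
Firmware state: Online

Enclosure Device ID: 8
Slot Number: 2
Predictive Failure Count: 3
Firmware state: Online
"""

code, message = check_drives(parse_pdlist(SAMPLE))
```

To use it as a plugin or cron job, feed the real command output into `parse_pdlist`, print the message, and exit with the returned code. Keeping the parsing separate from running the binary means the check can be tested against saved output without a controller present.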
Kai Schaetzl
2011-Mar-10 17:47 UTC
[CentOS] Dell PERC H800 commandline RAID monitoring tools
Dominik Zyla wrote on Thu, 10 Mar 2011 09:10:37 +0100:
> We're using megacli wrapped by perl to provide information about Perc
> events. It works quite well as far.

Do you have a megacli rpm that works with the CentOS-provided drivers, which are MPT 3.something? I googled about this some time ago and there's an rpm mentioned here and there that contains only the megacli utility, but it's not downloadable from anywhere anymore. I got hold of a package that contains version 4, but that doesn't work with the CentOS drivers. LSI themselves provide only the complete MegaRAID driver/package for download, and it's not clear whether the single megacli utility is included or whether installing it may overwrite the built-in driver.

Kai