Stuart Gathman
2017-Apr-01 19:41 UTC
[Nut-upsuser] how do you test (nagios) that upsmon is connected?
On 04/01/2017 03:14 PM, Dan Craciun wrote:
> On my Nagios monitoring system I use check_nut_plus (that in turn
> calls upsc) to monitor the status (ups.status), load (ups.load),
> battery charge (battery.charge) and runtime (battery.runtime).
>
> If these return "unknown", it means upsd is no longer monitoring the
> UPS. As long as you get data, upsd is working.
>
> PS: as an example, this is my check for the status:
> /usr/bin/perl -w $USER$/check_nut_plus -d $ARG1$@$HOSTADDRESS$ -v
> 'ups.status=c!~^OL'

That's great, but Spike wants to know whether *upsmon* is working. He already has a way to check that upsd is working.

I don't have a complete solution, but I use NOTIFYCMD in upsmon.conf to run upssched. As part of upssched.conf, I append NOCOMM (and COMMOK) events to a log file. If NOCOMM in ups.log is not followed by COMMOK, then upsmon will not shut down the system. NOPARENT should probably be logged also, as that makes upsmon unable to shut down the system.

I agree that this "no news is good news" policy is not ideal, but I've found it much more effective than no monitoring.

Note also that if upsmon can't check UPS status, then Nagios almost certainly can't either. To test, set up upsmon on a remote machine, and block 3493/tcp (nut) in the firewall on the machine running upsd. Nagios should scream.
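The "NOCOMM not followed by COMMOK" rule could be turned into a Nagios-style check along the following lines. This is only a sketch under assumptions not stated in the thread: the log path `/var/log/ups.log`, the one-event-per-line format with the event name as a whitespace-separated word, and the function names are all hypothetical. The exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch: report CRITICAL if the log shows NOCOMM with no later COMMOK."""

# Hypothetical path; use whatever file your upssched script appends to.
LOG_PATH = "/var/log/ups.log"


def comms_lost(lines):
    """Return True if the most recent NOCOMM/COMMOK event is NOCOMM."""
    lost = False
    for line in lines:
        tokens = line.split()
        if "NOCOMM" in tokens:
            lost = True
        elif "COMMOK" in tokens:
            lost = False
    return lost


def nagios_status(lines):
    """Map log lines to a (exit_code, message) pair."""
    if comms_lost(lines):
        return 2, "CRITICAL: NOCOMM logged with no later COMMOK"
    return 0, "OK: no unresolved NOCOMM in log"


def main(path=LOG_PATH):
    """Read the log, print the status line, and return the exit code."""
    try:
        with open(path) as fh:
            code, message = nagios_status(fh)
    except OSError as exc:
        code, message = 3, f"UNKNOWN: cannot read {path}: {exc}"
    print(message)
    return code


# In-memory demonstration with made-up log lines (assumed format).
sample = [
    "2017-04-01 19:00 NOCOMM myups@server",
    "2017-04-01 19:05 COMMOK myups@server",
    "2017-04-01 19:30 NOCOMM myups@server",
]
print(nagios_status(sample))
```

To use it as a real plugin, the script would finish with `import sys; sys.exit(main())` so that Nagios sees the exit code. NOPARENT could be handled the same way as NOCOMM if it is logged too, though what event should clear it is left open here.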
Roger Price
2017-Apr-01 20:54 UTC
[Nut-upsuser] how do you test (nagios) that upsmon is connected?
On Sat, 1 Apr 2017, Stuart Gathman wrote:
> On 04/01/2017 03:14 PM, Dan Craciun wrote:
>> On my Nagios monitoring system I use check_nut_plus (that in turn
>> calls upsc) to monitor the status (ups.status), load (ups.load),
>> battery charge (battery.charge) and runtime (battery.runtime).
>>
>> If these return "unknown", it means upsd is no longer monitoring the
>> UPS. As long as you get data, upsd is working.
>>
> That's great, but Spike wants to know whether *upsmon* is working. He
> already has a way to check that upsd is working.

How about using a dummy ups to set up a regular end-to-end heartbeat? As long as the heart beats, there is no news, but if it stops, upssched-cmd sends out an e-mail or other warning.

In ups.conf, add

    [heartbeat]
        driver = dummy-ups
        port = heartbeat.dev
        desc = "Dummy ups sends heart beat to upssched-cmd"

In heartbeat.dev, write

    ups.status: REPLBATT
    TIMER 300

In upsmon.conf, write

    NOTIFYFLAG REPLBATT SYSLOG+EXEC

In upssched.conf, add

    # Heartbeat from dummy ups every 5 minutes, re-start 6 minute timer
    AT REPLBATT heartbeat CANCEL-TIMER heartbeat-timer
    AT REPLBATT heartbeat START-TIMER heartbeat-timer 360

In upssched-cmd, if heartbeat-timer completes, then send a "UPS heartbeat failure" message to the sysadmin.

If this works, let me know, and I will use it myself :-)
It would be nice to have a HEARTBEAT status instead of using REPLBATT.

Roger
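The last step, reacting in upssched-cmd when the timer completes, might look roughly like this. upssched runs its CMDSCRIPT with the name of the expired timer as the single argument; everything else here is an assumption for illustration: the constant `HEARTBEAT_TIMER` must be set to whatever timer name your upssched.conf actually uses, the function names are hypothetical, and writing to stderr is only a placeholder for real delivery by mail or otherwise.

```python
#!/usr/bin/env python3
"""Sketch of an upssched CMDSCRIPT that reacts to the heartbeat timer."""
import sys

# Assumed spelling; must match the timer name used in upssched.conf.
HEARTBEAT_TIMER = "heartbeat-timer"


def message_for(timer_name, heartbeat_timer=HEARTBEAT_TIMER):
    """Return a warning if the expired timer is the heartbeat timer, else None."""
    if timer_name == heartbeat_timer:
        return "UPS heartbeat failure: no heartbeat arrived before the timer ran out"
    return None


def main(argv):
    """Handle one invocation; argv[1] is the expired timer's name."""
    if len(argv) < 2:
        print("usage: upssched-cmd <timer-name>", file=sys.stderr)
        return 1
    warning = message_for(argv[1])
    if warning is not None:
        # Placeholder delivery: swap in mail, a pager, or a Nagios passive check.
        print(warning, file=sys.stderr)
    return 0


# Demonstration: what the handler would report for two timer names.
print(message_for(HEARTBEAT_TIMER))
print(message_for("some-other-timer"))
```

Installed as the CMDSCRIPT, the script would end with `sys.exit(main(sys.argv))`. Keeping the decision in `message_for` separate from delivery makes it easy to test without sending anything.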
Spike
2017-Apr-03 17:28 UTC
[Nut-upsuser] how do you test (nagios) that upsmon is connected?
Thank you all for your input.

Roger, I'm a nut noob and only marginally understand the implementation (from your other email), but I really like the idea of a heartbeat, and design-wise it makes a lot of sense. I'll see if I can implement it some time soon.

Thank you,

Spike

On Sat, Apr 1, 2017 at 1:54 PM Roger Price <roger at rogerprice.org> wrote:
> On Sat, 1 Apr 2017, Stuart Gathman wrote:
>
> > On 04/01/2017 03:14 PM, Dan Craciun wrote:
> >> On my Nagios monitoring system I use check_nut_plus (that in turn
> >> calls upsc) to monitor the status (ups.status), load (ups.load),
> >> battery charge (battery.charge) and runtime (battery.runtime).
> >>
> >> If these return "unknown", it means upsd is no longer monitoring the
> >> UPS. As long as you get data, upsd is working.
> >>
> > That's great, but Spike wants to know whether *upsmon* is working. He
> > already has a way to check that upsd is working.
>
> How about using a dummy ups to set up a regular end-to-end heartbeat?
> As long as the heart beats, there is no news, but if it stops,
> upssched-cmd sends out an e-mail or other warning.
>
> In ups.conf, add
>
>     [heartbeat]
>         driver = dummy-ups
>         port = heartbeat.dev
>         desc = "Dummy ups sends heart beat to upssched-cmd"
>
> In heartbeat.dev, write
>
>     ups.status: REPLBATT
>     TIMER 300
>
> In upsmon.conf, write
>
>     NOTIFYFLAG REPLBATT SYSLOG+EXEC
>
> In upssched.conf, add
>
>     # Heartbeat from dummy ups every 5 minutes, re-start 6 minute timer
>     AT REPLBATT heartbeat CANCEL-TIMER heartbeat-timer
>     AT REPLBATT heartbeat START-TIMER heartbeat-timer 360
>
> In upssched-cmd, if heartbeat-timer completes, then send a "UPS heartbeat
> failure" message to the sysadmin.
>
> If this works, let me know, and I will use it myself :-)
> It would be nice to have a HEARTBEAT status instead of using REPLBATT.
>
> Roger