On 2014-02-16 16:36, Elliot Dierksen wrote:
> On 2/5/2014 8:24 AM, Charles Lepple wrote:
>> On Feb 4, 2014, at 10:48 PM, Elliot Dierksen wrote:
>>
>>> NUT will complain endlessly about communication errors and never
>>> establish SNMP communication with my APC UPS
>> Hmm, at first glance, I read the "complain endlessly" part as a figure
>> of speech, and figured SNMP would get there eventually since it's UDP.
>> But if you have to stop and restart NUT, that is a different story
> Feb 16 10:15:40 freenas upsmon[2939]: UPS [CR-UPS]: connect failed:
> Connection failure: Connection refused
> Feb 16 10:16:15 freenas last message repeated 7 times
> Feb 16 10:18:17 freenas last message repeated 24 times
> Feb 16 10:19:38 freenas last message repeated 16 times
I believe (having recently had a similar experience) that what happens
is as follows:
1) Your OS starts up and begins to bring up complicated networking,
which needs some time to converge and actually work.
2) Your NUT starts up - upsdrv, upsd and upsmon.
3) The drivers have a timeout for startup (45 sec default IIRC),
and snmp-ups does not make it in time. So upsdrv fails, upsd has no
UPS data to publish, and upsmon has nothing to watch - though it
does keep trying and "complains endlessly".
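To see which of these stages actually failed on a given box, the error
text from the client side is usually enough. Below is a minimal sketch,
not part of NUT. Only the "Connection refused" wording appears in the
log above; the "Driver not connected" and "Data stale" strings are
quoted from memory and may differ between NUT versions, and the function
name diagnose_nut_error is made up for illustration.

```shell
#!/bin/sh
# Map a NUT client error message to the stage that most likely failed.
# ASSUMPTION: the matched strings follow the usual upsc/upsmon wording;
# only "Connection refused" is confirmed by the log quoted in this thread.
diagnose_nut_error() {
    case "$1" in
        *"Connection refused"*)
            echo "upsd is not listening" ;;
        *"Driver not connected"*)
            echo "upsd is up but the driver is not running" ;;
        *"Data stale"*)
            echo "driver is running but lost contact with the UPS" ;;
        *)
            echo "unrecognized error" ;;
    esac
}

# Possible usage (UPS name taken from the log above):
#   diagnose_nut_error "`upsc CR-UPS@localhost 2>&1 >/dev/null`"
```

In the log above the message is "Connection refused", which would point
at upsd not listening at all rather than at a stale driver.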
Ways out might be as follows:
1) Restart NUT as you do, or as I do optionally in the attached script
(which evaluates connectivity to the configured UPSes and if any are
missing - schedules to retry itself in a minute via the "at" utility).
Feel free to appropriate the script into the project, if deemed fit :)
2) Make infinite retries and delays until the driver finds the UPS.
Done in a "brute-force manner", this indefinitely delays your OS
startup and might even deadlock (e.g. if your NUT is asked to start
before networking).
3) Infinite retries, but in the background as a driver daemon. This
makes sense, since if the driver was initially "connected" and then
lost the connection (e.g. networking gear or the UPS management card
were restarted), it does retry and ultimately finds the UPS again
without manual reloads of NUT.
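A middle ground between (2) and (3) is a bounded retry that is pushed
into the background, so the boot is never blocked and nobody has to
restart NUT by hand. This is a rough sketch in plain sh. The helper name
"retry" and the attempt/delay numbers are my own choices, not anything
NUT provides.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if all attempts fail.
# Note: plain sh has no "local", so the _retry_* variables are global.
retry() {
    _retry_max="$1"
    _retry_delay="$2"
    shift 2
    _retry_n=1
    while : ; do
        "$@" && return 0
        [ "$_retry_n" -ge "$_retry_max" ] && return 1
        _retry_n=$((_retry_n + 1))
        sleep "$_retry_delay"
    done
}

# Possible usage from an init script, without blocking the boot
# (the init-script path is an assumption, adjust it to your layout):
#   retry 30 60 /etc/init.d/upsdrv start &
```

Capping the number of attempts keeps a permanently unplugged UPS from
leaving a retry loop running forever.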
I am not sure if an option like (3) is available in 2.7.x. The attached
script was developed and tested on a Linux system with nut-2.6.3,
and the idea (and an earlier implementation) dates way back. This
version should probably work on Solaris as well (not yet tested),
though I can't vouch for FreeNAS and other platforms.
HTH,
//Jim Klimov
-------------- next part --------------
#!/bin/sh
#
# chkconfig: 2345 55 89
# description: The UPS monitor and shutdown controller for delayed startup retries
# Copyright (C) 2000-2014 by Jim Klimov
# $Id: ups-delayed,v 1.1 2014/02/18 17:52:49 jim Exp $
SELF="$0"
AT_DELAY="now +1 min"
### Guess the locations of needed programs and config files
### (among variants typical for various distribution layouts)
# Candidate paths are listed explicitly: brace expansion is not POSIX sh
for F in /etc/init.d/upsdrv /usr/local/etc/init.d/upsdrv ; do
    [ x"$UPSDRV" = x ] && [ -s "$F" -a -x "$F" ] && UPSDRV="$F"
done
[ x"$UPSDRV" = x -o ! -x "$UPSDRV" ] && \
echo "Missing UPSDRV init-script" && exit 1
for F in /etc/init.d/upsd /usr/local/etc/init.d/upsd ; do
    [ x"$UPSD" = x ] && [ -s "$F" -a -x "$F" ] && UPSD="$F"
done
[ x"$UPSD" = x -o ! -x "$UPSD" ] && \
echo "Missing UPSD init-script" && exit 1
for F in /etc/init.d/upsmon /usr/local/etc/init.d/upsmon ; do
    [ x"$UPSMON" = x ] && [ -s "$F" -a -x "$F" ] && UPSMON="$F"
done
[ x"$UPSMON" = x -o ! -x "$UPSMON" ] && \
echo "Missing UPSMON init-script" && exit 1
for F in /etc/nut/ups.conf /etc/ups/ups.conf ; do
    [ x"$UPSDRV_CONF" = x ] && [ -s "$F" ] && UPSDRV_CONF="$F"
done
[ x"$UPSDRV_CONF" = x -o ! -s "$UPSDRV_CONF" ] && \
echo "Missing UPSDRV_CONF" && exit 1
for F in /usr/bin/upsc /usr/local/bin/upsc /usr/local/ups/bin/upsc ; do
    [ x"$UPSC" = x ] && [ -s "$F" -a -x "$F" ] && UPSC="$F"
done
[ x"$UPSC" = x -o ! -x "$UPSC" ] && \
echo "Missing UPSC client" && exit 1
for F in /var/lib/nut/var/lib/upsd /var/state/nut /var/state/ups \
        /var/lib/nut /var/lib/ups ; do
    [ x"$STATEDIR" = x ] && [ -d "$F" ] && STATEDIR="$F"
done
[ x"$STATEDIR" = x -o ! -d "$STATEDIR" ] && \
echo "Missing state directory for sockets and pidfiles" && exit 1
echo_wall() {
echo "`date`: $*" >&2
echo "`date`: $*" | wall
}
sched() {
    echo_wall "Scheduling delayed startup of UPS monitoring (at '$AT_DELAY')"
    # $AT_DELAY is left unquoted on purpose so that it splits into words
    echo "${SELF} start-now" | at $AT_DELAY
}
start() {
status > /dev/null 2>&1 || sched
}
status() {
RES=0
echo "=== Daemon states:"
${UPSDRV} status || RES=$?
${UPSD} status || RES=$?
${UPSMON} status || RES=$?
for UPS in \
`egrep '^\[' ${UPSDRV_CONF} | sed 's,^ *\[\(.*\)\] *,\1,g'`; do
echo "=== Querying ${UPS}@localhost:"
${UPSC} ${UPS}@localhost || RES=$?
done
echo "=== Scheduled restarter job?"
atq 2>&1
if [ x"$VERBOSE" != x ]; then
echo ""
echo "=== Running processes and statedir ($STATEDIR) contents:"
ps -ef | grep -v grep | egrep '[ /]ups|nut'
ls -la ${STATEDIR}
date
echo ""
fi
echo "=== Overall result:"
if [ "$RES" = 0 ]; then
echo "Status: [--OK--]"
else
echo "Status: [-FAIL-]" >&2
fi
return $RES
}
cleanup() {
echo "=== Trying to clean-up the NUT state directory $STATEDIR from stale files..."
${UPSMON} stop
${UPSD} stop
rm -f ${STATEDIR}/upsd.pid
${UPSDRV} stop
for UPS in \
`egrep '^\[' ${UPSDRV_CONF} | sed 's,^ *\[\(.*\)\] *,\1,g'`; do
# Both patterns are spelled out: brace expansion is not POSIX sh
rm -f ${STATEDIR}/*-${UPS} ${STATEDIR}/*-${UPS}.pid
done
}
start_now() {
echo_wall "Trying to delayed-start the UPS monitoring now"
${UPSDRV} status || \
${UPSDRV} restart
sleep 5
${UPSDRV} status || { cleanup; sched; exit 1; }
${UPSD} restart
${UPSMON} restart
sleep 5
( status || { cleanup; sched; exit 1; } ) | wall
}
case "$1" in
start) start ;;
status) [ "$2" = "-v" ] && VERBOSE=1
status ;;
start-now) start_now ;;
stop) cleanup ;;
restart) cleanup; start_now ;;
*) echo "Usage: $0 {start|start-now|status [-v]|stop|restart}" >&2
exit 1 ;;
esac