George Pontis
2006-Jan-04 19:14 UTC
[Nut-upsuser] Nut 2.0.1, net-snmp-5.1.2p2, APC SUA1000 with AP9606: repeated comm lost/established messages
Hello list members,

I am running NUT 2.0.1 on an OpenBSD 3.8 system, using the OpenBSD port for net-snmp-5.1.2p2. The UPS is connected via an AP9606 Web/SNMP card with current firmware, and the LAN. All goes well for a long time, usually more than one month. Then the system starts spitting messages about losing communications with the UPS. A second message about reconnecting follows immediately. The messages are sometimes coming at a fast rate, faster than the 300 seconds specified for NOCOMMWARNTIME in upsmon.conf. A reboot of the computer running NUT (the only computer running NUT) does not fix it, so I assume that there is something going on with the UPS or the UPS network card.

If this is not a known problem, I would be happy to do some guided sleuthing to track it down. However, I have no clue where to look.

Any pointers would be much appreciated.

George

The system log messages look like this:

Jan 4 10:57:00 z9 upsd[12613]: Data for UPS [ap9606-srv] is stale - check driver
Jan 4 10:57:02 z9 upsd[12613]: UPS [ap9606-srv] data is no longer stale
Jan 4 10:59:02 z9 upsd[12613]: Data for UPS [ap9606-srv] is stale - check driver
Jan 4 10:59:02 z9 upsd[12613]: UPS [ap9606-srv] data is no longer stale

The output from upsmon -D (with console messages interspersed) follows:

/usr/local/ups/sbin >> ./upsmon -D
Network UPS Tools upsmon 2.0.1
UPS: ap9606-srv@localhost (slave) (power value 1)
Using power down flag file /etc/killpower
Trying to connect to UPS [ap9606-srv@localhost]
Logged into UPS ap9606-srv@localhost
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (first time)
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)
polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
Poll UPS [ap9606-srv@localhost] failed - Data stale
do_notify: ntype 0x0005 (COMMBAD)
Communications with UPS ap9606-srv@localhost lost

Broadcast Message from root@z9.z9.com (/dev/ttyp0) at 10:39 ...
Communications with UPS ap9606-srv@localhost lost

polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
do_notify: ntype 0x0004 (COMMOK)
Communications with UPS ap9606-srv@localhost established
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)

Broadcast Message from root@z9.z9.com (/dev/ttyp0) at 10:40 ...
Communications with UPS ap9606-srv@localhost established

polling ups: ap9606-srv@localhost
get_var: ap9606-srv@localhost / status
status: [OL]
parsing: [OL]: ups_on_line(ap9606-srv@localhost) (no change)
^CSignal 2: exiting
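To put numbers on "how often" and "for how long", the upsd syslog lines quoted above can be paired up mechanically. The following is only a rough sketch, not part of NUT: the function names (parse_events, outage_durations, stale_intervals) are made up for illustration, the year is assumed because syslog omits it, and the month parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Matches upsd syslog lines of the form quoted in the post, e.g.
#   Jan 4 10:57:00 z9 upsd[12613]: Data for UPS [ap9606-srv] is stale - check driver
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"upsd\[\d+\]:\s+(?P<msg>.*)$"
)

def parse_events(lines, year=2006):
    """Return [(datetime, 'stale' | 'ok'), ...] for the upsd lines found.

    syslog timestamps carry no year, so `year` is an assumption supplied
    by the caller; logs spanning New Year would need extra handling.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        # Check the recovery message first: it also contains the word "stale".
        if "no longer stale" in msg:
            kind = "ok"
        elif "is stale" in msg:
            kind = "stale"
        else:
            continue
        when = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        events.append((when, kind))
    return events

def outage_durations(events):
    """Seconds between each 'stale' event and the next 'ok' event."""
    durations = []
    started = None
    for when, kind in events:
        if kind == "stale" and started is None:
            started = when
        elif kind == "ok" and started is not None:
            durations.append((when - started).total_seconds())
            started = None
    return durations

def stale_intervals(events):
    """Seconds between consecutive 'stale' events."""
    stale = [when for when, kind in events if kind == "stale"]
    return [(b - a).total_seconds() for a, b in zip(stale, stale[1:])]

# The four lines from the post, used as sample input.
SAMPLE = [
    "Jan 4 10:57:00 z9 upsd[12613]: Data for UPS [ap9606-srv] is stale - check driver",
    "Jan 4 10:57:02 z9 upsd[12613]: UPS [ap9606-srv] data is no longer stale",
    "Jan 4 10:59:02 z9 upsd[12613]: Data for UPS [ap9606-srv] is stale - check driver",
    "Jan 4 10:59:02 z9 upsd[12613]: UPS [ap9606-srv] data is no longer stale",
]

if __name__ == "__main__":
    events = parse_events(SAMPLE)
    print("outage seconds:", outage_durations(events))   # [2.0, 0.0]
    print("seconds between stale events:", stale_intervals(events))  # [122.0]
```

On the sample lines this gives outages of 2 and 0 seconds, 122 seconds apart. Run over a longer stretch of the log, a regular spacing would hint at something periodic, while irregular spacing would look more like intermittent packet loss or a struggling card.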
Arnaud Quette
2006-Jan-05 11:59 UTC
[Nut-upsuser] Nut 2.0.1, net-snmp-5.1.2p2, APC SUA1000 with AP9606: repeated comm lost/established messages
Hi George,

2006/1/4, George Pontis <gpontis@spamcop.net>:
> Hello list members,
>
> I am running NUT 2.0.1 on an OpenBSD 3.8 system, using the OpenBSD port
> for net-snmp-5.1.2p2. The UPS is connected via an AP9606 Web/SNMP card
> with current firmware, and the LAN. All goes well for a long time, usually
> more than one month. Then the system starts spitting messages about losing
> communications with the UPS. A second message about reconnecting follows
> immediately. The messages are sometimes coming at a fast rate, faster than
> the 300 seconds specified for NOCOMMWARNTIME in upsmon.conf. A reboot of
> the computer running NUT (the only computer running NUT) does not fix it, so
> I assume that there is something going on with the UPS or the UPS network
> card.
>
> If this is not a known problem, I would be happy to do some guided
> sleuthing to track it down. However, I have no clue where to look.
>
> Any pointers would be much appreciated.

Does resetting the card itself (there should be an option for this in the web admin interface, if there is one) resolve the issue? If so, then you've validated that the problem comes from the comm card itself. Otherwise, it should be a problem in the UPS core...

Another test, on the NUT side, would be to stop snmp-ups, then restart it in debug mode (using -DDDDD) while the data is going stale, and look at what happens.

A simpler one would be, in the same context, to try a ping and an snmpwalk to see whether the comm card is refusing to answer.

Arnaud
--
Linux / Unix Expert - MGE UPS SYSTEMS - R&D Dpt
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://people.debian.org/~aquette/
OpenSource Developer - http://arnaud.quette.free.fr/
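The ping and snmpwalk check suggested above could be scripted so that it can be run repeatedly while the data is going stale. What follows is a minimal sketch under several assumptions: the helper names (run_probe, probe_commands, diagnose, check_card) are hypothetical, the community string "public" and SNMP v1 are guesses about how the card is configured, and `ping -c 1` plus the net-snmp `snmpwalk` options `-v`, `-c`, `-t` and `-r` are assumed to be available on the monitoring host.

```python
import subprocess
import sys
import time

def run_probe(cmd, timeout=10):
    """Run one command quietly; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary is missing / not executable.
        ok = False
    return ok, time.monotonic() - start

def probe_commands(host, community="public"):
    """Build the two probes. The community string is an assumption."""
    return {
        "ping": ["ping", "-c", "1", host],
        # Walk the standard MIB-2 "system" subtree (1.3.6.1.2.1.1),
        # with a 2 second timeout and a single retry.
        "snmp": ["snmpwalk", "-v", "1", "-c", community,
                 "-t", "2", "-r", "1", host, "1.3.6.1.2.1.1"],
    }

def diagnose(ping_ok, snmp_ok):
    """Turn the two results into a rough, non-authoritative hint."""
    if ping_ok and snmp_ok:
        return "card answers ping and SNMP; look at the snmp-ups debug output"
    if ping_ok:
        return "card is reachable but its SNMP agent is not answering"
    if snmp_ok:
        return "SNMP answers but ping does not; ICMP may be filtered"
    return "card is not reachable on the network at all"

def check_card(host, community="public", runner=run_probe):
    """Run both probes once and return {'ping': bool, 'snmp': bool}."""
    results = {}
    for name, cmd in probe_commands(host, community).items():
        ok, _elapsed = runner(cmd)
        results[name] = ok
    return results
```

For example, calling `check_card("192.0.2.10")` (a placeholder address, to be replaced by the card's real one) and passing the two results to `diagnose` once per minute would show whether the card stops answering at the moments upsd reports stale data. The `runner` parameter is only there so the logic can be exercised without touching the network.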