thr3ads.net - Nut upsuser - [Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Bob Brown

2015-Aug-14 12:53 UTC

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

I see this in syslog:
Aug 12 03:57:22 orion upsd[18274]: Data for UPS [a102-apc-10] is stale - check
driver

(one for each UPS).

Then I see a bunch of messages like this:

Aug 12 03:58:02 orion upsd[18274]: write() failed for ::1: Broken pipe
Aug 12 03:58:02 orion snmpd[2872]: Connection from UDP: [127.0.0.1]:50940

And then it returns to normal like this:

Aug 12 03:58:04 orion upsd[18274]: UPS [a102-apc-10] data is no longer stale

Where else can I look to find out what really happened?

-Bob

-----Original Message-----
From: Charles Lepple [mailto:clepple at gmail.com] 
Sent: Thursday, August 13, 2015 9:52 PM
To: Bob Brown
Cc: nut-upsuser at lists.alioth.debian.org
Subject: Re: [Nut-upsuser] occassional problem wtih upslog and apc ups units via
snmp ... [NA] status

On Aug 13, 2015, at 12:45 PM, Bob Brown <bbrown at harpercollege.edu>
wrote:> 
> Os is redhat enterprise, kernel 2.6.18-164.2.1.e15 NUT is apparently 
> 2.4.1 It was installed by tar.
>  
> Monitoring a lot of apc UPS units?mostly Smart-UPS of various flavors, also
a Silcon DP380E.
>  
> Normally works just fine?  ?upslog? writes status log every 30 seconds with
various parameters..
>  
> Every so often, one of the UPS units has a log entry of:
>  
> 20150812+035542+NA+NA+NA+[NA]+NA+NA+NA
>  
> When it normally looks like:
>  
> 20150813+114309+100.00+115.00+39.00+[OL]+25.00+60.00+1020.00
>  
> What does it mean when it  puts NA in the log file?
Every log interval, each variable is fetched from the server, and if there is an
error retrieving that value at that time, "NA" is logged:

https://github.com/networkupstools/nut/blob/master/clients/upslog.c#L203

Since they are all "NA" at the same time, it is likely that there is a
communication problem, either between upslog and upsd, or between upsd and the
driver. In either case, there is probably a syslog message describing the
problem in more detail.

--
Charles Lepple
clepple at gmail

Charles Lepple

2015-Aug-14 13:27 UTC

head link

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

On Aug 14, 2015, at 8:53 AM, Bob Brown <bbrown at harpercollege.edu>
wrote:> 
> I see this in syslog:
> Aug 12 03:57:22 orion upsd[18274]: Data for UPS [a102-apc-10] is stale -
check driver
I don't have a SNMP UPS to test with, and this might be different for NUT
2.4.1, but the current defaults seem to make stale data more likely in your
case.

In http://www.networkupstools.org/docs/man/snmp-ups.html#_extra_arguments the
default for "pollfreq" is 30 seconds, and in
http://www.networkupstools.org/docs/man/upsd.conf.html the "MAXAGE"
default is 15 seconds. So you might want to either increase MAXAGE, or decrease
pollfreq.

I would recommend making MAXAGE at least pollfreq (possibly 1.5x larger),
because decreasing pollfreq might run up against the amount of time that it
takes to poll all of the SNMP variables from the UPS. There are some retries and
timeouts in snmp-ups which make this difficult to guess without seeing the debug
logs. You might check the list archives to see if anyone has suggestions on what
pollfreq and MAXAGE values worked for them.
> (one for each UPS).
> 
> Then I see a bunch of messages like this:
> 
> Aug 12 03:58:02 orion upsd[18274]: write() failed for ::1: Broken pipe
This just means that the client (upslog, probably) disconnected before upsd
expected the session to end. The upsd-to-driver connection is a Unix-domain
socket (and the driver-to-UPS connection is UDP).
> Aug 12 03:58:02 orion snmpd[2872]: Connection from UDP: [127.0.0.1]:50940
> 
Might be unrelated, unless I misunderstood your NUT setup (upslog -> upsd
-> snmp-ups -> UPS). The snmpd daemon and the UPS comms card are SNMP
servers ("agents" in SNMP parlance, IIRC), and snmp-ups is a SNMP
client.

-- 
Charles Lepple
clepple at gmail

Charles Lepple

2015-Aug-14 22:50 UTC

head link

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

[please keep the list CC'd, thanks]
> On Aug 14, 2015, at 10:27 AM, Bob Brown <bbrown at harpercollege.edu>
wrote:
> 
> Would it put NA in the log files if maxage was too short?
I think so. I haven't tried it, but if snmp-ups polls every 30 seconds, and
MAXAGE is 15, then with a random polling interval, 50% of the time the data
would be more than 15 seconds old. In practice, the driver and upsd were
probably started around the same time, and due to differences in timing between
upsd and the driver, they occasionally drift out of sync. (Increasing MAXAGE is
probably the simplest way around that.)

The usbhid-ups USB driver (which tends to get a lot more testing these days)
splits polling into high and low priority updates, so that important variables
get update frequently (many times in one MAXAGE period), and other variables get
updated later. But successful high-priority updates clear out the "data
stale" flag. There are comments in the snmp-ups driver that recommend
implementing that there as well, but nobody has written that code yet.
>  Does it sound like a network delay and that's why it put NA and marked
the data as stale?
Packet loss between the driver and the SNMP card in the UPS is more likely than
delay (the MAXAGE and pollfreq arguments are in seconds), but as I mentioned,
the current version of NUT retries a few times, so ultimately, packet loss
shouldn't be a problem.
> -Bob
> 
> -----Original Message-----
> From: Charles Lepple [mailto:clepple at gmail.com] 
> Sent: Friday, August 14, 2015 8:28 AM
> To: Bob Brown
> Cc: nut-upsuser at lists.alioth.debian.org
> Subject: Re: [Nut-upsuser] occassional problem wtih upslog and apc ups
units via snmp ... [NA] status
> 
> On Aug 14, 2015, at 8:53 AM, Bob Brown <bbrown at harpercollege.edu>
wrote:
>> 
>> I see this in syslog:
>> Aug 12 03:57:22 orion upsd[18274]: Data for UPS [a102-apc-10] is stale
- check driver
> 
> I don't have a SNMP UPS to test with, and this might be different for
NUT 2.4.1, but the current defaults seem to make stale data more likely in your
case.
> 
> In http://www.networkupstools.org/docs/man/snmp-ups.html#_extra_arguments
the default for "pollfreq" is 30 seconds, and in
http://www.networkupstools.org/docs/man/upsd.conf.html the "MAXAGE"
default is 15 seconds. So you might want to either increase MAXAGE, or decrease
pollfreq.
> 
> I would recommend making MAXAGE at least pollfreq (possibly 1.5x larger),
because decreasing pollfreq might run up against the amount of time that it
takes to poll all of the SNMP variables from the UPS. There are some retries and
timeouts in snmp-ups which make this difficult to guess without seeing the debug
logs. You might check the list archives to see if anyone has suggestions on what
pollfreq and MAXAGE values worked for them.
> 
>> (one for each UPS).
>> 
>> Then I see a bunch of messages like this:
>> 
>> Aug 12 03:58:02 orion upsd[18274]: write() failed for ::1: Broken pipe
> 
> This just means that the client (upslog, probably) disconnected before upsd
expected the session to end. The upsd-to-driver connection is a Unix-domain
socket (and the driver-to-UPS connection is UDP).
> 
>> Aug 12 03:58:02 orion snmpd[2872]: Connection from UDP:
[127.0.0.1]:50940
>> 
> 
> Might be unrelated, unless I misunderstood your NUT setup (upslog ->
upsd -> snmp-ups -> UPS). The snmpd daemon and the UPS comms card are SNMP
servers ("agents" in SNMP parlance, IIRC), and snmp-ups is a SNMP
client.
> 
> -- 
> Charles Lepple
> clepple at gmail
> 
> 
> 
-- 
Charles Lepple
clepple at gmail

Reasonably Related Threads

Search for more seemingly similar threads

Nut upsuser - Aug 2015 - occassional problem wtih upslog and apc ups units via snmp ... [NA] status

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

Reasonably Related Threads