thr3ads.net - Nut upsuser - [Nut-upsuser] Plea for a more loquacious nut [Dec 2013]

If this information is useful, please help other people find it:
Share via:

Roger Price

2013-Dec-04 20:35 UTC

[Nut-upsuser] Plea for a more loquacious nut

I would like nut to become more loquacious, and to log a much more 
complete report of its activity.  At present nut reports that its 
components have started operation but does not automatically log their 
activity when UPS's switch between OB and OL.  I believe that this 
under-reporting of important facts is too minimalist - it would be better 
for system administrators and for the nut support team if a much more 
complete report were available of all OB/OL activity by each component.

Looking at the source code, it seems that much of what is needed is 
already in place, but behind "if" conditions that ensure that little
or
nothing gets through.  Long ago I wrote software, including a compiler, 
but my C programming is limited to a class exercise many many years ago, 
and its based on this "experience" that I'm guessing that in
upssched.c
function exec_cmd the code

   snprintf(buf, sizeof(buf), "%s %s", cmdscript, cmd);
   err = system(buf);
   if (WIFEXITED(err)) {
 	if (WEXITSTATUS(err)) {
 		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
WEXITSTATUS(err));
 	}

attempts to send a command to the operating system, possibly to execute a 
Bash script.  If system(buf) fails, the tests block the error message. 
Surely the error message is essential.  An unattended box is now in an 
emergency situation.  After the inevitable IT failure the system should be 
auditable to discover what went wrong and what should be done to prevent 
it happening in the future.  Such an audit expects to find "exec_cmd(%s) 
returned %d" in the log.

"But these problems should be found by testing!", one might argue. 
Firstly, the testing would be facilitated by this error message, and 
secondly, no amount of testing will ever cover every situation met in the 
real world.

I believe nut would be improved by

1. Logging a summary of the state of the nut system and the UPS's every 24 
hours.

2. Automatically logging a record of driver, upsd, upsmon and upssched 
activity for each OB/OL change.

3. Replacing the upsmon NOTIFYFLAG "SYSLOG" by "NOSYSLOG". 
All
notifications are logged unless the sysadmin explicitly calls for no 
logging.

Roger

Kris Jordan

2013-Dec-04 20:52 UTC

head link

[Nut-upsuser] Plea for a more loquacious nut

Roger Price wrote, On 12/4/2013 12:35 PM:> I would like nut to become more loquacious, and to log a much more 
> complete report of its activity.  At present nut reports that its 
> components have started operation but does not automatically log their 
> activity when UPS's switch between OB and OL.  I believe that this 
> under-reporting of important facts is too minimalist - it would be 
> better for system administrators and for the nut support team if a 
> much more complete report were available of all OB/OL activity by each 
> component.
I've noticed that NUT doesn't notice short power outages or glitches 
like previous setups (APC Smart-UPS+Apcupsd) I've had. I do miss that. I 
thought NUT supported UPS notifications/interrupts so that it can know 
of power events that happen in-between the regular poll interval. But I 
don't how much the lack of reporting is due to NUT or the Eaton UPS's I 
am using now.

Charles Lepple

2013-Dec-05 00:11 UTC

head link

[Nut-upsuser] Plea for a more loquacious nut

On Dec 4, 2013, at 3:52 PM, Kris Jordan wrote:
> Roger Price wrote, On 12/4/2013 12:35 PM:
>> I would like nut to become more loquacious, and to log a much more
complete report of its activity.  At present nut reports that its components
have started operation but does not automatically log their activity when
UPS's switch between OB and OL.  I believe that this under-reporting of
important facts is too minimalist - it would be better for system administrators
and for the nut support team if a much more complete report were available of
all OB/OL activity by each component.
> 
> I've noticed that NUT doesn't notice short power outages or
glitches like previous setups (APC Smart-UPS+Apcupsd) I've had. I do miss
that. I thought NUT supported UPS notifications/interrupts so that it can know
of power events that happen in-between the regular poll interval. But I
don't how much the lack of reporting is due to NUT or the Eaton UPS's I
am using now.

The default setting in upsmon.conf for ONLINE/ONBATT is SYSLOG+WALL.

If those events aren't being logged, we would need to trace further upstream
and see if the driver is generating the events.

Kris: I don't know the various debug levels off the top of my head, but if
you want to manually start the driver with a few -D flags and log the resulting
output, we can see whether or not your UPS is sending the notifications the way
that the driver is expecting them.

-- 
Charles Lepple
clepple at gmail

Charles Lepple

2013-Dec-05 00:31 UTC

head link

[Nut-upsuser] Plea for a more loquacious nut

On Dec 4, 2013, at 3:35 PM, Roger Price wrote:
> I would like nut to become more loquacious, and to log a much more complete
report of its activity.  At present nut reports that its components have started
operation but does not automatically log their activity when UPS's switch
between OB and OL.  I believe that this under-reporting of important facts is
too minimalist - it would be better for system administrators and for the nut
support team if a much more complete report were available of all OB/OL activity
by each component.
In principle, more logging sounds like a good idea. What syslog level
adjustments would you propose?
> Looking at the source code, it seems that much of what is needed is already
in place, but behind "if" conditions that ensure that little or
nothing gets through.  Long ago I wrote software, including a compiler, but my C
programming is limited to a class exercise many many years ago, and its based on
this "experience" that I'm guessing that in upssched.c function
exec_cmd the code
> 
>  snprintf(buf, sizeof(buf), "%s %s", cmdscript, cmd);
>  err = system(buf);
>  if (WIFEXITED(err)) {
> 	if (WEXITSTATUS(err)) {
> 		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
WEXITSTATUS(err));
> 	}
> 
> attempts to send a command to the operating system, possibly to execute a
Bash script.  If system(buf) fails, the tests block the error message. Surely
the error message is essential.  An unattended box is now in an emergency
situation.  After the inevitable IT failure the system should be auditable to
discover what went wrong and what should be done to prevent it happening in the
future.  Such an audit expects to find "exec_cmd(%s) returned %d" in
the log.
Are you looking for:

 * more diagnostics depending on the value of err,
 * logging of all return codes, even success

or both?
> "But these problems should be found by testing!", one might
argue. Firstly, the testing would be facilitated by this error message, and
secondly, no amount of testing will ever cover every situation met in the real
world.
> 
> I believe nut would be improved by
> 
> 1. Logging a summary of the state of the nut system and the UPS's every
24 hours.
I would personally prefer that NUT didn't do this by default. (Then again, I
don't do a lot of sysadmin work for critical systems, so take that with a
grain of salt.) To me, this seems like a call to 'upsc' should be placed
in a nightly cron job. If you have multiple UPSes, you can iterate over them. We
could add an example script to the NUT source tree for that.
> 2. Automatically logging a record of driver, upsd, upsmon and upssched
activity for each OB/OL change.
Fair point. I don't think logging at every single point is necessary, but if
it's configurable, that would work.
> 3. Replacing the upsmon NOTIFYFLAG "SYSLOG" by
"NOSYSLOG".  All notifications are logged unless the sysadmin
explicitly calls for no logging.

I suspect I am missing something here. The default upsmon.conf logs everything
to syslog (and wall) by default. Unless that part is broken (and I confess I
haven't thoroughly tested it recently), wouldn't the defaults work
without breaking existing installations? I agree that it is better to err on the
side of logging more information, but I don't think we need to break the
existing syntax to do that.

If anything, I would want finer-grained control over the syslog level for some
of these events.

-- 
Charles Lepple
clepple at gmail

Charles Lepple

2013-Dec-05 00:59 UTC

head link

[Nut-upsuser] Plea for a more loquacious nut

On Dec 4, 2013, at 7:31 PM, Charles Lepple wrote:
> On Dec 4, 2013, at 3:35 PM, Roger Price wrote:
> 
>> Looking at the source code, it seems that much of what is needed is
already in place, but behind "if" conditions that ensure that little
or nothing gets through.  Long ago I wrote software, including a compiler, but
my C programming is limited to a class exercise many many years ago, and its
based on this "experience" that I'm guessing that in upssched.c
function exec_cmd the code
>> 
>> snprintf(buf, sizeof(buf), "%s %s", cmdscript, cmd);
>> err = system(buf);
>> if (WIFEXITED(err)) {
>> 	if (WEXITSTATUS(err)) {
>> 		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
WEXITSTATUS(err));
>> 	}
>> 
>> attempts to send a command to the operating system, possibly to execute
a Bash script.  If system(buf) fails, the tests block the error message. Surely
the error message is essential. An unattended box is now in an emergency
situation.  After the inevitable IT failure the system should be auditable to
discover what went wrong and what should be done to prevent it happening in the
future.  Such an audit expects to find "exec_cmd(%s) returned %d" in
the log.
> 
> Are you looking for:
> 
> * more diagnostics depending on the value of err,
> * logging of all return codes, even success
> 
> or both?
Roger,

Did you mean "If system(buf) succeeds, the tests block the success
message"? The line just after what you quoted handles the case where
WIFEXITED(err) == 0:

https://github.com/networkupstools/nut/blob/6a4729cfdf9df01e0e3a253678c43c2594dd51b0/clients/upssched.c#L86

Code like the following should log something in all cases:

err = system(buf);
if (WIFEXITED(err)) {
	if (WEXITSTATUS(err)) {
		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
WEXITSTATUS(err));
	} else {
		upslogx(LOG_DEBUG, "exec_cmd(%s) succeeded"); /* NEW */
	}
} else {
	if (WIFSIGNALED(err)) {
		upslogx(LOG_WARNING, "exec_cmd(%s) terminated with signal %d", buf,
WTERMSIG(err));
	} else {
		upslogx(LOG_ERR, "Execute command failure: %s", buf);
	}
}

-- 
Charles Lepple
clepple at gmail

Roger Price

2013-Dec-05 09:55 UTC

head link

[Nut-upsuser] Plea for a more loquacious nut

On Wed, 4 Dec 2013, Charles Lepple wrote:
> In principle, more logging sounds like a good idea. What syslog level adjustments would you propose?

Hello Charles, I would like to increase the volume of logging to include 
all OB/OL events at each nut component.  Maybe some OB/OL logging should 
move to the right on the syslog scale (debug, info, notice, warning, err, 
crit, alert, emerg) but the risk there is that with a typical syslog.conf 
setup the messages either get lost or get walled.

An OB/OL event is comparable to a dhcpcd event and should be presented to 
the logger with equal detail.
> Are you looking for:
  * more diagnostics depending on the value of err,
  * logging of all return codes, even success

or both?

Both. Essentially the logging of all return codes. More diagnostics would 
be good to see.
> > 3. Replacing the upsmon NOTIFYFLAG "SYSLOG" by
"NOSYSLOG".
> I suspect I am missing something here. The default upsmon.conf logs everything to syslog (and wall) by default. Unless that part is broken 
(and I confess I haven't thoroughly tested it recently), wouldn't the 
defaults work without breaking existing installations? I agree that it 
is better to err on the side of logging more information, but I don't 
think we need to break the existing syntax to do that.

The idea is to make it evident by the choice of keyword that everything is 
syslogged unless the sysadmin says otherwise.  Its cosmetic.  There is no 
real change in function.  "SYSLOG" would be ignored, so old 
configurations are not broken.
> If anything, I would want finer-grained control over the syslog level for some of these events.

I agree completely.  Let nut tell the truth, the whole truth, and nothing 
but the truth, and syslog.conf will specify what the jury is to hear.
> Did you mean "If system(buf) succeeds, the tests block the success message"?

Yes, that was my (weak) understanding of the C code.
> Code like the following should log something in all cases:
err =3D system(buf);
if (WIFEXITED(err)) {
 	if (WEXITSTATUS(err)) {
 		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf, 
WEXITSTATUS(err));
 	} else {
 		upslogx(LOG_DEBUG, "exec_cmd(%s) succeeded"); /* NEW */
 	}
} else {
 	if (WIFSIGNALED(err)) {
 		upslogx(LOG_WARNING, "exec_cmd(%s) terminated with 
signal %d", buf, WTERMSIG(err));
 	} else {
 		upslogx(LOG_ERR, "Execute command failure: %s", buf);
 	}
}

Making such a clear distinction between the severities of each case will 
make it easier to set up syslog.conf.  My personal preference would be for

  upslogx(LOG_DEBUG, "exec_cmd(%s) succeeded"); /* NEW */

to be LOG_INFO.

Roger





































> On Dec 4, 2013, at 3:35 PM, Roger Price wrote:
>
>> I would like nut to become more loquacious, and to log a much more
complete report of its activity.  At present nut reports that its components
have started operation but does not automatically log their activity when
UPS's switch between OB and OL.  I believe that this under-reporting of
important facts is too minimalist - it would be better for system administrators
and for the nut support team if a much more complete report were available of
all OB/OL activity by each component.
>
> In principle, more logging sounds like a good idea. What syslog level
adjustments would you propose?
>
>> Looking at the source code, it seems that much of what is needed is
already in place, but behind "if" conditions that ensure that little
or nothing gets through.  Long ago I wrote software, including a compiler, but
my C programming is limited to a class exercise many many years ago, and its
based on this "experience" that I'm guessing that in upssched.c
function exec_cmd the code
>>
>>  snprintf(buf, sizeof(buf), "%s %s", cmdscript, cmd);
>>  err = system(buf);
>>  if (WIFEXITED(err)) {
>> 	if (WEXITSTATUS(err)) {
>> 		upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
WEXITSTATUS(err));
>> 	}
>>
>> attempts to send a command to the operating system, possibly to execute
a Bash script.  If system(buf) fails, the tests block the error message. Surely
the error message is essential.  An unattended box is now in an emergency
situation.  After the inevitable IT failure the system should be auditable to
discover what went wrong and what should be done to prevent it happening in the
future.  Such an audit expects to find "exec_cmd(%s) returned %d" in
the log.
>
> Are you looking for:
>
> * more diagnostics depending on the value of err,
> * logging of all return codes, even success
>
> or both?
>
>> "But these problems should be found by testing!", one might
argue. Firstly, the testing would be facilitated by this error message, and
secondly, no amount of testing will ever cover every situation met in the real
world.
>>
>> I believe nut would be improved by
>>
>> 1. Logging a summary of the state of the nut system and the UPS's
every 24 hours.
>
> I would personally prefer that NUT didn't do this by default. (Then
again, I don't do a lot of sysadmin work for critical systems, so take that
with a grain of salt.) To me, this seems like a call to 'upsc' should be
placed in a nightly cron job. If you have multiple UPSes, you can iterate over
them. We could add an example script to the NUT source tree for that.
>
>> 2. Automatically logging a record of driver, upsd, upsmon and upssched
activity for each OB/OL change.
>
> Fair point. I don't think logging at every single point is necessary,
but if it's configurable, that would work.
>
>> 3. Replacing the upsmon NOTIFYFLAG "SYSLOG" by
"NOSYSLOG".  All notifications are logged unless the sysadmin
explicitly calls for no logging.
>
>
> I suspect I am missing something here. The default upsmon.conf logs
everything to syslog (and wall) by default. Unless that part is broken (and I
confess I haven't thoroughly tested it recently), wouldn't the defaults
work without breaking existing installations? I agree that it is better to err
on the side of logging more information, but I don't think we need to break
the existing syntax to do that.
>
> If anything, I would want finer-grained control over the syslog level for
some of these events.
>
> -- 
> Charles Lepple
> clepple at gmail
>
>
>
>

Seemingly Similar Threads

Search for more apparently analagous threads

Nut upsuser - Dec 2013 - Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

[Nut-upsuser] Plea for a more loquacious nut

Seemingly Similar Threads