Merk - Oliver
2017-Jan-09 14:46 UTC
[Nut-upsuser] NUT Client shuts down when performing runtime calibration on APC UPS
Here are the results from the output of upsc under different conditions on my Debian test system with NUT 2.7.2:

Normal: ups.status: OL
No wall power: ups.status: OB
Runtime calibration (started from the AP9630 web interface): ups.status: OB CAL

So at this stage there is a difference!

I then connected to my QNAP and set the UPS settings in the web interface to "Shutdown server if power loss is 5 min"; the UPS is configured as an SNMP connection with its IP address. I started a runtime calibration, and after 5 minutes the QNAP NAS shut down. Before that, I connected to it via SSH and checked the output of upsc during the runtime calibration: ups.status: OB CAL.

I don't yet understand how NUT works in detail, but at least the information that the UPS is calibrating is there. I cannot understand why the NAS still shuts down after 5 minutes, but maybe you can help me.

Here are the settings from the QNAP NUT configuration files. I hope I've got the correct files, since they are not in the usual location:

[/mnt/HDA_ROOT/.config] # more ups_snmptrapd.conf
#
# use fully qualified prefix... just to be safe
#
traphandle .1.3.6.1.4.1.318.0.5 /sbin/ups_snmptrap_handler POWER_LOST
traphandle .1.3.6.1.4.1.318.0.9 /sbin/ups_snmptrap_handler POWER_RESTORED
traphandle .1.3.6.1.4.1.318.0.7 /sbin/ups_snmptrap_handler BATTERY_LOW
traphandle .1.3.6.1.4.1.318.0.1 /sbin/ups_snmptrap_handler COMM_LOST
traphandle .1.3.6.1.4.1.318.0.8 /sbin/ups_snmptrap_handler COMM_RESTORED
-----------------------------------------------------------------------------------------------------------
[/mnt/HDA_ROOT/.config/ups] # more ups.conf
[qnapups]
driver = snmp-ups
port = /dev/ttyS1
desc = "Workstation"
pollinterval=1
-----------------------------------------------------------------------------------------------------------
[/mnt/HDA_ROOT/.config/ups] # more upsd.conf
# Network UPS Tools: example upsd configuration file
#
# This file contains access control data, you should keep it secure.
#
# It should only be readable by the user that upsd becomes. See the FAQ.
# ======================================================================
# Access Control Lists (ACLs)
#
# ACL <name> <ipblock>
# ACL myhost 10.0.0.1/32
#
# ACCEPT <aclname> [<aclname>...]
# REJECT <aclname> [<aclname>...]
#
# Define lists of hosts or networks with ACL definitions.
#
# ACCEPT and REJECT use ACL definitions to control whether a host is
# allowed to connect to upsd.
#
# This default configuration only gives access to localhost. To allow
# other hosts or networks to connect, see the documentation and change
# these lines.
ACL all 0.0.0.0/0
ACL localhost 127.0.0.1/32
ACCEPT localhost
REJECT all
MAXAGE 20
# ======================================================================
# MAXAGE <seconds>
# MAXAGE 15
#
# This defaults to 15 seconds. After a UPS driver has stopped updating
# the data for this many seconds, upsd marks it stale and stops making
# that information available to clients. After all, the only thing worse
# than no data is bad data.
#
# You should only use this if your driver has difficulties keeping
# the data fresh within the normal 15 second interval. Watch the syslog
# for notifications from upsd about staleness.
-----------------------------------------------------------------------------------------------------------
[/mnt/HDA_ROOT/.config/ups] # more upsd.users
# Network UPS Tools: Example upsd.users
#
# This file sets the permissions for upsd - the UPS network daemon.
# Users are defined here, are given passwords, and their privileges are
# controlled here too. Since this file will contain passwords, keep it
# secure, with only enough permissions for upsd to read it.
# --------------------------------------------------------------------------
# Each user gets a section. To start a section, put the username in
# brackets on a line by itself. To set something for that user, specify
# it under that section heading. The username is case-sensitive, so
# admin and AdMiN are two different users.
#
# Possible settings:
#
# password: The user's password. This is case-sensitive.
#
# --------------------------------------------------------------------------
#
# allowfrom: ACL names that this user may connect from. ACLs are
# defined in upsd.conf.
#
# --------------------------------------------------------------------------
#
# actions: Let the user do certain things with upsd.
#
# Valid actions are:
#
# SET - change the value of certain variables in the UPS
# FSD - set the "forced shutdown" flag in the UPS
#
# --------------------------------------------------------------------------
#
# instcmds: Let the user initiate specific instant commands. Use "ALL"
# to grant all commands automatically. There are many possible
# commands, so use 'upscmd -l' to see what your hardware supports. Here
# are a few examples:
#
# test.panel.start - Start a front panel test
# test.battery.start - Start battery test
# test.battery.stop - Stop battery test
# calibrate.start - Start calibration
# calibrate.stop - Stop calibration
#
# --------------------------------------------------------------------------
#
# Example:
#
# [admin]
# password = mypass
# allowfrom = admworkstation admhome
# actions = SET
# instcmds = ALL
#
[admin]
password = <secret>
allowfrom = localhost
actions = SET
instcmds = ALL
upsmon master # or upsmon slave
#
# --- Configuring for upsmon
#
# To add a user for your upsmon, use this example:
#
# [monuser]
# password = pass
# allowfrom = bigserver
#
# upsmon master (or upsmon slave)
#
# The matching MONITOR line in your upsmon.conf would look like this:
#
# MONITOR myups@myhost 1 monuser pass master (or slave)
-----------------------------------------------------------------------------------------------------------
[/mnt/HDA_ROOT/.config/ups] # more upsdrv.map
version=1
0x51d,,usbhid-ups
0x463,,usbhid-ups
0x764,,usbhid-ups
0x9ae,,usbhid-ups
0x50d,,usbhid-ups
0x665,0x5161,blazer_usb
0x1d6b,0x2,NOT_UPS
0x1d6b,0x3,NOT_UPS
0x1d6b,0x1,NOT_UPS
0x1005,0xb155,NOT_UPS
0x051d,0x0002,usbhid-ups
0x051d,0x0002,usbhid-ups
0x051d,0x0002,usbhid-ups
0x051d,0x0002,usbhid-ups
-----------------------------------------------------------------------------------------------------------
[/mnt/HDA_ROOT/.config/ups] # more upsmon.conf
# Network UPS Tools: example upsmon configuration
#
# This file contains passwords, so keep it secure.
# --------------------------------------------------------------------------
# RUN_AS_USER <userid>
#
# By default, upsmon splits into two processes. One stays as root and
# waits to run the SHUTDOWNCMD. The other one switches to another userid
# and does everything else.
#
# The default nonprivileged user is set at compile-time with
# 'configure --with-user=...'.
#
# You can override it with '-u <user>' when starting upsmon, or just
# define it here for convenience.
#
# Note: if you plan to use the reload feature, this file (upsmon.conf)
# must be readable by this user! Since it contains passwords, DO NOT
# make it world-readable. Also, do not make it writable by the upsmon
# user, since it creates an opportunity for an attack by changing the
# SHUTDOWNCMD to something malicious.
#
# For best results, you should create a new normal user like "nutmon",
# and make it a member of a "nut" group or similar. Then specify it
# here and grant read access to the upsmon.conf for that group.
#
# This user should not have write access to upsmon.conf.
#
# RUN_AS_USER nutmon
RUN_AS_USER admin
# --------------------------------------------------------------------------
# MONITOR <system> <powervalue> <username> <password> ("master"|"slave")
#
# List systems you want to monitor. Not all of these may supply power
# to the system running upsmon, but if you want to watch it, it has to
# be in this section.
#
# You must have at least one of these declared.
#
# <system> is a UPS identifier in the form <upsname>@<hostname>[:<port>]
# like ups@localhost, su700@mybox, etc.
#
# Examples:
#
# - "su700@mybox" means a UPS called "su700" on a system called "mybox"
#
# - "fenton@bigbox:5678" is a UPS called "fenton" on a system called
# "bigbox" which runs upsd on port "5678".
#
# The UPS names like "su700" and "fenton" are set in your ups.conf
# in [brackets] which identify a section for a particular driver.
#
# If the ups.conf on host "doghouse" has a section called "snoopy", the
# identifier for it would be "snoopy@doghouse".
#
# <powervalue> is an integer - the number of power supplies that this UPS
# feeds on this system. Most computers only have one power supply, so this
# is normally set to 1. You need a pretty big or special box to have any
# other value here.
#
# You can also set this to 0 for a system that doesn't supply any power,
# but you still want to monitor. Use this when you want to hear about
# changes for a given UPS without shutting down when it goes critical,
# unless <powervalue> is 0.
#
# <username> and <password> must match an entry in that system's
# upsd.users. If your username is "monmaster" and your password is
# "blah", the upsd.users would look like this:
#
# [monmaster]
# password = blah
# allowfrom = (whatever applies to this host)
# upsmon master (or slave)
#
# "master" means this system will shutdown last, allowing the slaves
# time to shutdown first.
#
# "slave" means this system shuts down immediately when power goes critical.
#
# Examples:
#
# MONITOR myups@bigserver 1 monmaster blah master
# MONITOR su700@server.example.com 1 upsmon secretpass slave
MONITOR qnapups@0.0.0.0 1 admin <secret> slave
# --------------------------------------------------------------------------
# MINSUPPLIES <num>
#
# Give the number of power supplies that must be receiving power to keep
# this system running. Most systems have one power supply, so you would
# put "1" in this field.
#
# Large/expensive server type systems usually have more, and can run with
# a few missing. The HP NetServer LH4 can run with 2 out of 4, for example,
# so you'd set that to 2. The idea is to keep the box running as long
# as possible, right?
#
# Obviously you have to put the redundant supplies on different UPS circuits
# for this to make sense! See big-servers.txt in the docs subdirectory
# for more information and ideas on how to use this feature.
MINSUPPLIES 1
# --------------------------------------------------------------------------
# SHUTDOWNCMD "<command>"
#
# upsmon runs this command when the system needs to be brought down.
#
# This should work just about everywhere ... if it doesn't, well, change it.
SHUTDOWNCMD "/sbin/shutdown -h +0"
# --------------------------------------------------------------------------
# NOTIFYCMD <command>
#
#
# upsmon calls this to send messages when things happen
#
# This command is called with the full text of the message as one argument.
# The environment string NOTIFYTYPE will contain the type string of
# whatever caused this event to happen.
#
# Note that this is only called for NOTIFY events that have EXEC set with
# NOTIFYFLAG. See NOTIFYFLAG below for more details.
#
# Making this some sort of shell script might not be a bad idea. For more
# information and ideas, see pager.txt in the docs directory.
#
# Example:
# NOTIFYCMD /usr/local/ups/bin/notifyme
# --------------------------------------------------------------------------
# POLLFREQ <n>
#
# Polling frequency for normal activities, measured in seconds.
#
# Adjust this to keep upsmon from flooding your network, but don't make
# it too high or it may miss certain short-lived power events.
POLLFREQ 5
# --------------------------------------------------------------------------
# POLLFREQALERT <n>
#
# Polling frequency in seconds while UPS on battery.
#
# You can make this number lower than POLLFREQ, which will make updates
# faster when any UPS is running on battery. This is a good way to tune
# network load if you have a lot of these things running.
#
# The default is 5 seconds for both this and POLLFREQ.
POLLFREQALERT 5
# --------------------------------------------------------------------------
# HOSTSYNC - How long upsmon will wait before giving up on another upsmon
#
# The master upsmon process uses this number when waiting for slaves to
# disconnect once it has set the forced shutdown (FSD) flag. If they
# don't disconnect after this many seconds, it goes on without them.
#
# Similarly, upsmon slave processes wait up to this interval for the
# master upsmon to set FSD when a UPS they are monitoring goes critical -
# that is, on battery and low battery. If the master doesn't do its job,
# the slaves will shut down anyway to avoid damage to the file systems.
#
# This "wait for FSD" is done to avoid races where the status changes
# to critical and back between polls by the master.
HOSTSYNC 15
# --------------------------------------------------------------------------
# DEADTIME - Interval to wait before declaring a stale ups "dead"
#
# upsmon requires a UPS to provide status information every few seconds
# (see POLLFREQ and POLLFREQALERT) to keep things updated. If the status
# fetch fails, the UPS is marked stale. If it stays stale for more than
# DEADTIME seconds, the UPS is marked dead.
#
# A dead UPS that was last known to be on battery is assumed to have gone
# to a low battery condition. This may force a shutdown if it is providing
# a critical amount of power to your system.
#
# Note: DEADTIME should be a multiple of POLLFREQ and POLLFREQALERT.
# Otherwise you'll have "dead" UPSes simply because upsmon isn't polling
# them quickly enough. Rule of thumb: take the larger of the two
# POLLFREQ values, and multiply by 3.
DEADTIME 15
# --------------------------------------------------------------------------
# POWERDOWNFLAG - Flag file for forcing UPS shutdown on the master system
#
# upsmon will create a file with this name in master mode when it's time
# to shut down the load. You should check for this file's existence in
# your shutdown scripts and run 'upsdrvctl shutdown' if it exists.
#
# See the shutdown.txt file in the docs subdirectory for more information.
POWERDOWNFLAG /etc/killpower
# --------------------------------------------------------------------------
# NOTIFYMSG - change messages sent by upsmon when certain events occur
#
# You can change the default messages to something else if you like.
#
# NOTIFYMSG <notify type> "message"
#
# NOTIFYMSG ONLINE "UPS %s on line power"
# NOTIFYMSG ONBATT "UPS %s on battery"
# NOTIFYMSG LOWBATT "UPS %s battery is low"
# NOTIFYMSG FSD "UPS %s: forced shutdown in progress"
# NOTIFYMSG COMMOK "Communications with UPS %s established"
# NOTIFYMSG COMMBAD "Communications with UPS %s lost"
# NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding"
# NOTIFYMSG REPLBATT "UPS %s battery needs to be replaced"
# NOTIFYMSG NOCOMM "UPS %s is unavailable"
# NOTIFYMSG NOPARENT "upsmon parent process died - shutdown impossible"
#
# Note that %s is replaced with the identifier of the UPS in question.
#
# Possible values for <notify type>:
#
# ONLINE : UPS is back online
# ONBATT : UPS is on battery
# LOWBATT : UPS has a low battery (if also on battery, it's "critical")
# FSD : UPS is being shutdown by the master (FSD = "Forced Shutdown")
# COMMOK : Communications established with the UPS
# COMMBAD : Communications lost to the UPS
# SHUTDOWN : The system is being shutdown
# REPLBATT : The UPS battery is bad and needs to be replaced
# NOCOMM : A UPS is unavailable (can't be contacted for monitoring)
# NOPARENT : The process that shuts down the system has died (shutdown impossible)
# --------------------------------------------------------------------------
# NOTIFYFLAG - change behavior of upsmon when NOTIFY events occur
#
# By default, upsmon sends walls (global messages to all logged in users)
# and writes to the syslog when things happen. You can change this.
#
# NOTIFYFLAG <notify type> <flag>[+<flag>][+<flag>] ...
#
# NOTIFYFLAG ONLINE SYSLOG+WALL
# NOTIFYFLAG ONBATT SYSLOG+WALL
# NOTIFYFLAG LOWBATT SYSLOG+WALL
# NOTIFYFLAG FSD SYSLOG+WALL
# NOTIFYFLAG COMMOK SYSLOG+WALL
# NOTIFYFLAG COMMBAD SYSLOG+WALL
# NOTIFYFLAG SHUTDOWN SYSLOG+WALL
# NOTIFYFLAG REPLBATT SYSLOG+WALL
# NOTIFYFLAG NOCOMM SYSLOG+WALL
# NOTIFYFLAG NOPARENT SYSLOG+WALL
#
# Possible values for the flags:
#
# SYSLOG - Write the message in the syslog
# WALL - Write the message to all users on the system
# EXEC - Execute NOTIFYCMD (see above) with the message
# IGNORE - Don't do anything
#
# If you use IGNORE, don't use any other flags on the same line.
# --------------------------------------------------------------------------
# RBWARNTIME - replace battery warning time in seconds
#
# upsmon will normally warn you about a battery that needs to be replaced
# every 43200 seconds, which is 12 hours. It does this by triggering a
# NOTIFY_REPLBATT which is then handled by the usual notify structure
# you've defined above.
#
# If this number is not to your liking, override it here.
RBWARNTIME 43200
# --------------------------------------------------------------------------
# NOCOMMWARNTIME - no communications warning time in seconds
#
# upsmon will let you know through the usual notify system if it can't
# talk to any of the UPS entries that are defined in this file. It will
# trigger a NOTIFY_NOCOMM by default every 300 seconds unless you
# change the interval with this directive.
NOCOMMWARNTIME 300
# --------------------------------------------------------------------------
# FINALDELAY - last sleep interval before shutting down the system
#
# On a master, upsmon will wait this long after sending the NOTIFY_SHUTDOWN
# before executing your SHUTDOWNCMD. If you need to do something in between
# those events, increase this number. Remember, at this point your UPS is
# almost depleted, so don't make this too high.
#
# Alternatively, you can set this very low so you don't wait around when
# it's time to shut down. Some UPSes don't give much warning for low
# battery and will require a value of 0 here for a safe shutdown.
#
# Note: If FINALDELAY on the slave is greater than HOSTSYNC on the master,
# the master will give up waiting for the slave to disconnect.
FINALDELAY 5
________________________
This email was scanned by Bitdefender
Charles Lepple
2017-Jan-10 03:45 UTC
[Nut-upsuser] NUT Client shuts down when performing runtime calibration on APC UPS
On Jan 9, 2017, at 9:46 AM, Merk - Oliver <oliver.merk at pecon.biz> wrote:
>
> I then connected to my QNAP and set the UPS settings on the web interface to "Shutdown server if power loss is 5 min", the UPS is configured to an SNMP-Connection with its IP-address. Started runtime calibration and after 5min. the QNAP NAS has shut down. But before that I connected via SSH on it and also checked the output of upsc during runtime calibration: ups.status: OB CAL.

With those configuration files, NUT will not shut down the system until it sees both "OB" and "LB". I don't see anything in upsmon.conf that corresponds to "Shutdown server if power loss is 5 min" (usually this would be done with upsmon calling upssched, which would start a 5-minute timer), so is it possible that the QNAP firmware has another NUT client that is monitoring for an "OB" status that lasts for more than five minutes? Also, ups_snmptrapd.conf is not a standard NUT feature.

You could use the Debian test system to check whether the driver eventually publishes an "OB LB" status by replacing the SHUTDOWNCMD with something innocuous, like "logger" or "wall". If the Debian system does not run the SHUTDOWNCMD, I recommend filing a bug with the QNAP developers.
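To make the "both OB and LB" rule concrete, here is a minimal Python sketch of that check applied to the ups.status strings reported earlier in the thread. It is only an illustration of the rule as described in this message, not upsmon's actual code; the function name is_critical is hypothetical.

```python
def is_critical(ups_status: str) -> bool:
    """Return True only when the status carries both the OB and LB flags.

    ups_status is the space-separated flag string that upsc prints
    for ups.status, e.g. "OB CAL". (Hypothetical helper for illustration.)
    """
    flags = set(ups_status.split())
    return {"OB", "LB"} <= flags


# Status strings observed in this thread, plus the critical case.
for status in ("OL", "OB", "OB CAL", "OB LB"):
    print(f"{status:8} critical: {is_critical(status)}")
```

Under this rule, "OB CAL" alone never triggers a shutdown; only a status that also contains "LB" does.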
Merk - Oliver
2017-Jan-10 14:58 UTC
[Nut-upsuser] NUT Client shuts down when performing runtime calibration on APC UPS
In the meantime I am learning more and more about NUT. I've learned that QNAP does not use all of NUT: instead of upsmon it has its own process called upsutil (https://qnap.uservoice.com/forums/213378-enterprise-features/suggestions/4225398-full-and-native-support-of-nut-ups-management), which does not support all the features. The configuration data for the UPS is stored in the file uLinux.conf; I think upsmon.conf is not used at all.

The QNAP UPS implementation uses only the NUT driver and daemon for signaling, and it only uses the "change to OB" event to start a shutdown timer. It does not use the "OB LB" status, which I would definitely prefer, since it is a dynamic trigger that also works reliably as battery (and runtime) capacity decreases. I would like the systems to run on battery as long as possible and not go through a shutdown/reboot procedure unless it is really necessary, just like the default behavior of NUT.

I have a second system (a custom-built SAN with the FreeBSD-based NAS4Free firmware) that also shut down prematurely during a runtime calibration. Since I cannot run tests on this system (it is the storage for our VM servers), I need to get the firmware running in a VM for testing. I hope the implementation of NUT in this firmware is better than in the QNAP system.
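The difference between a fixed "on battery" timer and a policy that takes the CAL flag into account can be sketched in Python as follows. This is a simplified model based only on the behavior described in this thread, not QNAP's upsutil code; the function name, the sample data, and the ignore_calibration option are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def timer_shutdown_time(
    samples: Iterable[Tuple[int, str]],
    limit_s: int,
    ignore_calibration: bool = False,
) -> Optional[int]:
    """Return the sample time at which an on-battery timer would expire.

    samples is a sequence of (seconds, ups.status) pairs in time order.
    The timer starts at the first sample showing OB and resets whenever
    OB disappears. With ignore_calibration=True (hypothetical option),
    samples carrying the CAL flag do not count as on battery.
    """
    on_batt_since: Optional[int] = None
    for t, status in samples:
        flags = set(status.split())
        counts = "OB" in flags and not (ignore_calibration and "CAL" in flags)
        if not counts:
            on_batt_since = None  # back on line (or calibrating): reset timer
            continue
        if on_batt_since is None:
            on_batt_since = t  # timer starts at the first on-battery sample
        if t - on_batt_since >= limit_s:
            return t
    return None


# Made-up samples for a runtime calibration that lasts six minutes.
calibration = [(0, "OL"), (60, "OB CAL"), (200, "OB CAL"), (360, "OB CAL"), (420, "OL")]

print(timer_shutdown_time(calibration, 300))
print(timer_shutdown_time(calibration, 300, ignore_calibration=True))
```

With these samples the plain timer expires during the calibration, while the variant that skips CAL samples never fires.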