J J Smith
2013-Sep-01 14:12 UTC
[Nut-upsuser] System Shutting Down Shortly After Boot, COMMBAD, "Writing Error", "Data Receiving Error"
I've been having a problem with my system during boot, which results in a upsd/upsmon COMMBAD event shutting the system down shortly after it is started.

The shutdown is dependent upon the number of processes started at boot time, and their load. Fewer processes and lighter load, and there is no shutdown. More processes and heavy load cause upsd/upsmon to shut the system down shortly after startup.

The messages in syslog vary. Sometimes it's "writing error" and looks like this:

Aug 29 09:42:07 brain powercom[4137]: writing error
Aug 29 09:42:09 brain upsmon[4143]: Poll UPS [powercom-kin-2200ap] failed - Data stale
Aug 29 09:42:09 brain powercom[4137]: writing error
Aug 29 09:42:11 brain powercom[4137]: writing error
Aug 29 09:42:12 brain upsmon[4143]: Poll UPS [powercom-kin-2200ap] failed - Data stale

Or, sometimes it's "data receiving error" and looks like this:

Aug 31 14:33:16 brain powercom[2959]: data receiving error (0 instead of 16 bytes)
Aug 31 14:33:16 brain upsd[2961]: Data for UPS [powercom-kin-2200ap] is stale - check driver
Aug 31 14:33:17 brain upsmon[2970]: Poll UPS [powercom-kin-2200ap] failed - Data stale
Aug 31 14:33:17 brain upsmon[2970]: Communications with UPS powercom-kin-2200ap lost
Aug 31 14:33:19 brain powercom[2959]: data receiving error (0 instead of 16 bytes)
Aug 31 14:33:20 brain upsmon[2970]: Poll UPS [powercom-kin-2200ap] failed - Data stale
Aug 31 14:33:22 brain powercom[2959]: data receiving error (0 instead of 16 bytes)

The fix, thanks to SirG, was to implement the udev RUN+= rule described in his thread:

http://lists.alioth.debian.org/pipermail/nut-upsuser/2012-October/007980.html

In the situation described in that thread, some of the symptoms were different than mine, and the UPS hardware was different than mine, but the underlying problem was the same, and the fix was the same.

The problem is that the UPS driver has to be started within a certain time period after connecting, or the UPS shuts down.
The fix is to change the udev rule in 52-nut-usbups.rules. I appended

, RUN+="/sbin/upsdrvctl stop; /sbin/upsdrvctl start"

to the end of the rule for my UPS hardware. The result looks like this:

# PowerCOM SKP - Smart KING Pro (all Smart series) - usbhid-ups
ATTR{idVendor}=="0d9f", ATTR{idProduct}=="00a3", MODE="664", GROUP="nut", RUN+="/sbin/upsdrvctl stop; /sbin/upsdrvctl start"

In fact, I think the symptoms and the fix are general enough that they warrant a change to all of the udev rules distributed with the package, so that all UPS devices have the RUN+= directive to start upsdrvctl. As SirG mentions in the previously cited thread, an added benefit is the ability to hotplug the UPS if one needs to rearrange USB cables (though I'm not sure anyone has tested this).

One thing that needs attention, though, is how to handle a upsdrvctl command that is slow or hangs. The commands in the RUN+= directive should be detached/backgrounded. This needs a little more research. My knowledge of udev rules is limited, but I know that simply adding an ampersand '&' isn't sufficient. Maybe enclosing the commands in parentheses with an ampersand, (...) &, or in braces with an ampersand, {...} &, would work, or maybe a separate helper script is required, with or without nohup.

Right now I am using the RUN rule as it is shown above, without detaching/backgrounding, and haven't had a problem (yet).
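
For the "separate helper script" option, here is a minimal, untested sketch. The script name and path (/usr/local/sbin/nut-driver-restart), the "restart" argument, and the UPSDRVCTL override variable are all assumptions made for illustration; only /sbin/upsdrvctl with its stop and start commands comes from the rule above. Moving the commands into a script also avoids depending on how udev handles the ';' inside RUN+=, since the udev(7) man page describes RUN as a program name followed by space-separated arguments rather than a shell command line.

```shell
#!/bin/sh
# Hypothetical helper, e.g. /usr/local/sbin/nut-driver-restart
# (name and path are assumptions, not part of the NUT package).
# Intended to be called from a udev RUN+= entry so that udev does not
# wait on a slow or hanging upsdrvctl.

# Path taken from the udev rule in the post; can be overridden for testing.
UPSDRVCTL="${UPSDRVCTL:-/sbin/upsdrvctl}"

# Stop any running driver, then start it again. A failed stop is ignored,
# because at boot there may be no driver running yet.
restart_driver() {
    "$UPSDRVCTL" stop || true
    "$UPSDRVCTL" start
}

# Run the restart in a background subshell with stdin/stdout/stderr
# redirected to /dev/null, so the caller returns immediately.
restart_driver_detached() {
    ( restart_driver ) </dev/null >/dev/null 2>&1 &
}

# Only act when invoked as "<script> restart".
if [ "${1:-}" = "restart" ]; then
    restart_driver_detached
fi
```

With this in place, the end of the rule would become RUN+="/usr/local/sbin/nut-driver-restart restart" instead of the two upsdrvctl commands. Whether the backgrounded process survives depends on the udev version: newer udev documentation states that processes left behind by RUN, detached or not, are killed once event handling finishes, so on such systems this may not be sufficient and adding nohup or setsid may not help either.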