David Bolen
2011-Jun-28 22:13 UTC
[Nut-upsuser] Windows 2.6.0-1 usbhid-ups driver infinite loop on USB disconnect
I was doing some testing with the 2.6.0-1 beta Windows installer and a Cyber Power CP1350PFCLCD UPS, and ran into an infinite loop in the usbhid-ups driver if the UPS is lost on the UPS bus. This is on a Windows XP SP3 system. USB device access is via libusb 1.2.4.0 (driver mode, not filter) in case that matters. Everything seems to work just fine so far while the UPS is connected (so kudos, BTW, on the port in general), but if I pull the USB cable to the UPS during operation, usbhid-ups goes into a tight loop with errors trying to process/poll the USB port and fails to recognize the UPS is unavailable, even if the UPS is then reconnected. So once the connection is broken, no further UPS communication takes place although the rest of the system (status from upsd via upsc or upsmon) continues to see UPS data from just before the disconnect, so I'm assuming usbhid-ups is continuing to update stale data, which makes the failure mode nasty. Attached below is an excerpt of debugging output from usbhid-ups. It would appear that the interrupt and query failures have reasonably appropriate errors, so maybe it's just a difference in how such errors reflect under Windows vs. *nix. Any suggestions welcome. I haven't yet tried to build from source, but will be getting that set up, so should be able to recompile if there are specific patches to try. -- David Output from: "usbhid-ups.exe -a ups -D" 0.000000 debug level is '1' 0.000000 upsdrv_initups... 0.093750 Using subdriver: CyberPower HID 0.3 0.093750 Path: UPS.PowerSummary.iProduct, Type: Feature, ReportID: 0x01, Offset: 0, Size: 8, Value: 1 (... rest of UPS discovery parameters available if needed ...) 0.140625 Network UPS Tools - Generic HID driver 0.35 (2.6.0-2988:2989M) 0.140625 USB communication driver 0.31 0.218750 Detected a UPS: CP1350PFCLCD/CRCA102.981 0.234375 upsdrv_initinfo... 0.234375 upsdrv_updateinfo... 0.234375 Got 6 HID objects... 0.234375 Quick update... 0.234375 upsdrv_updateinfo... 0.500000 Got 0 HID objects... 0.500000 Quick update... 0.500000 upsdrv_updateinfo... (... above repeated during normal operation - varying HID object count ...) (*** I pulled the cable here ***) 4.984375 libusb_get_interrupt: libusb0-dll:err [_usb_reap_async] reaping request failed, win error: A device attached to the system is not functioning. 4.984375 Got 0 HID objects... 4.984375 Quick update... 4.984375 upsdrv_updateinfo... 5.234375 Got 0 HID objects... 5.234375 Quick update... 5.250000 upsdrv_updateinfo... 5.406250 libusb_get_interrupt: libusb0-dll:err [_usb_reap_async] reaping request failed, win error: The device is not connected. 5.406250 Got 0 HID objects... 5.406250 Quick update... 5.406250 upsdrv_updateinfo... 5.656250 Got 0 HID objects... 5.656250 Quick update... 5.671875 upsdrv_updateinfo... 5.671875 libusb_get_interrupt: libusb0-dll:err [submit_async] submitting request failed, win error: The device does not recognize the command. 5.671875 Got 0 HID objects... 5.671875 Quick update... 5.671875 upsdrv_updateinfo... 5.671875 libusb_get_interrupt: libusb0-dll:err [submit_async] submitting request failed, win error: The device does not recognize the command. 5.671875 Got 0 HID objects... 5.671875 Quick update... 5.671875 upsdrv_updateinfo... 5.671875 libusb_get_interrupt: libusb0-dll:err [submit_async] submitting request failed, win error: The device does not recognize the command. (... This repeats forever, even after reconnecting the UPS ...)
David Bolen
2011-Jun-29 00:54 UTC
[Nut-upsuser] Windows 2.6.0-1 usbhid-ups driver infinite loop on USB disconnect
David Bolen <db3l.net at gmail.com> writes:> Attached below is an excerpt of debugging output from usbhid-ups. It > would appear that the interrupt and query failures have reasonably > appropriate errors, so maybe it's just a difference in how such errors > reflect under Windows vs. *nix.Following up on my own note, I think I have a theory on what is happening, but am having trouble compiling to test a change as while I have an pre-existing Mingw/MSYS toolchain, it doesn't have a full autoconf toolchain yet. If anyone might have a pre-existing "configure" script for Win32, that would help jump start things as I could configure from there. I think I've run into two items. The first is that the errors from checking the interrupt pipe in upsdrv_updateinfo aren't actually checked for cases where they could identify a disconnected UPS, so it just assumes no information and continues. That's not Win32 specific. I'm not quite sure why it loops so quickly, as opposed to the top level polltime, but perhaps the state of the driver connection is such that it immediately satisfies the WaitForMultipleObjects call in the Win32 version of dstate_poll_fds. Anyway, normally it looks like that would be short-lived anyway since the periodic full poll would detect the disconnect a modest time later. Here I think I run into an assumption in the libhid code that errno always identifies all errors, while in the case of a win32 failure inside of libusb-win32, it's only the result code of the function calls that reflect the error, not errno. So USB I/O level errors while polling are getting hidden. After shrinking the poll time and letting things run a bit after a cable disconnect, I got a few other errors (from libusb_get_report) interleaved such as: 5.843750 Got 0 HID objects... 5.843750 Quick update... 5.859375 upsdrv_updateinfo... 5.859375 libusb_get_interrupt: libusb0-dll:err [submit_async] submitting request failed, win error: The device does not recognize the command. 5.859375 Got 0 HID objects... 5.859375 Quick update... 5.859375 libusb_get_report: libusb0-dll:err [control_msg] sending control message failed, win error: The device does not recognize the command. 5.859375 Can't retrieve Report 16: No error 5.859375 libusb_get_report: libusb0-dll:err [control_msg] sending control message failed, win error: The device does not recognize the command. 5.859375 Can't retrieve Report 15: No error 5.859375 upsdrv_updateinfo... 5.859375 libusb_get_interrupt: libusb0-dll:err [submit_async] submitting request failed, win error: The device does not recognize the command.>From looking at the libusb-win32 source, this would be the result of aWin32 ERROR_BAD_COMMAND code which is turned into a -EIO result for the usb_control_message call. But in this case, it's only the result that has the error, as it's a translation from the Windows internal error and not from any original errno value. It appears that the NUT libusb.c module passes the return code on fine, but then it's suppressed with libhid.c (refresh_report_buffer), and eventually HIDGetDataValue, who just gets a generic -1 code, assumes it can pass along -errno to the usbhid-ups driver, which at that point has no error (yielding the "No error" in the "Can't retrieve Report" message. It seems like adjusting functions along the path to all return the function result code as opposed to just -1 would work but also feels like a riskier change. I'm wondering if perhaps just making the lower level libusb entry points (probably just #if WIN32) assign the result code of the libusb-win32 driver calls to errno to ensure it represents the failures would be cleaner and help obscure the Windows differences? -- David PS: In reviewing the libusb-win32 source it also looks like in some cases it can return either EINVAL or ENOMEM, neither of which are checked for in the error checking path for usbhid-ups, so probably could get added, if only to the WIN32 block.