Jim Paris
2008-Sep-24  21:21 UTC
[Nut-upsdev] [PATCH] disable nonblocking mode on serial port
Hi,
I got a new Cyberpower 1500AVR UPS and nut wouldn't work.  It failed
to detect the UPS and a strace showed that all writes to the serial
port were failing with -EAGAIN.  The attached patch disabled
nonblocking mode on the serial port and now it works fine.
Is there a reason the port was put in non-blocking mode?  There's no
code I could find to deal with the inevitable -EAGAIN that you'd
receive on writes.  Maybe the delays introduced by ser_send_pace are
why it might work for some people?
-jim
--- nut-2.2.2/drivers/serial.c	2007-09-09 15:33:15.000000000 -0400
+++ nut-2.2.2-jim/drivers/serial.c	2008-09-24 16:55:32.000000000 -0400
@@ -133,12 +133,19 @@
 int ser_open(const char *port)
 {
 	int	fd;
+	int	flags;
 
 	fd = open(port, O_RDWR | O_NOCTTY | O_EXCL | O_NONBLOCK);
 
 	if (fd < 0)
 		ser_open_error(port);
 
+	if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
+		ser_open_error(port);
+
+	if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
+		ser_open_error(port);
+
 	lock_set(fd, port);
 
 	return fd;
Arjen de Korte
2008-Sep-26  05:36 UTC
[Nut-upsdev] [PATCH] disable nonblocking mode on serial port
Citeren "Jim Paris" <jim at jtan.com>:> I got a new Cyberpower 1500AVR UPS and nut wouldn't work. It failed > to detect the UPS and a strace showed that all writes to the serial > port were failing with -EAGAIN. The attached patch disabled > nonblocking mode on the serial port and now it works fine.What operating system are you using? Which NUT driver? If you're using the 'cyberpower' driver, switch to the 'powerpanel' driver and you can probably skip the remainder of this message. The thing you should ask yourself is why the connection is timing out in the first place. Running the driver in debug mode might reveal what is going on. If it is because we are sending it too much data and the UPS can't keep up with the data flow, this should be fixed. The same may happen if synchronization is lost, which is a frequent problem with drivers that don't properly flush in- and outputs. Using the sledgehammer approach of opening the device in blocking mode and just sit it out until it can handle whatever we throw at it is definitely not what we want and certainly not something that we will integrate into NUT without ruling out the problems already mentioned.> Is there a reason the port was put in non-blocking mode?Of course. :-)> There's no > code I could find to deal with the inevitable -EAGAIN that you'd > receive on writes. Maybe the delays introduced by ser_send_pace are > why it might work for some people?Opening a port in non-blocking mode allows us to deal with errors that we may encounter (in a timely manner). If you open a device in blocking mode, it may take a relatively long time before we notice that something is wrong for some error conditions. The timeout depends on the system settings, but 15 minutes is not uncommon. While this will be dealt with while reading data (through select() calls, so these never risk blocking), this is not the case when writing to the device. If the UPS happens to be on battery at that time, it will probably be empty by the time the call times out and the system will be shutdown hard. Your patch makes little sense, since just removing O_NONBLOCK from the flags in the call to open() should be sufficient already. However, if we should ever decide that in order to use the Cyberpower 1500AVR UPS we need to open it in blocking mode, clearing the O_NONBLOCK flag should be done by the driver. By doing it here, you risk breaking all other drivers that use this library. Best regards, Arjen -- Please keep list traffic on the list
Arjen de Korte
2008-Oct-06  19:40 UTC
[Nut-upsdev] [PATCH] disable nonblocking mode on serial port
Citeren Jim Paris <jim at jtan.com>:>> Because the whole purpose of using NUT is that you must have a reliable >> connection to the UPS. There is no point in monitoring a UPS if you don't >> have low (essentially zero) latency. > I don't follow. I don't expect 100% reliability (there's a reason NUT > has a "connection lost" message) and I still see a point in monitoring > my UPS even if the status updates have a 250ms latency.It's not a latency of 250ms per polling sequence that I'm worried about, it's the possible delay involved for each character that we write. If I understand you correctly, all writes to the USB to RS-232 converter are failing right now. That would mean that we shouldn't only expect additional delays when sending the first character of a command, but also for subsequent characters. Since we have some fairly verbose protocols in the mix of supported devices, this will inevitably lead to timeout problems if the required grace period for a select() call is more than a couple of milliseconds. According to the comments in the header, the Cypress M8 kernel module contains the same write buffering as the PL2303. I still feel that this is an essential part of a USB to RS-232 converter. If it fails to provide that, it is broken. Neither as one expects EAGAIN after writing to /dev/null, nor should we when writing to a UART (knowing that the transmit buffer is empty). Last but not least, there is still the problem that *any* USB to RS-232 converter introduces the problem that we no longer are in control of the delays between characters. Because of the latency of the USB bus, even if characters are sent with delays, this delay may be lost on the RS-232 side of the converter. This may not be a problem for your UPS, but certainly doesn't make this a solution for the general case. Testing any changes in protocol timing are incredibly difficult, if possible at all. The NUT developers have access to only a limited amount of devices to test with, so any changes require very careful examination for possible side effects. Even a small delay added, may make all the difference. Best regards, Arjen PS By the way, we have a select_write() call in common/common.c already. Especially in the trunk, this takes only one line of code in drivers/serial.c (line #299). -- Please keep list traffic on the list