Jeff Davis
2007-Jan-31 19:39 UTC
send() returns error even though data is sent, TCP connection still alive
I am on FreeBSD 6.1 and I'm seeing write() return EHOSTDOWN while keeping the connection alive.

I wrote a simple C client on the affected FreeBSD box to write a series of integers to a server program on another machine. When the client's write() receives the EHOSTDOWN, the data it sent arrives at the server program anyway. Moreover, when I write() again on the same socket, the data goes through without further errors, as if nothing had happened. The connection is not broken by the EHOSTDOWN, and the client never knows the difference. In fact, if the application just ignores the error from write(), everything appears fine after that.

The simplest way to see the problem is with SSH. Machine A is a FreeBSD box, and machine B is another box on the same switch.

(1) ssh from A to B
(2) see on A that "arp -a" shows the entry for B
(3) on A do "arp -d B"
(4) pull network cable
(5) type <return> to try to send data over the SSH session (of course nothing will happen, the network cable is still out)
(6) after the network cable has been unplugged for about 8 seconds, plug it back in
(7) type in the SSH session again

You should see something like "write failed: host is down" and the session will terminate. Of course, when ssh exits, the TCP connection closes. The only way to see that it's still open and active is by writing (or using) an application that ignores EHOSTDOWN errors from write(). I think some scripting languages do not generate an exception in that case.

This is very strange behavior and it's causing all kinds of problems on our network. Does anyone have an explanation for this? Why would a TCP operation return an error without closing the connection and send the data anyway?

This has existed for a long time. I believe this is related to:
http://www.freebsd.org/cgi/query-pr.cgi?pr=100172
which is related to:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?only_with_tag=RELENG_6#rev1.137.2.5

I tried the patch here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?f=h#rev1.158
(rev 1.158) but I can still generate the error I mentioned.

Also, what's even more strange is that I set arp to be static on the production machine, and I am still getting EHOSTDOWNs.

Regards,
Jeff Davis
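
For anyone who wants to reproduce this without ssh, here is a minimal sketch of the kind of client described above. It writes a series of integers to an already-connected stream socket and, when write() fails with EHOSTDOWN, logs the error and keeps going instead of closing the socket. This is not the original test program: the function name send_ints_ignoring_ehostdown and its return convention are made up for illustration, and the decision not to resend after EHOSTDOWN simply follows the observation that the data arrives anyway.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): write the integers
 * 0..count-1 to the connected stream socket fd. If write() fails with
 * EHOSTDOWN, log it and carry on with the next integer; the integer is
 * not resent, on the assumption (per the report) that it was delivered.
 *
 * Returns the number of EHOSTDOWN errors seen, or -1 on a short write
 * or any other error.
 */
static int send_ints_ignoring_ehostdown(int fd, int count)
{
    int ehostdown_seen = 0;

    assert(count >= 0);

    for (int i = 0; i < count; i++) {
        ssize_t n = write(fd, &i, sizeof i);

        if (n == (ssize_t)sizeof i)
            continue;               /* normal case */

        if (n < 0 && errno == EHOSTDOWN) {
            /* The error under discussion: report it, keep the socket. */
            fprintf(stderr, "write of %d: %s (ignored)\n",
                    i, strerror(errno));
            ehostdown_seen++;
            continue;
        }

        return -1;                  /* short write or unrelated error */
    }

    return ehostdown_seen;
}
```

Called with the descriptor from a successful connect(), a nonzero return together with a complete sequence on the server side would match the behavior reported here. Whether it does so on any particular system is something to check, not something this sketch guarantees.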
Garrett Wollman
2007-Jan-31 20:40 UTC
send() returns error even though data is sent, TCP connection still alive
In article <1170269163.22436.71.camel@dogma.v10.wvs>, Jeff Davis <freebsd@j-davis.com> wrote:

>You should see something like "write failed: host is down" and the
>session will terminate. Of course, when ssh exits, the TCP connection
>closes. The only way to see that it's still open and active is by
>writing (or using) an application that ignores EHOSTDOWN errors from
>write().

I agree that it's a bug. The only time write() on a stream socket should return the asynchronous error[1] is when the connection has been (or is in the process of being) torn down as a result of a subsequent timeout. POSIX says "may fail" for these errors for write() and send() on sockets.

-GAWollman

[1] There are two kinds of error returns in the socket model: synchronous errors, like synchronous signals, are attributed to the result of a specific system call, detected prior to syscall return, and usually represent programming or user error (e.g., attempting to connect() on an fd that is not a socket). Asynchronous errors are detected asynchronously, and merely posted to the socket without being delivered; they may be delivered on the next socket operation. See XSH 2.10.10, "Pending Error".
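
To make the "posted to the socket" idea in the footnote concrete: POSIX lets an application read and clear a socket's pending error with getsockopt() and SO_ERROR, rather than waiting for it to be delivered on the next operation. Below is a minimal sketch. The helper name take_pending_error is made up for illustration, and nothing in this thread establishes that the EHOSTDOWN in question is actually visible through SO_ERROR on FreeBSD 6.1.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the pending (asynchronous) error
 * on socket fd via SO_ERROR.
 *
 * Returns 0 if no error is pending, the pending errno value if there is
 * one, or -1 if getsockopt() itself fails (e.g. fd is not a socket).
 */
static int take_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    assert(fd >= 0);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;

    return err;     /* reading SO_ERROR also resets it */
}
```

A client could call this after a failed write() to see whether anything is still posted on the socket before deciding to give up on the connection.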