JCA
2009-Apr-17 23:48 UTC
SCP client prints out "lost connection" error message occasionally
I am using the OpenSSH client (version 5.2p1) on a Linux box L to interact with an embedded SSH server S. When I carry out a recursive transfer from S to L with the scp command issued on L (S does not support sftp), the client occasionally prints a "lost connection" error message at the very end of the transfer.

After some debugging I found that the error message (printed from lostconn() in scp.c) occurs because the ssh process on L, spawned by the scp command, has already terminated while the scp command still wants to write something to the pipe it uses to communicate with that ssh process.

I have observed a few things of interest here.

First, the traces for the SSH server on S reveal that in all cases (i.e. whether or not the client prints the "lost connection" error) the exchange completes successfully. All the files that have to be transferred are transferred, with no data missing in the transferred files. More to the point, the traces show that the server starts the closing phase by sending an exit-status SSH_MSG_CHANNEL_REQUEST message followed by an SSH_MSG_CHANNEL_EOF message and an SSH_MSG_CHANNEL_CLOSE message, to which the OpenSSH client on L replies with an SSH_MSG_CHANNEL_CLOSE message of its own. The session is closed correctly, as far as the server on S is concerned.

Second, if I modify ssh.c in the OpenSSH code so that the program sleeps for one second before exiting main(), the "lost connection" error message never appears.

Third, the ssh process always exits with a 0 return value.

I can see this "lost connection" issue only when L and S are connected via a fast network: I don't see it with a 100Mbps or a 10Mbps network, but I do with a 1Gbps network.

Any ideas on how to characterize this further?
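
The failure mode described above (a writer that still has something to send on a pipe after the process at the reading end has gone away) can be illustrated in isolation. What follows is only a minimal Python sketch, not code from scp.c or ssh.c. The function name write_after_reader_exit is hypothetical, and closing the read end of the pipe inside a single process is an assumption standing in for the ssh child having already exited.

```python
import errno
import os


def write_after_reader_exit():
    """Write to a pipe whose read end is already closed.

    Returns the errno of the failed write, or None if the write
    unexpectedly succeeds. Closing the read end here is a stand-in
    (an assumption) for the reading process having terminated.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "reader" is gone before the writer is done
    try:
        # CPython ignores SIGPIPE by default, so the kernel's refusal
        # shows up as BrokenPipeError (EPIPE) rather than killing us.
        os.write(write_fd, b"\0")
        return None
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)


if __name__ == "__main__":
    code = write_after_reader_exit()
    print("write failed with EPIPE" if code == errno.EPIPE else code)
```

A C program that has not ignored SIGPIPE would receive the signal at the same point instead of an error return, which is consistent with a handler such as lostconn() running only when the reader exits before the final write.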