Hello everyone. I have two Linux servers* that each have four directories mounted as SMBFS shares from a Windows 2000 Server. For the most part this set-up is working great; however, there have been occasional hiccups. Every so often one of the servers, LINUX-ONE, logs a couple of Samba-related errors. The following is an example:

Jun 30 06:30:45 mx-two kernel: smb_trans2_request: result=-104, setting invalid
Jun 30 06:30:45 mx-two kernel: smb_retry: successful, new pid=553, generation=25

I believe this is due to the network connection used by the relevant SMBFS mount shutting down because of inactivity, and then being re-established when the mount is accessed. I haven't been worrying about these messages, but recently I have run into a bigger issue. Sometimes when one of these "disconnects" occurs, the connection isn't re-established when it should be. When this happens, the mount hangs and all processes trying to access it are blocked, resulting in a high load average.
Anywhere from two to fifteen minutes after I've discovered the problem, it clears up on its own, and I see messages like these in the syslog:

Jul 1 13:52:57 mx-two kernel: smb_get_length: recv error = 110
Jul 1 13:52:57 mx-two kernel: smb_trans2_request: result=-110, setting invalid
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/s failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/c failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/s failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/c failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/s failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/s failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/a failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/j failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/c failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/w failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/w failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_lookup: find archive/s failed, error=-5
Jul 1 13:52:57 mx-two kernel: smb_retry: successful, new pid=553, generation=44

Why does this happen? Is this a known issue with Samba 2.2.4?

Also, I have yet to see any of these events on the other Linux server, LINUX-TWO. I believe this is because LINUX-TWO has processes running on it that hit these mounts every five seconds, so the underlying network connections never have a chance to become inactive. Does this make sense? If so, all I have to do to work around the problem on LINUX-ONE is set up a script that periodically pings the mounts, perhaps by running "ls" on them, correct? If that is true, I need to know what the inactivity time-out is, so this script doesn't have to run more often than necessary.
If this is covered in documentation somewhere, feel free to point me in that direction. Otherwise, any help will be greatly appreciated. Thanks!

---Kris Kelley

* Red Hat 7.1, kernel 2.4.9-34 (supplied by Red Hat), Samba 2.2.4 (compiled from source with all defaults, except smbmount support was enabled)
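The negative numbers in these kernel messages are negated Linux errno values. A minimal sketch, assuming Python 3 on a Linux host (errno numbering differs on other systems), that pulls the `result=` and `error=` fields out of lines like the ones above and names them. The regular expression is written only against the messages quoted here, not against the smbfs source.

```python
import errno
import os
import re

# Matches "result=-104", "error=-5" and "recv error = 110" as they appear in
# the smbfs log lines quoted above.
CODE_RE = re.compile(r"(result|error)\s*=\s*(-?\d+)")


def decode_smbfs_codes(line):
    """Return (field, errno number, errno name, message) for each code in line."""
    decoded = []
    for field, value in CODE_RE.findall(line):
        num = abs(int(value))  # the kernel reports failures as -errno
        name = errno.errorcode.get(num, "UNKNOWN")
        decoded.append((field, num, name, os.strerror(num)))
    return decoded


if __name__ == "__main__":
    samples = [
        "smb_trans2_request: result=-104, setting invalid",
        "smb_get_length: recv error = 110",
        "smb_lookup: find archive/s failed, error=-5",
    ]
    for line in samples:
        for field, num, name, message in decode_smbfs_codes(line):
            print(f"{field}={num}: {name} ({message})")
```

Lines such as `smb_retry: successful, new pid=553, generation=25` carry no error field and decode to an empty list.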
This looks to me as though Samba is having trouble finding the W2K server in question when it wants to reconnect. The default

name resolve order = lmhosts host wins bcast

might be the cause of your problem, since host (a normal Unix DNS lookup) is tried before WINS or broadcast. If you have a properly configured WINS server used by all the network nodes, consider placing wins before host. Alternatively, add the computer names of your W2K servers to your /etc/hosts file for quick host lookups.

You can also change the behaviour of Windows 2000 via the Registry to change the connection timeout values or disable them altogether. I don't have those registry keys on hand.

In the likely event that I've completely misunderstood the problem, I apologize in advance.
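One way to check whether the `host` step is what stalls a reconnect is to time an ordinary system-resolver lookup of the server's name from the Linux box. A minimal sketch, assuming Python 3; the script takes server names as arguments, and both the script name and W2KSERVER are placeholders. It exercises only the `host` method (/etc/hosts and DNS, as configured in /etc/nsswitch.conf), not WINS or broadcast, so a slow result here is a hint, not proof.

```python
import socket
import sys
import time


def time_lookup(name):
    """Resolve name through the system resolver and measure how long it took.

    Returns (address or None, elapsed seconds). Only the resolver behind
    socket.gethostbyname() is used; WINS and NetBIOS broadcast are not queried.
    """
    start = time.monotonic()
    try:
        address = socket.gethostbyname(name)
    except OSError as exc:  # gaierror/herror both derive from OSError
        address = None
        print(f"{name}: lookup failed ({exc})", file=sys.stderr)
    return address, time.monotonic() - start


if __name__ == "__main__":
    # Example (hypothetical name): python3 resolve_check.py W2KSERVER
    for target in sys.argv[1:]:
        address, elapsed = time_lookup(target)
        print(f"{target} -> {address} in {elapsed:.3f} s")
```

A lookup that takes several seconds or fails outright suggests adding the name to /etc/hosts or moving wins ahead of host.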
On Wed, 3 Jul 2002, Kris Kelley wrote:
> Every so often, one of the servers, LINUX-ONE, logs a couple of
> Samba-related errors. The following is an example:
>
> Jun 30 06:30:45 mx-two kernel: smb_trans2_request: result=-104,
> setting invalid
> Jun 30 06:30:45 mx-two kernel: smb_retry: successful, new pid=553,
> generation=25

Those are not really errors. The first says that smbfs detected, when trying to send, that the TCP connection to the server was gone. This is normal; SMB servers like to do that. The second says that smbmount reconnected to the server and everything is OK.

> issues. Sometimes when one of these "disconnects" occurs, the
> connection isn't always reestablished when it should be. When this
> happens, the mount hangs, and all processes trying to access the mount
> are blocked, resulting in a high load average. Anywhere from two to
> fifteen minutes after I've discovered the problem, it clears up on its
> own, and I see messages like these in the syslog:

> Jul 1 13:52:57 mx-two kernel: smb_get_length: recv error = 110

-110 is "Connection timed out" (ETIMEDOUT in /usr/include/asm/errno.h), which is interesting; I don't recall having seen that from anyone, but I may have forgotten.

The current smbfs is completely single-threaded per mount: while one process is sending (and receiving), no one else can do anything. This is old code from 2.1.something (or 2.0?), when all of the kernel was like that.

What has probably happened is that one request attempted to send something. It failed, but apparently a -110 failure takes a lot longer than a -104. Because of the single-threading, nothing else happens while that request is waiting, so you get a high load. When the request finally fails, all the queued-up requests go through, only to find the TCP socket closed (-5 = I/O error), until smbmount manages to reconnect again.

The long delay points to another problem with the current smbfs socket code: it lets the network select the length of a timeout.
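The timeout patches are kernel changes, but the idea behind them (an explicit upper bound on how long a blocked receive may wait, rather than whatever the TCP stack eventually decides) can be sketched in user space. A minimal illustration, assuming Python 3; a local socket pair whose far end never writes stands in for a server that has gone quiet, and the 30-second default simply mirrors the figure given for the patches. This is not the smbfs code.

```python
import socket


def recv_with_deadline(sock, nbytes, timeout=30.0):
    """Receive up to nbytes, giving up after `timeout` seconds of silence.

    Returns the bytes received, or None if the peer sent nothing in time.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected pair where the "server" end never sends anything.
    client, silent_server = socket.socketpair()
    try:
        result = recv_with_deadline(client, 4, timeout=0.5)
        print("timed out" if result is None else f"got {result!r}")
    finally:
        client.close()
        silent_server.close()
```

With a fixed bound, a dead peer costs at most `timeout` seconds per request instead of an interval chosen by the network layer.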
Patches for this exist for various 2.4 and 2.2 kernels; they set the timeout for any operation to 30 seconds (user-configurable). I plan to get that into 2.4.20. There is a more advanced version that should let people always interrupt processes that are sleeping while accessing smbfs, and that is not single-threaded and thus faster with multiple accesses ... for 2.5, eventually.

> Why does this happen? Is this a known issue with Samba 2.2.4?

Yes, but with the kernel; it has nothing to do with Samba.

> Also, I have yet to see any of these events on the other linux server,
> LINUX-TWO. I believe this is because LINUX-TWO has processes running on
> it that hit these mounts every five seconds, and so there is never an
> opportunity for the underlying network connections to become inactive.
> Does this make sense? If so, all I have to do to work around the
> problem on LINUX-ONE is set up a script that periodically pings the
> mounts, perhaps running an "ls" command, every so often, correct? If

Yes. There will eventually be similar code inside smbfs to do whatever it needs to keep the connection up while mounted.

> that is true, I need to know what the inactivity time-out limit is, so
> this script doesn't have to run more often than necessary.

It's a server-side setting. I believe NT (and Win2k?) uses:

HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\autodisconnect

-1 to 65535 minutes. I think the default is something like 10 minutes. 5 minutes sounds good.

/Urban
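A minimal sketch of the keepalive discussed above, assuming Python 3 on LINUX-ONE. The script name is hypothetical, and the 10-minute server idle timeout is Urban's guess rather than a measured value; the interval is half of that, which gives the 5 minutes he suggests. Mount points are passed on the command line.

```python
import os
import sys
import time


def keepalive_interval(autodisconnect_minutes, safety_factor=0.5):
    """Seconds between passes: a fraction of the server's idle timeout."""
    if autodisconnect_minutes <= 0:
        # This sketch only handles a positive idle timeout.
        raise ValueError("autodisconnect must be a positive number of minutes")
    return autodisconnect_minutes * 60 * safety_factor


def touch_mounts(mounts):
    """List each mount point once so the server sees traffic.

    Returns a dict mapping each mount point that failed to the OSError raised.
    """
    failures = {}
    for path in mounts:
        try:
            os.listdir(path)
        except OSError as exc:
            failures[path] = exc
    return failures


def run_forever(mounts, interval):
    """Touch every mount, report failures on stderr, sleep, repeat."""
    while True:
        for path, exc in touch_mounts(mounts).items():
            print(f"keepalive: {path}: {exc}", file=sys.stderr)
        time.sleep(interval)


if __name__ == "__main__":
    # Assumption: autodisconnect is about 10 minutes, so this runs every 300 s.
    if len(sys.argv) > 1:
        run_forever(sys.argv[1:], keepalive_interval(10))
    else:
        print("usage: smbfs_keepalive.py MOUNTPOINT...", file=sys.stderr)
```

A cron entry running `ls` on each mount every five minutes does the same job. If a mount is already hung, the listing blocks like any other process would.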