I don't know who this problem belongs to, samba or Linux, but judging by the responses in my INBOX to a previous post, it's a common one. It's also a fairly serious one: many applications require smb shares to stay mounted continuously and without interruption. It has been a recurring problem that seems to have been pushed aside as unimportant, even though it's a serious flaw. Just where the flaw lies (kernel smbfs or samba/smbmount), I don't know, but there must be *someone* who has a clue. Proposed causes include password caching, the use of uid_t instead of __kernel_uid_t in the smbumount code (what that has to do with keeping a share mounted, I'm not sure), and blistering silence from those who write the code and know it best.

Pasted below is the body of a previous unanswered post. If any of the experts could help, I and others would really appreciate it.

Running NT 4.0 WS SP5 and samba 2.0.5a on Linux, 2.2.12 kernel.

Whenever I mount an NT share from Linux, it times out after an indeterminate period of time. This has been a continuing problem, and the only workaround is to perform something on the NT box that requires disk activity and bypasses the cache (e.g., ls > /dev/null doesn't work, but df does). This must be done on a regular basis, every two or three minutes.

Error messages:

me2v:reliant me2v$ ls winnt
ls: winnt: Input/output error

From /var/log/messages:

Sep 15 20:09:14 reliant kernel: smb_trans2_request: result=-32, setting invalid
Sep 15 20:10:25 reliant kernel: smb_retry: signal failed, error=-3

This was after only about an hour and a half, give or take 15 minutes. This has been a recurring problem since I moved to 2.0.3 from 1.9.18 way back when, and plenty of other people have had it too. The typical response is that it's a password caching problem, but the password caching fixes, if any, haven't fixed the problem.
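The df workaround described above (forcing uncached disk activity on the share every two or three minutes) can be automated. What follows is a minimal sketch, assuming that `os.statvfs()`, the filesystem query `df` relies on, reaches the NT server through smbfs the same way `df` did here. The script name, the mount point argument and the 120-second default are illustrative assumptions, not anything from samba. It only keeps the connection busy; it does not explain or fix the drop.

```python
import os
import sys
import time
from typing import Optional


def probe(mount_point: str) -> bool:
    """Issue a df-style statvfs() query; return False if it raises OSError (e.g. EIO)."""
    try:
        os.statvfs(mount_point)
        return True
    except OSError as exc:
        print(f"{time.ctime()}: probe of {mount_point} failed: {exc}", file=sys.stderr)
        return False


def keepalive(mount_point: str, interval_s: float = 120.0,
              rounds: Optional[int] = None) -> int:
    """Probe mount_point every interval_s seconds and return the number of failed probes.

    rounds=None loops forever; a finite value is handy for testing.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not probe(mount_point):
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python smb_keepalive.py /mnt/nt [interval_seconds]
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 120.0
    keepalive(sys.argv[1], interval)
```

Run in the background against the smbmounted directory, this reproduces the manual "df every few minutes" routine; failed probes are logged to stderr with a timestamp.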
I would like to know: 1) is there something in NT that could be causing this, and if so, what is it and how do I fix it; or 2) how can I fix it once and for all from the Linux side (besides not using samba, that is)? Or is there an undocumented smb.conf option somewhere that would help?

I don't know if this is technically a Linux problem or a samba problem, since historically smbmount has not been officially part of samba (although it's distributed and compiled with samba), so I wasn't sure who to post to. Hopefully, someone has found a fix, or at least knows what the problem is...

Incidentally, I'm pretty sure it's not an NT service pack problem, since this has been a recurring problem with no service packs, and with SP3-5.

--
Matthew Vanecek
Course of Study: http://www.unt.edu/bcis
Visit my Website at http://people.unt.edu/~mev0003
For answers type: perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
*****************************************************************
For 93 million miles, there is nothing between the sun and my shadow except me. I'm always getting in the way of something...
Steve Rhodes wrote:
> Bad news guys,
>
> Almost 24 hours and I can't get smbmount to fail! I connected another
> Win98 machine to the subnet, but apparently, it's not enough to cause the
> smbmount problem to occur.
>
> I am going to try flooding the network with netbios packets and see if that
> can induce failure.

FWIW, it never occurs here when the net is under load. Generally, it happens only when the net is idle, and smbmount dies for some reason. I think that's where we need to focus: when does smbmount die, and why? Is it a result of something NT does? /var/log/messages and /var/log/samba/* give no clue as to why it dies. Maybe it's a problem specific to NT? Or to NT Workstation? That's what I have, NT 4.0 WS SP5.

I don't have a Windows machine to test on (thank God!! It's enough to have it at work!). Anyhow, I took the smbmount from 2.0.5a and recompiled it with the SMBFS_DEBUG portion at the top uncommented. I don't think it really helps, but...

At the beginning of the day yesterday, I mounted one share with the DEBUG-enabled smbmount and another with the normal smbmount. When I got home from work, the DEBUG-ed share was still mounted and up (well, aside from pagefile.sys ;) ). The normal smbmount was out like a broken lamp. df hung for a bit while it tried to probe that mount point, then I got the infamous input/output error. Here's what happened when I ran df:

Sep 21 08:49:30 reliant kernel: smb_trans2_request: result=-32, setting invalid
Sep 21 08:49:31 reliant kernel: smb_retry: new pid=19481, generation=7
Sep 21 08:49:31 reliant kernel: smb_lookup: find //pagefile.sys failed, error=-2 6
Sep 21 08:49:34 reliant kernel: smb_retry: signal failed, error=-3
Sep 21 08:51:50 reliant kernel: smb_retry: signal failed, error=-3

The first part, up to and including the pagefile.sys message, is from the smbmount with "#define SMBFS_DEBUG 1".
The last two are from the regular smbmount, or, more accurately I guess, from smbfs trying to awaken the dead normal smbmount. I haven't had time to decode what extra work gets done with SMBFS_DEBUG, though there are a bunch of "#ifndef SMBFS_DEBUG"s in there... I'll probably try to attach gdb to the regular smbmount tomorrow morning (no time today) and see what happens.
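The kernel messages report failures as negative errno values, and decoding them narrows things down a little. On Linux, 32 is EPIPE ("Broken pipe"), 2 is ENOENT ("No such file or directory") and 3 is ESRCH ("No such process"). Read that way, `result=-32` suggests the connection to the server was already gone when smbfs tried to send, and `smb_retry: signal failed, error=-3` fits the reading above that smbfs tried to wake an smbmount process that no longer existed. That interpretation is a hedged reading of the numbers, not something the smbfs sources quoted here confirm. A minimal decoding sketch (the regex and helper name are assumptions):

```python
import errno
import os
import re
from typing import List, Tuple

# smbfs logs failures as negative errno values, e.g. "result=-32" or "error=-3".
CODE_RE = re.compile(r"\b(?:result|error)=-(\d+)")


def decode(line: str) -> List[Tuple[int, str, str]]:
    """Return (code, symbolic name, strerror text) for each negative errno in line."""
    decoded = []
    for match in CODE_RE.finditer(line):
        code = int(match.group(1))
        name = errno.errorcode.get(code, "UNKNOWN")
        decoded.append((code, name, os.strerror(code)))
    return decoded


if __name__ == "__main__":
    # Kernel lines quoted earlier in this thread.
    for line in [
        "smb_trans2_request: result=-32, setting invalid",
        "smb_lookup: find //pagefile.sys failed, error=-2 6",
        "smb_retry: signal failed, error=-3",
    ]:
        for code, name, text in decode(line):
            print(f"-{code}: {name} ({text})")
```

The symbolic names come from the running system's `errno` table, so the decoding is only meaningful on the same platform (Linux) that produced the log.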
I read through a bunch of posts on the subject, and I have seen a number of theories that the problem might be caused by a period of inactivity, rather than heavy network load as I earlier speculated. I thought an easy test would be to mount the NT box, then physically disconnect the smbmount client for a period of time, and see if the connection breaks. Well, I left it off the network all night, and when I re-connected and checked it this morning, it was still running!

I also ran a quick test by running a ping flood over the network. The collision light on the hub was blinking madly, and I still could not induce failure!

I am starting to think that it may have something to do with the configuration of the NT box. I am running this one as a Primary Domain Controller. I recall from my earlier experience that the machine in the troubled network was a Stand Alone configuration. (Un)fortunately, that machine has been re-configured as a Linux box (DHCP problems).

Still trying,
Steve Rhodes

-----Original Message-----
From: Matthew Vanecek [SMTP:mev0003@unt.edu]
Sent: Tuesday, September 21, 1999 9:06 AM
To: srhodes@cpinternet.com
Cc: 'Urban Widmark'; 'Khimenko Victor'; linux-kernel@vger.rutgers.edu; samba@samba.org
Subject: Re: Samba can't keep NT shares mounted
I have been trying for some time to reproduce this failure in a controlled environment. I have just now been able to observe it, though not under conditions as controlled as I would like, and certainly not with the version I would prefer, as I am running 2.0.1 in this example. I have gone through a number of theories on this subject, which I thought would be useful to review.

Theory 1.) The failure is caused by excessive network traffic.
This doesn't seem to be the case. I put together a test setup in my lab and flooded the network with pings. The storm was pretty impressive, but the connection held tight.

Theory 2.) The failure is caused by a period of inactivity.
This may indeed be a piece of the puzzle, and it seems to be one of the more popular notions going around. However, in the same lab setup mentioned above, I disconnected the client machine overnight, and it was still working properly upon re-connection the next day.

Theory 3.) The failure is specific to a particular server configuration.
This is not the case, as I have received correspondence from a number of people with servers ranging from OS/2 to NT as a PDC, all with the same problem.

Theory 4.) The failure has something to do with the DEBUG option in the source code.
The theory goes that smbmount keeps working with the DEBUG option enabled, and the problem starts only when it is turned off, possibly because of something to do with writing out error messages. I haven't had the opportunity to observe this directly, but it is an interesting theory.

~~~~

Having said all that, I would like to describe the configuration under which I was able to observe the failure, and present YAT (Yet Another Theory).

The idea behind this configuration was to serve files kept on an NT web server through an Apache server on Linux, by smbmounting the NT drive in the Apache html directory. This was the original configuration in which I observed the problem earlier this year.
To maximize the chance of inducing failure, I set it up as an smbmount on /mnt/test, then built a symbolic link to that from a /home/httpd/html/Test directory, which is in the Apache document path. This way, somebody can connect to the Apache server on the Linux box and view the html files kept on the NT box. The underlying reason for this is that the NT files are updated by a batch process that runs every morning. A rather complex system was built around the NT box, so it was impractical to rebuild it on Linux. We needed access to those files from the Linux box for security reasons, hence the smbmount.

At first, it looked like I was going to get the same result I had been getting throughout this process: the smbmount looked rock solid. Many hours went by, and every time I checked the connection, it was still working. However, when I checked this morning, it was broken.

This leads into my new theory. I have seen a number of posts indicating that changing files on the smb server causes issues on the mounted smb client. It wasn't entirely clear to me what those issues were; it seemed to be a lack of current data from the client's perspective, or perhaps even a broken connection. In any event, I am speculating that the update process on the NT box in my configuration above may be the trigger that induces the failure. Most of the files are overwritten during the update, and this may be a reason for the broken connection.

I will continue to pursue this issue and narrow down the variables associated with the failure. Thanks to everyone who has sent in messages on this problem, and kudos to you if you have managed to read through this lengthy post.

Regards,
Steve Rhodes
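Testing the update theory means knowing when the mount broke, not just that it had broken by morning. A minimal sketch follows; the helper names, the statvfs-based probe and the interval are assumptions, not anything from the thread. It probes the share periodically and records a timestamp each time the mount flips between up and down, so the first DOWN entry can be compared with the time the NT batch update runs.

```python
import datetime
import os
import time
from typing import Callable, List, Tuple


def probe(path: str) -> bool:
    """df-style check: statvfs() raises OSError (e.g. EIO) once the mount is dead."""
    try:
        os.statvfs(path)
        return True
    except OSError:
        return False


def watch(path: str, interval_s: float, max_probes: int,
          check: Callable[[str], bool] = probe) -> List[Tuple[datetime.datetime, bool]]:
    """Probe path max_probes times; return (timestamp, is_up) at each up/down change.

    The first entry records the initial state; later entries appear only when
    the state flips, so the log stays short over a long run.
    """
    events: List[Tuple[datetime.datetime, bool]] = []
    last = None
    for i in range(max_probes):
        up = check(path)
        if up != last:
            events.append((datetime.datetime.now(), up))
            print(f"{events[-1][0].isoformat()} {path} {'UP' if up else 'DOWN'}")
            last = up
        if i + 1 < max_probes:
            time.sleep(interval_s)
    return events
```

For example, `watch("/mnt/test", interval_s=900, max_probes=96)` covers about 24 hours at 15-minute intervals. One design caveat: every probe is itself network activity, and the first post reports that regular activity keeps the mount alive, so a short interval could mask an idle-related failure. A long interval keeps the inactivity and update theories easier to tell apart.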