Using 2.2.1a and a 2.4 kernel patched for win4lin. Everything is working but:

I mount all the available network shares on my linux box as root to one network directory. If a windows client shuts down while the linux box is still connected, the share is not removed from my mount list or from /proc/mounts. It cannot be removed with umount. I can kill the smbmount with kill -15 but the share is still there with mount. Telinit to 1 does not remove the share.

Attempts to access the share after the windows client shuts down result in the process hanging. When running KDE, the icons all disappear after I shut down the offending window where I tried to access the share.

I cannot reboot successfully after this unless I umount the other drives by hand first. The rebooting process just hangs during shutdown, none of the other mounts are umounted, and all have to be fsck'ed during the next boot. This takes about 40 minutes.

In short, this is a real nuisance. Any insight appreciated.

Joel
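A rough sketch of how one might check, before touching a mounted share, whether its server still answers. It assumes the shares show up in /proc/mounts with filesystem type smbfs and a //server/share source, and that the server listens on the NetBIOS session port 139. The function names and the sample share are hypothetical, and this only helps avoid walking into a dead mount; it does not unstick one that is already hung.

```python
import socket


def smbfs_mounts(mounts_text):
    """Return (server, share, mountpoint) for each smbfs line in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mountpoint, fstype, options, dump, pass
        if len(fields) < 3 or fields[2] != "smbfs":
            continue
        source, mountpoint = fields[0], fields[1]
        parts = source.lstrip("/").split("/", 1)
        server = parts[0]
        share = parts[1] if len(parts) > 1 else ""
        found.append((server, share, mountpoint))
    return found


def server_reachable(server, port=139, timeout=3.0):
    """Try a TCP connect with a short timeout; True if the server accepts it."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

To use it, read /proc/mounts, pass the text to smbfs_mounts(), and call server_reachable() on each server before running anything that accesses the mount point.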
Urban Widmark
2001-Dec-04 00:55 UTC
smbmounts hang around after windows client disconnects
On Thu, 29 Nov 2001, Joel Hammer wrote:

> Using 2.2.1a and a 2.4 kernel patched for win4lin.

Which 2.4.x? Later versions (x > 13?) do not try to disconnect cleanly and should work better at shutdown.

> If a windows client shuts down while the linux box is still connected, the
> share is not removed from my mount list or from /proc/mounts

It is probably waiting for the server end to reply to its disconnect message. Later versions have temporarily lost the capability to send disconnect messages.

> It cannot be removed with umount. I can kill the smbmount with kill -15 but
> the share is still there with mount.
> Telinit to 1 does not remove the share.
> Attempts to access the share after the windows client shuts down results in
> the process hanging.

If you wait long enough I think tcp will time out (take a shower, eat breakfast - that sort of wait), at least that is what it does for me. When shutting down perhaps you can't see that, or maybe there are additional retries.

Killing smbmount is never a good idea. That is not where it is waiting (not for too long anyway).

The problem of smbfs blocking when the other end disappears is known. It's a bug in the smbfs socket code where it (basically) assumes it will receive a reply if it sends a request. It does time out, but it takes far too long. I'm working on a better version.

/Urban
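To illustrate the point about assuming a reply always arrives, here is a minimal user-space sketch of a request/reply exchange that gives up after a bounded wait. This is not the smbfs kernel code and not the fix being worked on; the function name and the 30-second default are assumptions made for the example.

```python
import socket


def request_with_timeout(sock, request, timeout=30.0, bufsize=4096):
    """Send a request and wait at most `timeout` seconds for a reply.

    Returns the reply bytes, or None if nothing arrived in time.
    An empty bytes object means the peer closed the connection.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(request)
        return sock.recv(bufsize)
    except socket.timeout:
        # The other end went quiet: report it instead of blocking forever.
        return None
```

The caller then decides what a missing reply means (retry, reconnect, or fail the operation), rather than leaving the process blocked until TCP itself gives up.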
Ducking in onto this thread seems like a better idea than starting a new one:

I also have a continuing problem with SMBMOUNT (at least I think it is smbmount). I had thought it was due to my inadvertently using the one supplied with RH6.2 rather than the compiled 2.2.2 version. However I still get hangs whilst copying files.

I guess my problem is different in that the Windows client is NOT terminating the share (at least I don't think it is - the share I have mounted is from a Windows 2000 server SP2 running Netware Gateway Services and it has in turn mounted a Netware share to offer out).

The copying process seems to hit a particular file (not necessarily big - one it kept hanging on was a 5K GIF) and then the process freezes. A "kill -9" is then required and the share becomes useable again after a load of SMB messages are spewed onto the console. Try the copy again and it stops at the same file again. Delete/move that file and it will copy quite happily gigabytes before fouling again. Note that I don't actually kill the smbmount process but the copying process - once the copying process is dead then the smbmount share becomes available again.

Reading that back, the chain of mounts sounds pretty monstrous - but I can see no reason why it should not work. If I copy using Windows Explorer from the NWGS machine to Samba it works fine but is slow (all the graphical nonsense?).

I am mounting the share like this:

/samba/bin/smbmount //sancho/hamnw5shared /tmp/sancho -o username=nkelly,password=xxxxxx,ro

Does anyone have any suggestions here? It would be superb for us if we could use smbmount reliably. I read that mounting the share using automount might help?

I must admit that I have not waited as long as Urban seems to suggest (1hr+) but have given it a good 15mins today with no result and ended up killing it.
Thanks,
Noel

--
To unsubscribe from this list go to the following URL and read the instructions: http://lists.samba.org/mailman/listinfo/samba
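Since the copy described here recovers once the copying process is killed, one possible workaround is to copy files one at a time and kill any single copy that exceeds a time limit, so the offending files can be listed and skipped. The sketch below is only that, a workaround under stated assumptions: it relies on the system cp command, the function name and the 60-second default are made up for the example, and it will not help if the copy is stuck in a state where kill -9 has no effect.

```python
import os
import subprocess


def copy_each(sources, dest_dir, timeout=60.0):
    """Copy files one at a time with cp; return (copied, stalled) lists of source paths."""
    copied, stalled = [], []
    for src in sources:
        target = os.path.join(dest_dir, os.path.basename(src))
        try:
            # check=True raises CalledProcessError if cp itself reports a failure.
            subprocess.run(["cp", "--", src, target], check=True, timeout=timeout)
            copied.append(src)
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed the cp child at this point.
            stalled.append(src)
    return copied, stalled
```

The stalled list then names the files worth looking at on the server side, for example the 5K GIF mentioned above.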
Yes I have tried that thanks Rashkae.

Interestingly when it is hanging there is an awful lot of continuing network traffic between the two machines - but nothing is getting copied. Could it be some sort of looping?

Here is an excerpt from the network traffic capture during the current hang if anyone can see something odd:

08:50:40.069556 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 1090511959:1090513407(1448) ack 2075474603 win 16464 <nop,nop,timestamp 18636847 12251197>>>> NBT (DF)
08:50:40.069679 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 1448:2896(1448) ack 1 win 16464 <nop,nop,timestamp 18636847 12251197>>>> NBT (DF)
08:50:40.069713 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 2896 win 31856 <nop,nop,timestamp 12251203 18636847> (DF)
08:50:40.069803 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 2896:4344(1448) ack 1 win 16464 <nop,nop,timestamp 18636847 12251197>>>> NBT (DF)
08:50:40.094894 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 4344 win 31856 <nop,nop,timestamp 12251206 18636847> (DF)
08:50:40.095290 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 5792:7240(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.095329 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 4344 win 31856 <nop,nop,timestamp 12251206 18636847,nop,nop, sack 1 {5792:7240} > (DF)
08:50:40.095413 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 7240:8688(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.095449 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 4344 win 31856 <nop,nop,timestamp 12251206 18636847,nop,nop, sack 1 {5792:8688} > (DF)
08:50:40.095818 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 10136:11584(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.095854 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 4344 win 31856 <nop,nop,timestamp 12251206 18636847,nop,nop, sack 2 {10136:11584}{5792:8688} > (DF)
08:50:40.096230 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 4344:5792(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.096267 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 8688 win 28960 <nop,nop,timestamp 12251206 18636848,nop,nop, sack 1 {10136:11584} > (DF)
08:50:40.096353 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 8688:10136(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.096390 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 11584 win 30408 <nop,nop,timestamp 12251206 18636848> (DF)
08:50:40.096760 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 11584:13032(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.096883 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 13032:14480(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.096916 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 14480 win 31856 <nop,nop,timestamp 12251206 18636848> (DF)
08:50:40.097284 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 14480:15928(1448) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.097333 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: P 15928:16444(516) ack 1 win 16464 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.097427 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: P 1:89(88) ack 16444 win 31856 <nop,nop,timestamp 12251206 18636848>>>> NBT (DF)
08:50:40.179831 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 16444:17892(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.179954 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 17892:19340(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.179988 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 19340 win 31856 <nop,nop,timestamp 12251214 18636848> (DF)
08:50:40.180076 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 19340:20788(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251206>>>> NBT (DF)
08:50:40.180369 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 20788:22236(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251214>>>> NBT (DF)
08:50:40.180402 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 22236 win 31856 <nop,nop,timestamp 12251214 18636848> (DF)
08:50:40.180491 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 22236:23684(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251214>>>> NBT (DF)
08:50:40.180776 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 23684:25132(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251214>>>> NBT (DF)
08:50:40.180812 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 25132 win 31856 <nop,nop,timestamp 12251214 18636848> (DF)
08:50:40.180898 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 25132:26580(1448) ack 89 win 16376 <nop,nop,timestamp 18636848 12251214>>>> NBT (DF)
08:50:40.194891 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 26580 win 31856 <nop,nop,timestamp 12251216 18636848> (DF)
08:50:40.195272 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 28028:29476(1448) ack 89 win 16376 <nop,nop,timestamp 18636849 12251216>>>> NBT (DF)
08:50:40.195312 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 26580 win 31856 <nop,nop,timestamp 12251216 18636848,nop,nop, sack 1 {28028:29476} > (DF)
08:50:40.195393 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 29476:30924(1448) ack 89 win 16376 <nop,nop,timestamp 18636849 12251216>>>> NBT (DF)
08:50:40.195427 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 26580 win 31856 <nop,nop,timestamp 12251216 18636848,nop,nop, sack 1 {28028:30924} > (DF)
08:50:40.195640 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: P 32372:32888(516) ack 89 win 16376 <nop,nop,timestamp 18636849 12251216>>>> NBT (DF)
08:50:40.195671 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 26580 win 31856 <nop,nop,timestamp 12251216 18636848,nop,nop, sack 2 {32372:32888}{28028:30924} > (DF)
08:50:40.196048 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 26580:28028(1448) ack 89 win 16376 <nop,nop,timestamp 18636849 12251216>>>> NBT (DF)
08:50:40.196084 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 89:89(0) ack 30924 win 28960 <nop,nop,timestamp 12251216 18636849,nop,nop, sack 1 {32372:32888} > (DF)
08:50:40.196171 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 30924:32372(1448) ack 89 win 16376 <nop,nop,timestamp 18636849 12251216>>>> NBT (DF)
08:50:40.196281 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: P 89:177(88) ack 32888 win 31856 <nop,nop,timestamp 12251216 18636849>>>> NBT (DF)
08:50:40.393655 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 32888:32888(0) ack 177 win 16288 <nop,nop,timestamp 18636851 12251216> (DF)
08:50:40.462695 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 32888:34336(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251216>>>> NBT (DF)
08:50:40.462816 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 34336:35784(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251216>>>> NBT (DF)
08:50:40.462863 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 177:177(0) ack 35784 win 31856 <nop,nop,timestamp 12251242 18636851> (DF)
08:50:40.463260 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 35784:37232(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.463383 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 37232:38680(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.463417 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 177:177(0) ack 38680 win 31856 <nop,nop,timestamp 12251242 18636851> (DF)
08:50:40.463506 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 38680:40128(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.463788 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 40128:41576(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.463820 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 177:177(0) ack 41576 win 31856 <nop,nop,timestamp 12251242 18636851> (DF)
08:50:40.463910 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 41576:43024(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.464194 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 43024:44472(1448) ack 177 win 16288 <nop,nop,timestamp 18636851 12251242>>>> NBT (DF)
08:50:40.464229 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 177:177(0) ack 44472 win 31856 <nop,nop,timestamp 12251242 18636851> (DF)

-----Original Message-----
From: Rashkae [mailto:git@meaford.com]
Sent: Wednesday, December 05, 2001 1:53 AM
To: Noel Kelly
Cc: samba@lists.samba.org
Subject: RE: smbmounts hang around after windows client disconnects

Just a thought, but have you tried forcing a complete fsck of your filesystems on your RedHat box? I think shutdown -rF should do it.
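One rough way to look at a capture like this is to pull the sequence ranges out of the tcpdump lines and see whether the server keeps sending new bytes or the same ranges again. The sketch below assumes the line format shown in the capture (interface name eth0, a direction marker, then start:end(length)); the sample lines are taken from the capture with the TCP options trimmed off, and the function name and the three labels are made up for the example.

```python
import re

# Matches e.g. "eth0 < host.port > host.port: . 2896:4344(1448) ack 1 ..."
SEGMENT = re.compile(
    r"eth0 (?P<dir>[<>]) .*?: [.PSFR]+ (?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\)"
)


def classify_segments(lines, direction="<"):
    """Label each data segment in one direction relative to the highest byte seen so far."""
    highest_end = None
    result = []
    for line in lines:
        m = SEGMENT.search(line)
        if not m or m.group("dir") != direction or int(m.group("len")) == 0:
            continue  # other direction, pure ACK, or not a packet line
        start, end = int(m.group("start")), int(m.group("end"))
        if highest_end is None or start == highest_end:
            kind = "contiguous"   # continues from the highest byte seen
        elif start > highest_end:
            kind = "after-gap"    # bytes in between have not arrived yet
        else:
            kind = "fills-hole"   # late or retransmitted earlier range
        result.append((start, end, kind))
        highest_end = end if highest_end is None else max(highest_end, end)
    return result


SAMPLE = [
    "08:50:40.069803 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 2896:4344(1448) ack 1 win 16464",
    "08:50:40.094894 eth0 > 192.168.5.5.1486 > 192.168.5.14.netbios-ssn: . 1:1(0) ack 4344 win 31856",
    "08:50:40.095290 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 5792:7240(1448) ack 1 win 16464",
    "08:50:40.096230 eth0 < 192.168.5.14.netbios-ssn > 192.168.5.5.1486: . 4344:5792(1448) ack 1 win 16464",
]
```

In the excerpt above the inbound sequence numbers climb from 1448 to 44472 in well under a second, with a few segments arriving out of order and being reported in sack blocks. That suggests new data is arriving at the TCP level rather than the same bytes being resent, although on its own it does not explain why the copy makes no progress.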