john.debella@teradyne.com
2004-Dec-28 22:24 UTC
[Samba] Re: smbd blocking in receive_smb (how to trigger procedure - LONG)
Hi, We've been experiencing this same problem, however, with Samba 3.0.7-1.3E.1 running on RedHat ES 3.0. We've also experienced this on 2.2.7x on RedHat 9. The problem has been intermittently happening over the last several weeks, (I've posted list about it). This message, along with the others in the thread, describes the symptoms that we're seeing, however, I haven't been able to determine whether this has been fixed or if there is a fix. There were responses in the thread that indicated using the "dead time" parameter to kill off dead processes. Does anyone know if this has been fixed or have any idea whether tunning the "dead time" parameter or some other thing has resolved this? Any help would be appreciated. -John List: samba-technical Subject: Re: smbd blocking in receive_smb (how to trigger procedure - LONG) From: Pascal <pascal () vmfacility ! fr> Date: 2002-08-06 18:29:39 [Download message RAW] I think I finally found a BUG (?) in samba 2.2.5 and now I found a means to reproduce that kind of problem without incriminating any network problem at all (I guess...). But I need a FIFO file on Unix in order to trigger it very easilly. I'd like to know if other (especially SAMBA DEVELOPPERs) can reproduce it. So this is the procedure : 1. take a samba server 2.2.5 (I have mandrake cooker samba-server-2.2.5-14mdk) on a linux kernel version 2.4.18-23mdk 2. samba config : # Global parameters [global] workgroup = VMF netbios name = MYNAME server string = VMF Samba Server %v on %h interfaces = xx.xx.xx.xx/24 yy.yy.yy.yy/24 bind interfaces only = Yes encrypt passwords = Yes map to guest = Bad User syslog = 10 log file = /data/reports/xxxxxxx/smblog-%m.txt max log size = 500 name resolve order = wins host bcast unix extensions = Yes socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192 SO_KEEPALIVE os level = 65 lm announce = False preferred master = True domain master = True wins support = Yes message command = /usr/bin/mail -s 'winpopup msg from %f on %m' root < %s; rm %s time offset = 60 panic action = echo samba crash | mail -s 'samba crash' root ;admin users = pascal ivan printer admin = pascal ivan hosts allow = xx.xx.xx.xx/255.255.255.0 yy.yy.yy.yy/255.255.255.0 printing = cups # added by PC 23/07/2002 kernel oplocks = no oplock break wait time = 100 # added by PC 05/08/2002 deadtime = 15 [print$] path = /data/home/smbprinters write list = root pascal ivan [printers] comment = All Printers path = /var/spool/samba create mask = 0700 guest ok = Yes printable = Yes writable = No print command = lpr-cups -P %p -o raw %s -r # using client side printer drivers. lpq command = lpstat -o %p lprm command = cancel %p-%j browseable = No [Docs] comment = Documents VMF path = /data/docpasc valid users = pascal ivan force user = docs force group = docs read only = No create mask = 0664 force create mode = 0664 directory mask = 0750 3. I create a unix FIFO on /data/docpasc mkfifo /data/docpasc/FIFO prw-rw-r-- 1 pascal docs 0 Aug 6 19:26 /data/docpasc/FIFO| 4. take a windows client (win2k, win98se tested) go to the explorer and open the Docs share now click on the fifo name and right-clic to get the menu Your client is frozen 5. go to samba server and try to figure out what happens, here is the log : # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:46 2002 Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 No locked files # tdbtool is used to check I can read the entire lockfile # ./tdbtool /var/cache/samba/locking.tdb tdb> list freelist: tdb> quit # # client will now open explorer and look at the properties of a FIFO (unix) File on the Docs share # now client is frozen # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6 18:21:51 2002 pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6 18:21:51 2002 unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:46 2002 Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 Locked files: Pid DenyMode Access R/W Oplock Name -------------------------------------------------------------- 2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH /data/docpasc/Folder Settings/Folder.htt Tue Aug 6 18:21:31 2002 2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH /data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002 # # as you see there is already 2 process for the same machine ! # date Tue Aug 6 19:22:37 CEST 2002 # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6 18:21:51 2002 pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 IPC$ nobody 65534 2699 pcpcwin (212.208.17.83) Tue Aug 6 18:21:51 2002 unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:46 2002 Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 Locked files: Pid DenyMode Access R/W Oplock Name -------------------------------------------------------------- 2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH /data/docpasc/Folder Settings/Folder.htt Tue Aug 6 18:21:31 2002 2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH /data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002 # strace -p 2699 select(21, [5 13 20], NULL, NULL, {44, 880000} <unfinished ...> # strace -p 2675 open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...> # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:46 2002 Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 Locked files: Pid DenyMode Access R/W Oplock Name -------------------------------------------------------------- 2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH /data/docpasc/Folder Settings/Folder.htt Tue Aug 6 18:21:31 2002 2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH /data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002 # strace -p 2709 fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} <unfinished ...> # # now all smbd that will start will be in a lock in the locking.tdb # lsof -p 2709 | grep locking.tdb smbd 2709 docs mem-r REG 3,5 8192 38698 /var/cache/samba/locking.tdb smbd 2709 docs 10ur REG 3,5 8192 38698 /var/cache/samba/locking.tdb # # now we cannot access the locking.tdb with tdbtool # ./tdbtool /var/cache/samba/locking.tdb tdb> list hash=11 rec: offset=856 next=0 rec_len=152 key_len=16 data_len=129 full_hash=0xcc182ae9 magic=0x26011999 # ./tdbtool /var/cache/samba/locking.tdb tdb> dump key 16 bytes ............... data 129 bytes [000] 01 00 00 00 E8 3D 29 40 A3 BF 00 00 00 00 00 00 .....=)@ ....... [010] 01 00 00 00 F0 38 26 08 66 00 00 00 00 00 00 00 .....8&. f...... [020] 00 00 00 00 00 00 00 00 00 00 00 00 73 0A 00 00 ....... ....s.. [030] 11 80 03 00 40 00 00 00 01 00 00 00 9B 05 50 3D ....@.. ......P[040] 4F E8 01 00 00 09 00 00 00 00 00 00 A5 BF 00 00 O...... ....... [050] 00 00 00 00 02 00 00 00 2F 64 61 74 61 2F 64 6F ....... /data/do [060] 63 70 61 73 63 2F 46 6F 6C 64 65 72 20 53 65 74 cpasc/Fo lder Set [070] 74 69 6E 67 73 2F 46 6F 6C 64 65 72 2E 68 74 74 tings/Fo lder.htt [080] 00 # # i was to CTRL-C the tool to get out # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- pascal pascal pascal 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 unsaved pascal vmf-staff 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:46 2002 Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2675 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:19:45 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 Locked files: Pid DenyMode Access R/W Oplock Name -------------------------------------------------------------- 2675 DENY_NONE 0x1 RDONLY EXCLUSIVE+BATCH /data/docpasc/Folder Settings/Folder.htt Tue Aug 6 18:21:31 2002 2675 DENY_WRITE 0x3 RDWR EXCLUSIVE+BATCH /data/docpasc/desktop.ini Tue Aug 6 18:21:31 2002 # # now i write something to the pipe # echo coucou > /data/docpasc/FIFO # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # ./tdbtool /var/cache/samba/locking.tdb tdb> list # # some cleanup has been done but the tdb is still locked # strace -p 2720 select(21, [5 13 20], NULL, NULL, {39, 820000} <unfinished ...> # strace -p 2709 open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...> # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # # client is not blocked and the right-clic panel of the FIFO file is opened # # I select propertied # # I select properties # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2775 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:29:19 2002 Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # # now I have another smbd ! # strace -p 2775 fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} <unfinished ...> # strace -p 2720 fcntl64(10, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=344, len=1} <unfinished ...> # strace -p 2709 open("FIFO", O_RDONLY|O_LARGEFILE <unfinished ...> # # on still trry to open the pipe # # the others are in the locking.tdb lock, waiting for it to be released # smbclient \\\\pumpkin\\Docs -I 192.168.0.3 -U ivan added interface ip=192.168.0.3 bcast=192.168.0.255 nmask=255.255.255.0 Password: Domain=[XXX] OS=[Unix] Server=[Samba 2.2.5] smb: \> ls . D 0 Tue Aug 6 15:34:07 2002 .. D 0 Mon Feb 25 19:54:32 2002 FIFO 0 Tue Aug 6 18:26:28 2002 55700 blocks of size 2097152. 37394 blocks available smb: \> quit # echo dump | ./tdbtool /var/cache/samba/locking.tdb # still locked # smbstatus Samba version 2.2.5 Service uid gid pid machine ---------------------------------------------- Docs docs docs 2790 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:30:40 2002 Docs docs docs 2802 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:31:48 2002 Docs docs docs 2775 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:29:19 2002 Docs docs docs 2720 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:24:00 2002 Docs docs docs 2813 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:32:43 2002 Docs docs docs 2709 ivpcvmwin98se (192.168.0.126) Tue Aug 6 18:22:56 2002 No locked files # # now the smbd count will grow as windows retries to access the share... # This can be reproduced at any samba 2.2.5 site I guess. The problem is that the smbd blocked in the FIFO open maintain a lock on the locking.tdb that prevent other smbd to work properly (they block at start during 'lock table scan for cleanup' phase I presume). The fifo is only a trigger here, but I wonder if some network problems (socket reset, signals...) arriving at a specific moment can trigger the same kind of deadlocks... ? So , it could be very important to investigate what happens here, maybe it will solve lots of similar bugs. I'd like to hear from the technical developper team on this part of the code. Thank you, Pascal VMF Le Mardi 6 Ao?t 2002 17:54, Pascal a ?crit :> Investigating more last lite (in France :) ) I find in fact thefollowing:> > 1. One SMBD was in a read on a FIFO > Since this read is restartable (I tested it with a kill and beeing > under strace) this smbd will NEVER be terminated (the fifo was left on a > share by a very old process, and this smbd tried to read it, as a normal > processing of share backup...) > > 2. while in this state, I had lots of other smbd stucked in : > fcntl64(14, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=256,len=1}> <unfinished ...> > fd 14 is the locking.tdb database here (lsof proof) > > 3. I compiled tdbtool and tried to list the locking database > ==> tdbtool list or dump blocks each time at the same place in the > locking.tdb same ofset as the other smbd ! > > 4. I removed the FIFO (rm) > > 5. my first smbd then terminated properly > > 6. all the other blocked smbd terminated properly > > 7. smbstatus showed ghost locks (pid no more present) > > 8. smbclient -L localhost ... cleaned up the database as expected. > > I am in samba 2.2.5 ( -14mdk mandrake serial) > several times a week I have locks problem like that in the server and > nobody can access the share during this time. It's the first time Inotice> that locking.tdb is not scanable, I have to checkout next time too. > > I have started today a debug level 10 but to no avail - everything works > fine today :( > > maybe a race condition or timing problem in the lock code onlocking.tdb.> > I really wonder if it is possible for a smbd process to block othersmbds> only because it has a byte range lock on this database ? I know that at > start, new smbd scan this table to remove dead locks, can this scan be > blocked at some point in the samba code ? > > any clue would be apreciated. > > > Pascal > > Le Mardi 6 Ao?t 2002 17:02, David Collier-Brown a ?crit : > > Pascal wrote: > > > I've read your problem report in the samba-technical list and itseems> > > I am experiencing the same symptoms on some servers too. > > > > > > I noticed that the 'dead' smbd are in fact blocked because thenetwork> > > socket is still to be closed by the client side (connexion showed as > > > CLOSE_WAIT in netstat -an). > > > > Hmmn: I think this is another variant of a known > > Windows <censored> feature. Try adding > > dead time = 10 > > keepalive = 3600 > > to the conf file, which should clean up broken > > clients after 10 minutes of apparent death. > > > > If ten minutes is too long 9and it may well be), > > change just keepalive to 30 (seconds). This will > > cause samba to notice if a client goes silent for 30 > > seconds and send them an "are you alive" query. If > > they aren't alive samba will tear them down and release > > any datastructures and locks they're holding. > > > > This is usually a symptom of a network or client > > problem: I have a note about an earlier symptom at > > http://www.geocities.com/orville_torpid/papers/ethernet-note.txt > > > > Horst wrote: > > > The visible symptom is that users find themselves locked out offiles> > > by their own processes. On the server side we find that users have > > > multiple smbd's running that are talking to the same PC. One smbdwill> > > be the active one and others (older ones) will be blocked and willnot> > > respond to SIGTERM. Killing with other signals works, but leavesold> > > locks in the lock database. > > > > > > Below I've included a sample of four processes shown in smbstatus > > > output and the gdb backtrace of each of the four. Three of them are > > > blocked in read_data and not being responsive to the client PCanymore.> > > Clearly, as each of these processes blocked the client PC happily > > > started a new one. > > > > > > The smbd version is 2.2.5 on Linux, Redhat 7.2. > > > > > > I realize that everyone is working on more exciting aspects of Samba > > > but wonder if anybody more familiar with the util_sock.c routinesand> > > signalling would have ideas or hints for me to debug further. The > > > receive code gets twisty and I'm not sure I totally understand howit's> > > supposed to be working with keepalive packets and such. > > > > > > --Eric > > > > --dave
Reasonably Related Threads
- [PATCH] mm: Remove double faults once write a device pfn
- [PATCH] mm: Remove double faults once write a device pfn
- [PATCH] mm: Remove double faults once write a device pfn
- [PATCH] mm: Take a page reference when removing device exclusive entries
- [PATCH 10/22] memremap: add a migrate callback to struct dev_pagemap_ops