Ray Van Dolson
2009-Oct-23 21:18 UTC
[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
(Yes, I should upgrade Samba to 3.0.35).

We're running the Sun provided Samba daemon (SUNWsmbau and friends) on Solaris 10 Generic_138888-08 (sparc).

Lots of Windows clients (mixed XP, 2003, 2008) hit this server and periodically we'll start seeing smbd processes begin piling up. These processes can't be killed with a normal kill -- only kill -9 will do the trick.

In the past I've been working with the owners of these Windows machines to ensure scripts they use that hit our shares are written correctly. However, I started peeking at a lot of these smbd processes and it seems like something is amiss, perhaps on the Samba side.

Here's the pertinent info on a randomly selected "hung" process:

# truss -v all -aef -p 2506767
25067: *** SUID: ruid/euid/suid = 0 / 122 / 122 ***
25067: *** SGID: rgid/egid/sgid = 0 / 9 / 9 ***
25067: psargs: /usr/sfw/sbin/smbd -D
25067: fcntl(10, F_SETLKW64, 0xFFBFF6F8) (sleeping...)
25067:     typ=F_WRLCK whence=SEEK_SET start=32412 len=1 sys=4245464 pid=0

What's FD 10 you ask?

# pfiles -F 25067
  10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
      O_RDWR|O_LARGEFILE
      advisory read lock set by process 21130
      /var/samba/locks/brlock.tdb

At this point, cued by another post on this list, I tried a tdbdump on /var/samba/locks/brlock.tdb. It completed without issue, however.

pstack output:

# pstack -F 25067
25067: /usr/sfw/sbin/smbd -D
 ff049c64 fcntl (a, 23, ffbff6f8)
 ff0398c0 fcntl (a, 23, ffbff6f8, 7e9c, fee02a00, 18a564) + 18
 002822e8 tdb_brlock (4c18e0, 7e9c, 2, 23, 0, 1) + 90
 002825f0 tdb_lock (4c18e0, 1f7d, 2, 0, 20, 0) + 16c
 0020982c ???????? (0, 6833f8, 1, 5cb1d0, 5cb1e0, 40c7d8)
 00202d18 is_locked (6833f8, feff, 0, 40c7d8, 0, 0) + 280
 00091820 reply_read_and_X (6ded80, 6be900, 3f, 6833f8, 20000, 7) + 2d4
 000d35ec ???????? (6be900, 69e4b0, 6be900, 3f, 20000, 8e94)
 000d3728 ???????? (9400, 6be900, 3f, 20000, 9400, 0)
 000d399c ???????? (69e4b0, 6be900, 4134a0, 6cc8, 40c7d8, 6c00)
 000d4b78 smbd_process (6800, 40c7d8, 93a80, 20441, d, 0) + 1ec
 00338f38 main (0, 43e110, 0, 41566c, 4175d4, 1) + 9cc
 0004e118 _start (0, 0, 0, 0, 0, 0) + 108

The truss shows me that the signals are being received, but in all cases the process goes back to the SETLKW64 call.

/var/samba/locks is on a normal UFS filesystem.

Now, clearly there are some patches that could be applied to this system, and I can upgrade Samba to 3.0.35, but I'm hoping someone out there will have an idea of what might be going on here.

Why would this particular smbd process *not* be able to get a lock on the brlock.tdb file at a certain point, while subsequent smbd processes apparently can (new connections to the server appear to be working OK)? And why wouldn't the SETLKW64 call eventually succeed?

I would like to get this one figured out instead of just manually killing all the processes every couple of weeks or so.

Thanks much :)
Ray
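The conflict visible in the truss and pfiles output (a write-lock request on one byte while another process holds a read lock on the same file) can be reproduced in miniature. What follows is only a sketch in Python, not Samba code. It assumes a POSIX system with os.fork and the standard fcntl module, and it assumes the read lock covers the requested byte, which the pfiles output does not confirm. The temporary file stands in for brlock.tdb, and the names child_can_write_lock and demo are made up for illustration.

```python
import fcntl
import os
import tempfile

# Byte offset borrowed from the truss output (start=32412 len=1).
LOCK_OFFSET = 32412


def child_can_write_lock(path, offset):
    """Fork a child and report whether it could write-lock one byte of path."""
    pid = os.fork()
    if pid == 0:
        # Child: POSIX record locks are owned per process, so this request
        # conflicts with any lock the parent holds on the same byte.
        status = 1
        try:
            with open(path, "r+b") as f:
                try:
                    # LOCK_NB gives the non-blocking form, which fails at once.
                    # The hung smbd used the blocking form (F_SETLKW) and
                    # therefore slept instead of failing.
                    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset)
                    status = 0
                except OSError:
                    status = 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def demo():
    """Return (blocked_while_read_locked, free_after_unlock)."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(LOCK_OFFSET + 1)
        # Parent plays the role of process 21130: shared (read) lock on one byte.
        fcntl.lockf(tmp, fcntl.LOCK_SH, 1, LOCK_OFFSET)
        blocked = not child_can_write_lock(tmp.name, LOCK_OFFSET)
        # Once the read lock is dropped, the same request succeeds.
        fcntl.lockf(tmp, fcntl.LOCK_UN, 1, LOCK_OFFSET)
        free = child_can_write_lock(tmp.name, LOCK_OFFSET)
        return blocked, free


if __name__ == "__main__":
    print(demo())
```

If the holder never releases its read lock, a blocking request of this kind has nothing to wake it up, which would be consistent with the process returning to SETLKW64 after every signal.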
Volker Lendecke
2009-Oct-23 21:37 UTC
[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> (Yes, I should upgrade Samba to 3.0.35).
>
> We're running the Sun provided Samba daemon (SUNWsmbau and friends) on
> Solaris 10 Generic_138888-08 (sparc).
>
> Lots of Windows clients (mixed XP, 2003, 2008) hit this server and
> periodically we'll start seeing smbd processes begin piling up. These
> processes can't be killed with a normal kill -- only kill -9 will do
> the trick.

Probably someone else is holding the same lock for some reason and is stuck in a file system syscall. Under Linux you would look at /proc/locks to find that info, no idea how to find the current lock holder under Solaris. You need to find that one and see what syscall that guy is stuck in.

BTW, you don't happen to run something like samfs?

Volker
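For the Linux case mentioned above, the holder and the waiters of a lock can be picked out of /proc/locks by inode. The following is a rough Python sketch of that lookup, not a tool from the thread. It assumes the usual /proc/locks column layout (id, lock class, ADVISORY or MANDATORY, READ or WRITE, PID, major:minor:inode, start, end), with waiting requests marked by "->". The SAMPLE text is made up for illustration: it borrows the PIDs and inode from the output earlier in the thread, but it is not real output from any system. The function names are hypothetical too.

```python
# Made-up sample in the shape of Linux /proc/locks (NOT real output).
SAMPLE = """\
1: POSIX  ADVISORY  READ  21130 55:3c:4630 32412 32412
1: -> POSIX  ADVISORY  WRITE 25067 55:3c:4630 32412 32412
2: FLOCK  ADVISORY  WRITE 1200 08:01:99 0 EOF
"""


def parse_proc_locks(text):
    """Parse /proc/locks-style text into a list of dicts, one per line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # A "->" in the second column marks a request waiting on a held lock.
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            del fields[1]
        if len(fields) < 8:
            continue  # skip lines that do not match the assumed layout
        _, kind, mode, access, pid, dev_inode, start, end = fields[:8]
        major, minor, inode = dev_inode.split(":")
        entries.append({
            "blocked": blocked,
            "kind": kind,            # POSIX, FLOCK, ...
            "mode": mode,            # ADVISORY or MANDATORY
            "access": access,        # READ or WRITE
            "pid": int(pid),
            "device": (int(major, 16), int(minor, 16)),  # hex in the file
            "inode": int(inode),
            "start": int(start),
            "end": None if end == "EOF" else int(end),
        })
    return entries


def lock_holders(entries, inode):
    """PIDs that currently hold a lock on the given inode."""
    return [e["pid"] for e in entries if e["inode"] == inode and not e["blocked"]]


def lock_waiters(entries, inode):
    """PIDs that are waiting for a lock on the given inode."""
    return [e["pid"] for e in entries if e["inode"] == inode and e["blocked"]]


if __name__ == "__main__":
    parsed = parse_proc_locks(SAMPLE)
    print("holders:", lock_holders(parsed, 4630))
    print("waiters:", lock_waiters(parsed, 4630))
```

On a real Linux host the same functions could be fed the contents of /proc/locks instead of SAMPLE. This does not help on Solaris, where pfiles is what reported the lock holder in the original post.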
Volker Lendecke
2009-Oct-23 21:41 UTC
[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> # pfiles -F 25067
>   10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
>       O_RDWR|O_LARGEFILE
>       advisory read lock set by process 21130
>       /var/samba/locks/brlock.tdb

Ahhh. What does process 21130 do right now?

Volker
Ray Van Dolson
2009-Oct-23 21:56 UTC
[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
On Fri, Oct 23, 2009 at 02:41:46PM -0700, Volker Lendecke wrote:
> On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> > # pfiles -F 25067
> >   10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
> >       O_RDWR|O_LARGEFILE
> >       advisory read lock set by process 21130
> >       /var/samba/locks/brlock.tdb
>
> Ahhh. What does process 21130 do right now?
>

That is (was) the PID of the parent smbd process -- the one that spawns all others.

Ray