Ray Van Dolson
2009-Oct-23 21:18 UTC
[Samba] tdb locking issue - Solaris 10 and Samba 3.0.33
(Yes, I should upgrade Samba to 3.0.35).
We're running the Sun-provided Samba daemon (SUNWsmbau and friends) on
Solaris 10 Generic_138888-08 (sparc).
Lots of Windows clients (mixed XP, 2003, 2008) hit this server, and
periodically smbd processes start piling up. These processes can't be
killed with a normal kill -- only kill -9 does the trick.
In the past I've worked with the owners of these Windows machines to
make sure the scripts they run against our shares are written correctly.
However, I've started looking at a lot of these smbd processes, and it
seems something may be amiss on the Samba side.
Here's the pertinent info on a randomly selected "hung" process:
# truss -v all -aef -p 25067
25067: *** SUID: ruid/euid/suid = 0 / 122 / 122 ***
25067: *** SGID: rgid/egid/sgid = 0 / 9 / 9 ***
25067: psargs: /usr/sfw/sbin/smbd -D
25067: fcntl(10, F_SETLKW64, 0xFFBFF6F8) (sleeping...)
25067: typ=F_WRLCK whence=SEEK_SET start=32412 len=1 sys=4245464
pid=0
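The struct flock that truss decodes (typ=F_WRLCK, whence=SEEK_SET, start=32412, len=1) is a request for an exclusive lock on a single byte of brlock.tdb, and F_SETLKW64 means the caller sleeps until that lock can be granted. As a minimal sketch, assuming a POSIX system and Python's standard fcntl module (the helper names are hypothetical), the same request can be issued non-blocking so that a conflict is reported instead of waited on:

```python
import errno
import fcntl
import os


def try_write_lock_byte(fd: int, offset: int) -> bool:
    """Try to take an exclusive (F_WRLCK) lock on the one byte at `offset`.

    Same shape as the request truss decoded (whence=SEEK_SET, len=1), but
    LOCK_NB makes Python issue F_SETLK instead of F_SETLKW, so a conflict
    returns False immediately rather than putting the caller to sleep.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
    except OSError as e:
        # POSIX allows either EACCES or EAGAIN for "held by another process".
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise
    return True


def release_byte(fd: int, offset: int) -> None:
    """Drop any lock this process holds on the byte at `offset`."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
```

Caution: if the byte is free, the probe really does take the write lock until release_byte() runs, so this is for experimenting on a scratch file, not for poking at the live brlock.tdb while smbd is running.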
What's FD 10 you ask?
# pfiles -F 25067
10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
O_RDWR|O_LARGEFILE
advisory read lock set by process 21130
/var/samba/locks/brlock.tdb
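On Solaris, pfiles already names the holder ("advisory read lock set by process 21130"). The portable POSIX way to ask the same question from code is fcntl(F_GETLK): it describes one of the locks that would block a hypothetical request, including the holder's pid, without acquiring anything. A minimal Python sketch, assuming 64-bit Linux, because struct flock has to be laid out by hand for struct.pack; FLOCK_FMT and lock_holder are made-up names for this example:

```python
import fcntl
import os
import struct

# ASSUMPTION: 64-bit Linux layout of struct flock:
#   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;
# "4x" pads to the C sizeof (32 bytes). Solaris' struct flock also carries
# l_sysid and padding, so this format string would need adjusting there.
FLOCK_FMT = "hhqqi4x"


def lock_holder(fd: int, offset: int, length: int = 1):
    """Ask the kernel (F_GETLK) who would block a write lock on
    [offset, offset + length). Returns (l_type, l_pid), or None if nothing
    conflicts. F_GETLK only tests; it never acquires a lock."""
    query = struct.pack(FLOCK_FMT, fcntl.F_WRLCK, os.SEEK_SET, offset, length, 0)
    reply = fcntl.fcntl(fd, fcntl.F_GETLK, query)
    l_type, _whence, _start, _len, l_pid = struct.unpack(FLOCK_FMT, reply)
    if l_type == fcntl.F_UNLCK:
        return None
    return l_type, l_pid
```

A process never sees its own locks this way, so the query has to come from a different process than the suspected holder.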
At this point, prompted by another post on this list, I ran tdbdump on
/var/samba/locks/brlock.tdb. It completed without any errors, however.
pstack output:
# pstack -F 25067
25067: /usr/sfw/sbin/smbd -D
ff049c64 fcntl (a, 23, ffbff6f8)
ff0398c0 fcntl (a, 23, ffbff6f8, 7e9c, fee02a00, 18a564) + 18
002822e8 tdb_brlock (4c18e0, 7e9c, 2, 23, 0, 1) + 90
002825f0 tdb_lock (4c18e0, 1f7d, 2, 0, 20, 0) + 16c
0020982c ???????? (0, 6833f8, 1, 5cb1d0, 5cb1e0, 40c7d8)
00202d18 is_locked (6833f8, feff, 0, 40c7d8, 0, 0) + 280
00091820 reply_read_and_X (6ded80, 6be900, 3f, 6833f8, 20000, 7) + 2d4
000d35ec ???????? (6be900, 69e4b0, 6be900, 3f, 20000, 8e94)
000d3728 ???????? (9400, 6be900, 3f, 20000, 9400, 0)
000d399c ???????? (69e4b0, 6be900, 4134a0, 6cc8, 40c7d8, 6c00)
000d4b78 smbd_process (6800, 40c7d8, 93a80, 20441, d, 0) + 1ec
00338f38 main (0, 43e110, 0, 41566c, 4175d4, 1) + 9cc
0004e118 _start (0, 0, 0, 0, 0, 0) + 108
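pstack prints arguments in hex without a 0x prefix, which makes the frames easy to misread. A small, hypothetical Python helper for decoding one frame line is sketched below; the regex only assumes the frame shape seen in this output. Applied to the frames above, it shows that the first fcntl argument a is 10 (the FD 10 that pfiles reports), the 23 in both fcntl frames sits where truss shows F_SETLKW64, ffbff6f8 is the same flock pointer truss printed as 0xFFBFF6F8, and the second tdb_brlock argument 7e9c is 32412, the start offset in the truss output.

```python
import re

# One pstack frame: "<pc> <function> (<args>) [+ <offset>]", all values in hex
# without a 0x prefix. Unresolved symbols show up as "????????".
FRAME_RE = re.compile(r"^\s*([0-9a-fA-F]+)\s+(\S+)\s*\(([^)]*)\)")


def parse_frame(line: str):
    """Return (pc, function_name, [args as ints]) for one pstack line."""
    m = FRAME_RE.match(line)
    if m is None:
        raise ValueError(f"not a pstack frame: {line!r}")
    pc, func, raw_args = m.groups()
    args = [int(a.strip(), 16) for a in raw_args.split(",") if a.strip()]
    return int(pc, 16), func, args
```

For example, `parse_frame("002822e8 tdb_brlock (4c18e0, 7e9c, 2, 23, 0, 1) + 90")[2][1]` evaluates to 32412.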
truss shows that signals are being received, but in every case the
process goes straight back into the F_SETLKW64 call.
/var/samba/locks is on a normal UFS filesystem.
Now, clearly there are patches that could be applied to this system,
and I can upgrade Samba to 3.0.35, but I'm hoping someone out there has
an idea of what might be going on here. Why would this particular smbd
process *not* be able to get a lock on brlock.tdb at a certain point,
while subsequent smbd processes apparently can (new connections to the
server appear to be working OK)? And why wouldn't the F_SETLKW64 call
eventually succeed?
I'd like to get this one figured out instead of manually killing all
the stuck processes every couple of weeks.
Thanks much :)
Ray
Volker Lendecke
2009-Oct-23 21:37 UTC
On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> (Yes, I should upgrade Samba to 3.0.35).
>
> We're running the Sun provided Samba daemon (SUNWsmbau and friends) on
> Solaris 10 Generic_138888-08 (sparc).
>
> Lots of Windows clients (mixed XP, 2003, 2008) hit this server and
> periodically we'll start seeing smbd processes begin piling up. These
> processes can't be killed with a normal kill -- only kill -9 will do
> the trick.

Probably someone else is holding the same lock for some reason and is
stuck in a file system syscall. Under Linux you would look at
/proc/locks to find that info; I have no idea how to find the current
lock holder under Solaris. You need to find that one and see what
syscall that guy is stuck in.

BTW, you don't happen to run something like samfs?

Volker
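On Linux, /proc/locks lists every granted and pending file lock. Per proc(5), each line carries an id, an optional -> marking a waiter blocked behind that lock, the lock class (POSIX, FLOCK, OFDLCK, ...), ADVISORY or MANDATORY, READ or WRITE, the pid, major:minor:inode of the file, and the start and end offsets (EOF for open-ended ranges). A minimal Python sketch that parses it and filters by inode; it applies to Linux only, and the names are illustrative:

```python
from typing import List, NamedTuple, Optional


class ProcLock(NamedTuple):
    lock_id: str
    blocked: bool        # True for "->" lines: a process waiting on this lock
    kind: str            # POSIX, FLOCK, OFDLCK, ...
    mode: str            # ADVISORY or MANDATORY
    access: str          # READ or WRITE
    pid: int
    dev: str             # "major:minor" as printed by the kernel
    inode: int
    start: int
    end: Optional[int]   # None when the lock runs to EOF


def parse_proc_locks(text: str) -> List[ProcLock]:
    """Parse the contents of Linux /proc/locks into ProcLock records."""
    locks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or unrecognised line
        blocked = fields[1] == "->"
        rest = fields[2:] if blocked else fields[1:]
        if len(rest) < 7:
            continue
        kind, mode, access, pid, dev_ino, start, end = rest[:7]
        dev, ino = dev_ino.rsplit(":", 1)
        locks.append(ProcLock(fields[0].rstrip(":"), blocked, kind, mode, access,
                              int(pid), dev, int(ino), int(start),
                              None if end == "EOF" else int(end)))
    return locks


def holders(locks: List[ProcLock], inode: int) -> List[ProcLock]:
    """Granted (non-waiting) locks on the given inode."""
    return [lk for lk in locks if lk.inode == inode and not lk.blocked]
```

Typical use would be `holders(parse_proc_locks(open("/proc/locks").read()), os.stat(path).st_ino)`. Inode numbers are only unique per filesystem, so on a busy system the dev field should be compared as well.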
Volker Lendecke
2009-Oct-23 21:41 UTC
On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> # pfiles -F 25067
> 10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
> O_RDWR|O_LARGEFILE
> advisory read lock set by process 21130
> /var/samba/locks/brlock.tdb

Ahhh. What does process 21130 do right now?

Volker
Ray Van Dolson
2009-Oct-23 21:56 UTC
On Fri, Oct 23, 2009 at 02:41:46PM -0700, Volker Lendecke wrote:
> On Fri, Oct 23, 2009 at 02:18:19PM -0700, Ray Van Dolson wrote:
> > # pfiles -F 25067
> > 10: S_IFREG mode:0644 dev:85,60 ino:4630 uid:0 gid:0 size:327680
> > O_RDWR|O_LARGEFILE
> > advisory read lock set by process 21130
> > /var/samba/locks/brlock.tdb
>
> Ahhh. What does process 21130 do right now?
>

That is (was) the PID of the parent smbd process -- the one that spawns
all the others.

Ray
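POSIX record locks taken with fcntl() belong to a process and are not inherited by a child created with fork(). So a read lock held by the parent smbd is a real conflict for any forked child that asks for a write lock on the same byte, even though both share the same open file. A minimal Python sketch reproducing that situation on a scratch file (not brlock.tdb), assuming a POSIX system with os.fork; the function name is hypothetical, and it says nothing about why the parent was holding the lock in the first place:

```python
import errno
import fcntl
import os


def child_can_write_lock(path: str, offset: int, parent_locks: bool = True) -> bool:
    """Optionally take a read lock on one byte, fork, and have the child try a
    non-blocking write lock on that byte. Returns True if the child got it."""
    fd = os.open(path, os.O_RDWR)
    try:
        if parent_locks:
            # Parent's shared (F_RDLCK) lock, like the one pfiles reported.
            fcntl.lockf(fd, fcntl.LOCK_SH, 1, offset, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: it shares the open file description, but owns no locks.
            code = 2  # unexpected error
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
                code = 0  # write lock granted
            except OSError as e:
                if e.errno in (errno.EACCES, errno.EAGAIN):
                    code = 1  # refused because of a conflicting lock
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)  # closing the fd also drops the parent's lock
```

With parent_locks=True the child is refused; with parent_locks=False it gets the lock. A child using the blocking F_SETLKW form instead would simply sleep until the parent released the byte.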