Matthias Merz
2007-Jun-01 21:45 UTC
[Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba?
Hi folks,

Today I noticed some strange behaviour when accessing a Samba server (Samba 3.0.25a) from Windows. On our Debian fileserver I prepared a file testfile.txt owned by user usera and group dpt-a. Then I ran "setfacl -m g:admins:rwx testfile.txt". User userb, who is only in group admins but not in dpt-a, is thus permitted by the file's POSIX ACL to access and change it, which works flawlessly from Linux.

$ getfacl testfile.txt
# file: testfile.txt
# owner: usera
# group: dpt-a
user::rwx
group::r--
group:admins:rwx
mask::rwx
other::r--

Then I made some changes to that file from a Windows machine via notepad.exe and noticed that notepad seemed to "succeed" in saving, but the changes were *not* written to the file! Very strange IMHO.

So I did some more digging with strace, since I didn't find a clue in the logs.

"strace -e open,close,write -f smbd -D" yielded:
[pid 17704] open("foo/testfile.txt", O_RDWR|O_CREAT|O_NOFOLLOW, 0744) = 29
[some write()s to FD 24]
[pid 17704] open("foo/testfile.txt", O_WRONLY|O_NOFOLLOW) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17704] --- SIGIO (I/O possible) @ 0 (0) ---
[pid 17704] +++ killed by SIGIO +++
[pid 17478] --- SIGCHLD (Child exited) @ 0 (0) ---

This seems to "explain" why notepad thought the file was saved successfully, assuming the SMB protocol does not do "hard checks" for successful writes. Since the child serving my Windows access was killed, probably no error message was sent out.

When googling for SIGIO and Samba, I noticed some hits talking about oplocks, so I tried disabling kernel oplocks in smb.conf: "kernel oplocks = no". This did the trick: after restarting Samba, the writes were successful again. Since the manpage states I would want oplocks (and I do *g*), I enabled them again and tried debugging with gdb (to provide the Samba team with a more detailed report).
As I don't really know gdb, my first attempt failed because Samba forks multiple processes which were not "caught" by my gdb call (but the error still occurred). As the weekend was approaching, I didn't dig further into gdb, but read the manpage for smbd and started "gdb /usr/sbin/smbd -F -i". When I then tried to reproduce the error, I failed: the writes went through.

I could reproduce this difference even without gdb: "smbd -F -i -d 5" started from the shell did the writes, whereas "normal" smbd (smbd -F) failed to write the changes.

One wild guess: maybe oplocks can only be taken by the file owner / group owner, and the Samba process crashes because of something like that? Is there a difference in privilege handling between "smbd -F" and "smbd -F -i" that could explain this?

I'd assume this to be a Samba bug, because I could reproduce it both with a not-so-recent linux-2.6 i386 and with a more recent linux-2.6 amd64. I can provide more debugging output etc. on Monday at the earliest; sorry, I forgot to take a log of a "full" strace call and to write down the exact kernel versions, which would of course have been very useful for you.

Thanks for your replies and any help in solving this issue,

Yours
Matthias Merz

--
Beware of bugs in the above code; I have only proved it correct, not tried it. (Donald E. Knuth)
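For reference, the claim that userb may write the file can be checked against the getfacl output from the post. The following is a minimal Python sketch of the POSIX ACL access check as described in acl(5), not anything Samba or the kernel exposes: the names `Acl`, `parse_getfacl`, `allowed` and `SAMPLE` are hypothetical, the user names userc and guest in the usage note are made up, and root's override of permission checks is not modelled.

```python
from dataclasses import dataclass, field

R, W, X = 4, 2, 1  # permission bits behind the rwx triplets

# getfacl output copied from the post above.
SAMPLE = """\
# file: testfile.txt
# owner: usera
# group: dpt-a
user::rwx
group::r--
group:admins:rwx
mask::rwx
other::r--
"""


@dataclass
class Acl:  # hypothetical container, not a real library type
    owner: str = ""
    group: str = ""
    mask: int = R | W | X  # no mask entry means nothing is masked out
    entries: dict = field(default_factory=dict)  # (tag, name) -> bits


def parse_getfacl(text):
    """Parse plain getfacl output (no '#effective:' comments) into an Acl."""
    acl = Acl()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# owner:"):
            acl.owner = line.split(":", 1)[1].strip()
        elif line.startswith("# group:"):
            acl.group = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            tag, name, perms = line.split(":")
            bits = sum(b for ch, b in zip(perms, (R, W, X)) if ch != "-")
            if tag == "mask":
                acl.mask = bits
            else:
                acl.entries[(tag, name)] = bits
    return acl


def allowed(acl, user, groups, wanted):
    """Return True if user (member of groups) is granted all wanted bits."""
    def grants(bits, masked=True):
        if masked:
            bits &= acl.mask
        return (bits & wanted) == wanted

    # 1. The owner is judged by user:: alone; the mask does not apply.
    if user == acl.owner:
        return grants(acl.entries.get(("user", ""), 0), masked=False)
    # 2. A named user entry, limited by the mask.
    if ("user", user) in acl.entries:
        return grants(acl.entries[("user", user)])
    # 3. Any matching group entry (owning or named), limited by the mask.
    group_bits = [
        bits for (tag, name), bits in acl.entries.items()
        if tag == "group"
        and (name in groups or (name == "" and acl.group in groups))
    ]
    if group_bits:
        return any(grants(bits) for bits in group_bits)
    # 4. Everyone else falls through to other::.
    return grants(acl.entries.get(("other", ""), 0), masked=False)


if __name__ == "__main__":
    acl = parse_getfacl(SAMPLE)
    print(allowed(acl, "userb", {"admins"}, R | W))
```

With the ACL from the post, `allowed(acl, "userb", {"admins"}, R | W)` is True through the group:admins:rwx entry, while a user who is only in dpt-a gets read access but not write access. This matches the observation that the write works from Linux, so the ACL itself is unlikely to be the reason the write fails through Samba.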
Jeremy Allison
2007-Jun-01 22:56 UTC
[Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba?
On Fri, Jun 01, 2007 at 11:44:29PM +0200, Matthias Merz wrote:
> Hi folks,
>
> Today I noticed some strange behaviour when accessing a samba server
> (samba 3.0.25a) from windows: On our Debian fileserver I prepared a
> file testfile.txt being owned by user usera and group dpt-a. Then I
> "setfacl -m g:admins:rwx testfile.txt". User userb who is only in
> group admins, but not in dpt-a is thus permitted to access and change
> this file by its POSIX-ACL, which works flawlessly from linux.
>
> $ getfacl testfile.txt
> # file: testfile.txt
> # owner: usera
> # group: dpt-a
> user::rwx
> group::r--
> group:admins:rwx
> mask::rwx
> other::r--
>
>
> Then I did some changes to that file from a windows machine via
> notepad.exe and noticed, that notepad seemed to "succeed" in saving,
> but the changes were *not* written to that file! Very strange IMHO.
>
>
> So I did some more digging with strace, since I didn't find a clue in
> the logs.
>
> "strace -e open,close,write -f smbd -D" yielded:
> [pid 17704] open("foo/testfile.txt", O_RDWR|O_CREAT|O_NOFOLLOW, 0744) = 29
> [some write()s to FD 24]
> [pid 17704] open("foo/testfile.txt", O_WRONLY|O_NOFOLLOW) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 17704] --- SIGIO (I/O possible) @ 0 (0) ---
> [pid 17704] +++ killed by SIGIO +++
> [pid 17478] --- SIGCHLD (Child exited) @ 0 (0) ---

This actually looks like an old kernel bug that has been fixed - sorry I can't remember the version. The kernel shouldn't be sending a SIGIO for an oplock break, it should be sending a POSIX RT signal

#define RT_SIGNAL_LEASE (SIGRTMIN+1)

in the Samba source. I recall this as a kernel bug that got fixed a few months or so ago. This isn't a Samba bug IMHO.

Jeremy.
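The mechanism described in the reply above can be tried outside Samba. On Linux, a process takes a file lease with fcntl(F_SETLEASE) and can use fcntl(F_SETSIG) to ask for a specific signal instead of the default SIGIO when the lease is broken. The following is a minimal Python sketch under the assumption that this is roughly what smbd does for kernel oplocks; the function names `take_lease` and `lease_break_signal` are hypothetical, and the temporary file only stands in for a file on a share.

```python
import fcntl
import os
import signal
import tempfile

# Same value as the RT_SIGNAL_LEASE define quoted from the Samba source.
RT_SIGNAL_LEASE = signal.SIGRTMIN + 1


def take_lease(fd, signum=RT_SIGNAL_LEASE):
    """Ask for signum instead of the default SIGIO, then take a write lease."""
    fcntl.fcntl(fd, fcntl.F_SETSIG, signum)
    fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_WRLCK)


def lease_break_signal():
    """Return the signal number seen on a lease break, or None if unavailable."""
    fd, path = tempfile.mkstemp()
    # Block the signal so it can be collected synchronously below.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {RT_SIGNAL_LEASE})
    try:
        take_lease(fd)
        try:
            # A second open conflicts with the write lease. O_NONBLOCK makes it
            # fail right away instead of waiting for the lease to be released.
            os.close(os.open(path, os.O_WRONLY | os.O_NONBLOCK))
        except BlockingIOError:
            pass
        info = signal.sigtimedwait({RT_SIGNAL_LEASE}, 1.0)
        return info.si_signo if info else None
    except OSError:
        return None  # e.g. a filesystem or sandbox without lease support
    finally:
        os.close(fd)  # closing the descriptor also drops the lease
        os.unlink(path)
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("expected:", RT_SIGNAL_LEASE, "received:", lease_break_signal())
```

On a kernel that honours F_SETSIG for lease breaks, the received value should match RT_SIGNAL_LEASE. A kernel with the bug described above would deliver SIGIO instead, whose default action terminates the process, which is consistent with the "killed by SIGIO" line in the strace output.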