I have an intermittent problem with dangling MS Access DB lock files.

In a production environment with N batch queues (each on a separate Windows XP Professional) a scheduler PC dispatches the work load to a free queue by means of modifying a simple MS Access DB file called "PRIM.mdb", which resides on a Samba 3.0.20 share. Each free queue PC polls the same MS Access DB file every 60 seconds to see if there is a work packet to be executed by it. If there is a work packet for it, it modifies a state value of the respective work packet in this DB when it starts executing it as well as after the job has been done, so that the scheduler knows what's going on.

As I understand the MS Access API, a client creates a lock file "PRIM.ldb" whenever it wishes to modify the DB file "PRIM.mdb".

From time to time, but generally not very often, a lock file is left behind by either a queue PC or the scheduler. After that, no one can modify the DB file until the lock file is removed.

The Linux utility "lsof" can't see that the lock file is opened by any process, but the status page of SWAT always lists it under "Open Files" with the smbd PID of the client which last opened it, with "Sharing" shown as "DENY_NONE", "R/W" as "RDWR", "Oplock" as "NONE" (oplocks=no), the full path name under "File" and the timestamp of the last access under "Date".

The lock file can only be removed by the superuser (root), but if the smbd process which holds it open is killed, the problem is sometimes solved even without removing the lock file itself.

I have inspected the Samba log files and identified the point at which this happened most recently.
To make things simpler I've removed much of the samba verbosity in the following overview of the offending client's actions (mdb is the MS Access DB file and ldb is the respective lock file):

23:55:53  close mdb     1
          close ldb     0
23:55:54  open  ldb ro  1
          open  mdb rw  2
          open  ldb rw  3
23:55:55  close mdb     1
          Share violation on PRIM.mdb, flags=2
          open  mdb rw  2

No other client was active at the time so there is no racing here. Only the offending client loses track of the lock file and forgets about it.

Has anyone had a similar problem and lived to talk about it? Any hints?
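The imbalance in the overview above can be made explicit by tallying opens against closes per file. This is only a minimal sketch: the event list is transcribed by hand from the overview, starting after both files were closed at 23:55:53, the trailing counters are left out, and the name `open_balance` is made up for illustration.

```python
from collections import Counter

# Events transcribed from the overview, starting after both files were
# closed at 23:55:53. The share violation is not an open or a close, so
# it is not listed. Assumption: lines without a timestamp belong to the
# timestamp printed above them.
EVENTS = [
    ("23:55:54", "open", "ldb"),
    ("23:55:54", "open", "mdb"),
    ("23:55:54", "open", "ldb"),
    ("23:55:55", "close", "mdb"),
    ("23:55:55", "open", "mdb"),
]


def open_balance(events):
    """Return opens minus closes for each file name in the event list."""
    balance = Counter()
    for _time, action, name in events:
        balance[name] += 1 if action == "open" else -1
    return dict(balance)


if __name__ == "__main__":
    for name, count in open_balance(EVENTS).items():
        print(f"{name}: {count} open(s) not yet matched by a close")
```

In this excerpt the lock file ends up with two opens and no close, while the DB file has one unmatched open, which is consistent with the client forgetting about the lock file.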
On Mon, Oct 10, 2005 at 04:21:15PM +0100, Dragan Krnic wrote:
> I have an intermittent problem with dangling MS Access DB lock files.
>
> In a production environment with N batch queues (each on a separate
> Windows XP Professional) a scheduler PC dispatches the work load to
> a free queue by means of modifying a simple MS Access DB file
> called "PRIM.mdb", which resides on a Samba 3.0.20 share. Each free
> queue PC polls the same MS Access DB file every 60 seconds to see
> if there is a work packet to be executed by it. If there is a work
> packet for it, it modifies a state value of the respective work packet
> in this DB when it starts executing it as well as after the job has
> been done, so that the scheduler knows what's going on.

There's a bug in 3.0.20 that might affect this (btw it's also in 3.0.20a). I know about it because it's my fault :-(.

Here's the patch for 3.0.20, and 3.0.20a.

Jeremy.
-------------- next part --------------
--- smbd/open.c	2005-07-28 06:19:42.000000000 -0700
+++ smbd/open.c.new	2005-10-10 09:32:25.000000000 -0700
@@ -1585,13 +1585,6 @@
 
 	fsp_open = open_file(fsp,conn,fname,psbuf,flags|flags2,unx_mode,access_mask);
 
-	if (!fsp_open && (flags == O_RDWR) && (errno != ENOENT)) {
-		if((fsp_open = open_file(fsp,conn,fname,psbuf,
-				O_RDONLY,unx_mode,access_mask)) == True) {
-			flags = O_RDONLY;
-		}
-	}
-
 	if (!fsp_open) {
 		if(file_existed) {
 			unlock_share_entry(conn, dev, inode);
-------------- next part --------------
--- smbd/open.c	2005-09-29 14:52:40.000000000 -0700
+++ smbd/open.c.new	2005-10-06 21:45:37.000000000 -0700
@@ -1585,22 +1585,6 @@
 
 	fsp_open = open_file(fsp,conn,fname,psbuf,flags|flags2,unx_mode,access_mask);
 
-	if (!fsp_open && (flags2 & O_EXCL) && (errno == EEXIST)) {
-		/*
-		 * Two smbd's tried to open exclusively, but only one of them
-		 * succeeded.
-		 */
-		file_free(fsp);
-		return NULL;
-	}
-
-	if (!fsp_open && (flags == O_RDWR) && (errno != ENOENT)) {
-		if((fsp_open = open_file(fsp,conn,fname,psbuf,
-				O_RDONLY,unx_mode,access_mask)) == True) {
-			flags = O_RDONLY;
-		}
-	}
-
 	if (!fsp_open) {
 		if(file_existed) {
 			unlock_share_entry(conn, dev, inode);
>> I have an intermittent problem with dangling MS Access DB lock files.
>>
>> In a production environment with N batch queues (each on a separate
>> Windows XP Professional) a scheduler PC dispatches the work load to
>> a free queue by means of modifying a simple MS Access DB file called
>> "PRIM.mdb", which resides on a Samba 3.0.20 share. Each free
>> queue PC polls the same MS Access DB file every 60 seconds to see
>> if there is a work packet to be executed by it. If there is a work
>> packet for it, it modifies a state value of the respective work packet
>> in this DB when it starts executing it as well as after the job has
>> been done, so that the scheduler knows what's going on.
>
> There's a bug in 3.0.20 that might affect this (btw it's also in
> 3.0.20a). I know about it because it's my fault :-(.
>
> Here's the patch for 3.0.20, and 3.0.20a.
>
> Jeremy.

Thank you, Jeremy.

I might have unwittingly given the impression that the problem only came with 3.0.20. The same problem was present in 3.0.14 as well as 3.0.4. It's a very intermittent problem which has been haunting me for months now.

I have compiled your patches and installed them on the affected Samba server. The first obvious difference is that when either the *.mdb or the *.ldb file is opened, the "Sharing" attribute in the "Open Files" section of the status page is now "DENY_DOS" instead of "DENY_NONE". (I can catch the moment when they're opened if I keep refreshing the status page often enough.)

I hope that no lock file will dangle any more. I'll keep you posted.

Best regards
Dragan
> Where is this patch? I would like to update my server to 3.0.20a,
> but if there are problems with Access DB lock files, I prefer
> to patch Samba before compiling.

You can find the patches for 3.0.20 and 3.0.20a in Jeremy's first answer to my question, but for your convenience here they are:

For 3.0.20

--- smbd/open.c	2005-07-28 06:19:42.000000000 -0700
+++ smbd/open.c.new	2005-10-10 09:32:25.000000000 -0700
@@ -1585,13 +1585,6 @@
 
 	fsp_open = open_file(fsp,conn,fname,psbuf,flags|flags2,unx_mode,access_mask);
 
-	if (!fsp_open && (flags == O_RDWR) && (errno != ENOENT)) {
-		if((fsp_open = open_file(fsp,conn,fname,psbuf,
-				O_RDONLY,unx_mode,access_mask)) == True) {
-			flags = O_RDONLY;
-		}
-	}
-
 	if (!fsp_open) {
 		if(file_existed) {
 			unlock_share_entry(conn, dev, inode);

For 3.0.20a

--- smbd/open.c	2005-09-29 14:52:40.000000000 -0700
+++ smbd/open.c.new	2005-10-06 21:45:37.000000000 -0700
@@ -1585,22 +1585,6 @@
 
 	fsp_open = open_file(fsp,conn,fname,psbuf,flags|flags2,unx_mode,access_mask);
 
-	if (!fsp_open && (flags2 & O_EXCL) && (errno == EEXIST)) {
-		/*
-		 * Two smbd's tried to open exclusively, but only one of them
-		 * succeeded.
-		 */
-		file_free(fsp);
-		return NULL;
-	}
-
-	if (!fsp_open && (flags == O_RDWR) && (errno != ENOENT)) {
-		if((fsp_open = open_file(fsp,conn,fname,psbuf,
-				O_RDONLY,unx_mode,access_mask)) == True) {
-			flags = O_RDONLY;
-		}
-	}
-
 	if (!fsp_open) {
 		if(file_existed) {
 			unlock_share_entry(conn, dev, inode);
>> I might have unwittingly given the impression that the problem only came with 3.0.20.
>> The same problem was present in 3.0.14 as well as 3.0.4. It's a very intermittent
>> problem which has been haunting me for months now.
>
> Then it's not this particular bug.

No, it ain't. There's a dangling *.ldb file there right now. But now there are 2 PIDs listed as holding the *.mdb file open with "DENY_DOS" sharing and "RDWR" access, both with the same timestamp "Fri Oct 14 19:58:09 2005", whereas formerly there used to be only one open of the *.mdb file and one of the *.ldb file. The *.ldb file was opened "RDONLY" about 10 minutes earlier by one of the 2 contestants.

Can I do some more forensics on the logs?
>>>> I might have unwittingly given the impression that the problem only came with 3.0.20.
>>>> The same problem was present in 3.0.14 as well as 3.0.4. It's a very intermittent
>>>> problem which has been haunting me for months now.
>>> Then it's not this particular bug.
>>
>> No, it ain't. There's a dangling *.ldb file there right now. But now there are 2 PIDs
>> listed as holding the *.mdb file open with "DENY_DOS" sharing and "RDWR" access, both
>> with the same timestamp "Fri Oct 14 19:58:09 2005", whereas formerly there used to be
>> only one open of the *.mdb file and one of the *.ldb file. The *.ldb file was opened
>> "RDONLY" about 10 minutes earlier by one of the 2 contestants. Can I do some more
>> forensics on the logs?
>
> Ok, if you can reproduce this bug with 3.0.20b then refresh me with the problem
> and then let's look at it closer.

Yes. Of course. I've just compiled and installed 3.0.20b and set a watchdog to observe it.

I was thinking of adding a repair action to the watchdog script that would identify the smbd PID keeping the *.ldb MS Access DB lock file open for so long and kill it, but the command "net rpc file" never lists what I can easily see on the status page of SWAT if I continuously refresh it until by chance one of the clients opens it. The "net rpc file" command only lists

0 \PIPE\samr 0x35 0 dummy user

all the time. Is there another CLI utility which lists the same thing as SWAT does?

Regards
Dragan

PS: I've heavily trimmed a Samba log for the previous incident so that the MS Access rain dance can easily be followed. The offending client first opens the lock file successfully with numopen=1, meaning that nobody else claims it. Then for some reason it reopens the lock file, at which point numopen increments to 2 for obvious reasons, and it never remembers to close it the same number of times. Might it be one of those MS tricks to scare users off Samba? Would you like to take a look at the log if I send it to you off-list?
>> I was thinking of adding a repair action to the watchdog script
>> that would identify the smbd PID keeping the *.ldb MS Access DB
>> lock file open for so long and kill it, but the command "net rpc
>> file" never lists what I can easily see on the status page of SWAT
>> if I continuously refresh it until by chance one of the clients
>> opens it. The "net rpc file" only lists
>>
>> 0 \PIPE\samr 0x35 0 dummy user
>>
>> all the time. Is there another CLI utility which lists the same
>> thing as SWAT does?
>
> smbstatus lists all connections and open files

Much obliged, Stephane.
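For the watchdog idea, a minimal sketch of how the locked-files listing from `smbstatus -L` might be scanned for a long-held *.ldb file. The sample text, the PIDs, the share path and the column layout (Pid, DenyMode, Access, R/W, Oplock, Name, timestamp) are assumptions for illustration, and the layout differs between Samba versions, so the pattern would need adjusting against real output. The function name `stale_lock_holders` is made up, and the sketch only reports candidates; it does not kill anything.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample of locked-files output. PIDs, path and column
# layout are assumptions, not captured from a real server.
SAMPLE = """\
Locked files:
Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
4711   DENY_DOS   0x20089     RDONLY     NONE             /srv/share/PRIM.ldb   Fri Oct 14 19:48:09 2005
4711   DENY_DOS   0x2019f     RDWR       NONE             /srv/share/PRIM.mdb   Fri Oct 14 19:58:09 2005
4802   DENY_DOS   0x2019f     RDWR       NONE             /srv/share/PRIM.mdb   Fri Oct 14 19:58:09 2005
"""

# One data row: pid, deny mode, access mask, R/W, oplock, name, timestamp.
ROW = re.compile(
    r"^(?P<pid>\d+)\s+(?P<deny>DENY_\w+)\s+(?P<access>\S+)\s+(?P<rw>\w+)\s+"
    r"(?P<oplock>\S+)\s+(?P<name>.+?)\s+"
    r"(?P<when>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})$"
)


def stale_lock_holders(status_text, now, max_age, suffix=".ldb"):
    """Return (pid, name, timestamp) for matching files older than max_age."""
    holders = []
    for line in status_text.splitlines():
        match = ROW.match(line.strip())
        if not match or not match.group("name").lower().endswith(suffix):
            continue
        # Collapse the padding that ctime-style dates use for one-digit days.
        stamp = " ".join(match.group("when").split())
        opened = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        if now - opened > max_age:
            holders.append((int(match.group("pid")), match.group("name"), opened))
    return holders


if __name__ == "__main__":
    now = datetime(2005, 10, 14, 20, 0, 0)
    for pid, name, opened in stale_lock_holders(SAMPLE, now, timedelta(minutes=5)):
        print(f"smbd {pid} has held {name} since {opened}")
```

In a real watchdog the text would come from running `smbstatus -L` on the server rather than from a constant, and the age threshold would have to be longer than any legitimate MS Access session on the share.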