On Thu, 25 Feb 2016 02:24:51 +0000, Olly Betts <olly at survex.com> wrote:
> On Wed, Feb 24, 2016 at 04:30:55PM +0100, Eric J wrote:
>> On Wed, 24 Feb 2016 03:17:35 +0000, Olly Betts <olly at survex.com> wrote:
>>> On Mon, Feb 22, 2016 at 12:26:27PM +0100, Eric wrote:
>>>> % package require xapian 1.0.0
>>>> 1.2.18
>>>
>>> I've tested with 1.2.18 and can't reproduce this with that version
>>> either (is that also the version of xapian-core you're running? The
>>> 1.2.18 above is the bindings version I think).
>
> You didn't answer this...

Sorry, core is 1.2.18 as well.

>>> What FS are you running this on?
>>
>> ext4
>
> Pretty standard then, and what I tested with.
>
>>> Is use of Tcl actually a factor here, or can you reproduce it with
>>> just C++ code?
>>>
>>> E.g. using the "simpleindex" example from the xapian-core sources:
>>>
>>> examples/simpleindex tmp.db &
>>> examples/simpleindex tmp.db
>>
>> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db &
>> [1] 26157
>> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db
>> DatabaseLockError: Unable to get write lock on tmp.db: already locked
>>
>> [1]+ Stopped examples/simpleindex tmp.db
>>
>> so it is presumably not anything to do with the FS or the OS. I am
>> hoping that the right Tcl person (whoever that is) may pick something up
>> in an strace.
>
> It's clearly not as simple as execl() always releasing the lock, but I
> don't think we've ruled out the OS entirely yet - the above isn't
> exactly equivalent to the Tcl code, as the two databases are created by
> the same process in Tcl but different processes with simpleindex.

but the same problem happens from two different Tcl processes - both
succeed because there is no lock.

> Could you try this C++ version:
>
> #include <xapian.h>
> int main() {
>     Xapian::WritableDatabase db("tmp.db", Xapian::DB_CREATE_OR_OPEN);
>     Xapian::WritableDatabase db2("tmp.db", Xapian::DB_CREATE_OR_OPEN);
> }
>
> Compile with:
>
> g++ -O2 `xapian-config --cxxflags --libs` doubleopen.cc
>
> And then run:
>
> ./a.out
>
> If locking is working, this should fail (and does for me) like so:
>
> terminate called after throwing an instance of 'Xapian::DatabaseLockError'
> Aborted

Got exactly that.

Finally, it appears that it does work with Tcl 8.5 (actually a tclkit,
but does not work with an 8.6 tclkit).

Thanx,

Eric
-- 
ms fnd in a lbry

On Thu, Feb 25, 2016 at 05:21:17PM +0100, Eric J wrote:
> On Thu, 25 Feb 2016 02:24:51 +0000, Olly Betts <olly at survex.com> wrote:
> > It's clearly not as simple as execl() always releasing the lock, but I
> > don't think we've ruled out the OS entirely yet - the above isn't
> > exactly equivalent to the Tcl code, as the two databases are created by
> > the same process in Tcl but different processes with simpleindex.
>
> but the same problem happens from two different Tcl processes - both
> succeed because there is no lock.

Ah, OK - I missed that detail.

> Finally, it appears that it does work with Tcl 8.5 (actually a tclkit,
> but does not work with an 8.6 tclkit).

I'm testing with Tcl 8.6 (Debian package 8.6.4+dfsg-3), and it works for
me.

So it does seem it must be due to something your Tcl interpreter is
doing, but I'm struggling to think what that could be.

If O_CLOEXEC was set on the lock fd when execl() was called, the fd
would get closed and the lock released. But your lsof shows the fd open
but not locked in the child process after it has exec-ed cat.

If there were a second fd open on the lock file which gets closed
in the child process after the lock is taken, that would release the
lock. But we carefully close all other open fds before taking the
lock to avoid that.

Cheers,
    Olly

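For anyone following along, the "second fd" case described above can be reproduced outside Xapian with a minimal sketch. It assumes the lock in question is a POSIX fcntl() record lock and that the platform follows the usual POSIX rule that closing any descriptor for a file drops all of the process's fcntl() locks on that file. This is not Xapian's locking code; the names LockDemoResult, write_lock_held and demo_second_fd_close, and the temporary file path, are made up for illustration.

```cpp
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

// Hypothetical result type for this illustration only.
struct LockDemoResult {
    bool held_before;  // lock visible to another process before the extra close()
    bool held_after;   // lock visible after closing a second fd on the same file
};

// Ask a forked child whether some other process holds a conflicting lock.
// F_GETLK never reports the caller's own locks, and fcntl() locks are not
// inherited across fork(), so the check has to run in a separate process.
static bool write_lock_held(const char* path) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) _exit(2);
        struct flock fl{};
        fl.l_type = F_WRLCK;    // "could I take a write lock on the whole file?"
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) < 0) _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);  // 1 means a conflicting lock exists
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}

LockDemoResult demo_second_fd_close() {
    LockDemoResult r{false, false};
    char path[] = "/tmp/lockdemoXXXXXX";  // hypothetical scratch file
    int lock_fd = mkstemp(path);
    if (lock_fd < 0) return r;

    struct flock fl{};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;  // l_start = 0, l_len = 0: lock the whole file
    if (fcntl(lock_fd, F_SETLK, &fl) == 0) {
        r.held_before = write_lock_held(path);

        // Open and close an unrelated descriptor on the same file.
        // lock_fd itself stays open throughout.
        int other = open(path, O_RDONLY);
        if (other >= 0) close(other);

        r.held_after = write_lock_held(path);
    }
    close(lock_fd);
    unlink(path);
    return r;
}
```

Calling demo_second_fd_close() from a small main() and printing the two fields should show the lock held before the extra close() and gone afterwards, even though the locking descriptor is still open. That matches the "fd open but not locked" picture from lsof, though it does not say which code in the Tcl process would be doing the extra close.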
On Thu, 25 Feb 2016 23:37:52 +0000, Olly Betts <olly at survex.com> wrote:
> On Thu, Feb 25, 2016 at 05:21:17PM +0100, Eric J wrote:
> > On Thu, 25 Feb 2016 02:24:51 +0000, Olly Betts <olly at survex.com> wrote:
> > > It's clearly not as simple as execl() always releasing the lock, but I
> > > don't think we've ruled out the OS entirely yet - the above isn't
> > > exactly equivalent to the Tcl code, as the two databases are created by
> > > the same process in Tcl but different processes with simpleindex.
> >
> > but the same problem happens from two different Tcl processes - both
> > succeed because there is no lock.
>
> Ah, OK - I missed that detail.
>
> > Finally, it appears that it does work with Tcl 8.5 (actually a tclkit,
> > but does not work with an 8.6 tclkit).
>
> I'm testing with Tcl 8.6 (Debian package 8.6.4+dfsg-3), and it works for
> me.
>
> So it does seem it must be due to something your Tcl interpreter is
> doing, but I'm struggling to think what that could be.
>
> If O_CLOEXEC was set on the lock fd when execl() was called, the fd
> would get closed and the lock released. But your lsof shows the fd open
> but not locked in the child process after it has exec-ed cat.
>
> If there were a second fd open on the lock file which gets closed
> in the child process after the lock is taken, that would release the
> lock. But we carefully close all other open fds before taking the
> lock to avoid that.

I have tried Tcl 8.6.4 now, and it too has the problem. However with the
very new Tcl 8.6.5rc2 it works!

I still intend to try to find out what the problem was, but I can use
the 8.5 tclkit for what I was doing when this all started, and then move
to 8.6.5 when it becomes a real release.

Thanx very much,

Eric
-- 
ms fnd in a lbry