On Wed, 24 Feb 2016 03:17:35 +0000, Olly Betts <olly at survex.com> wrote:
> On Mon, Feb 22, 2016 at 12:26:27PM +0100, Eric wrote:
>> On Sun, 21 Feb 2016 22:33:22 +0000, Olly Betts <olly at survex.com> wrote:
>>> On Sun, Feb 21, 2016 at 02:15:25PM +0100, Eric J wrote:
>>>> I discovered, while trying to set up Tcl bindings for Notmuch
>>>> (https://notmuchmail.org/), which uses Xapian, that flintlock was not
>>>> being locked (I had lost updates).
>>>
>>> It seems to work for me, testing with this:
>>>
>>>     package require Tcl 8
>>>     package require xapian 1.0.0
>>>     xapian::WritableDatabase db "tmp.db" $xapian::DB_CREATE_OR_OPEN
>>>     xapian::WritableDatabase db2 "tmp.db" $xapian::DB_CREATE_OR_OPEN
>>
>> eric at bruno [ ~ ]$ cat /proc/version
>> Linux version 3.13.300 (root at bruno) (gcc version 4.8.2 (GCC) ) #2 SMP
>> Tue Sep 16 21:01:43 BST 2014
>> eric at bruno [ ~ ]$ tclsh
>> % info patchlevel
>> 8.6.1
>> % package require Tcl 8
>> 8.6.1
>> % package require xapian 1.0.0
>> 1.2.18
>
> I've tested with 1.2.18 and can't reproduce this with that version
> either (is that also the version of xapian-core you're running? The
> 1.2.18 above is the bindings version I think).
>
>> % xapian::WritableDatabase db "tmp.db" $xapian::DB_CREATE_OR_OPEN
>> _e0c4b00000000000_p_Xapian__WritableDatabase
>> % xapian::WritableDatabase db2 "tmp.db" $xapian::DB_CREATE_OR_OPEN
>> _f0d3b00000000000_p_Xapian__WritableDatabase
>> %
>>
>> At which point
>>
>> eric at bruno [ ~ ]$ lsof tmp.db/flintlock
>> COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
>> cat     13543 eric    5w   REG    8,9        0 930437 tmp.db/flintlock
>> cat     13552 eric    9w   REG    8,9        0 930437 tmp.db/flintlock
>>
>> Blaming the execl is due to stepping through my copy of the lock code in
>> gdb, and seeing, in lsof, 5w on the open, still 5w on the fork, 5ww on
>> the fcntl, and 5w again on the execl.
>
> Odd, as you said elsewhere, execl() shouldn't drop the lock. It would
> be good to get to the bottom of this, as unreliable locking is a bad
> thing to have.
>
> What FS are you running this on?

ext4

> Is use of Tcl actually a factor here, or can you reproduce it with
> just C++ code?
>
> E.g. using the "simpleindex" example from the xapian-core sources:
>
> examples/simpleindex tmp.db &
> examples/simpleindex tmp.db

lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db &
[1] 26157
lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db
DatabaseLockError: Unable to get write lock on tmp.db: already locked

[1]+  Stopped                 examples/simpleindex tmp.db

so it is presumably not anything to do with the FS or the OS. I am
hoping that the right Tcl person (whoever that is) may pick something up
in an strace.

> More recent Xapian versions will try to use the new OFD locks and avoid
> the need to fork() and execl(), so will presumably avoid whatever is
> going on here. But the OFD locks were added in Linux 3.15, so your
> kernel isn't quite new enough.

Yes, I saw that, and it is good, but my chances of moving up soon are
not good. And I would like to get to the bottom of this anyway.

Thanx,
Eric
-- 
ms fnd in a lbry

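For anyone following the locking discussion: the open, fork, fcntl, execl sequence traced in gdb relies on a child process holding a classic fcntl() write lock on the lock file. Below is a minimal C++ sketch of that general pattern, assuming Linux/POSIX. It is not Xapian's actual lock code, it leaves out the execl() step, and the names `try_write_lock` and `child_lock_blocks_parent` are hypothetical.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Try to take a non-blocking exclusive fcntl() lock on the whole file.
// Returns 0 on success, or the errno value on failure.
static int try_write_lock(int fd) {
    struct flock fl = {};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "until end of file", i.e. the whole file
    return fcntl(fd, F_SETLK, &fl) == -1 ? errno : 0;
}

// Fork a child that locks `path`, then check from the parent that a second
// lock attempt is refused while the child lives and succeeds once it exits.
bool child_lock_blocks_parent(const char* path) {
    int to_parent[2], to_child[2];
    if (pipe(to_parent) == -1 || pipe(to_child) == -1) return false;

    pid_t pid = fork();
    if (pid == -1) return false;

    if (pid == 0) {
        // Child: take the lock, report the outcome, then hold the lock
        // until the parent closes its end of the pipe.
        close(to_parent[0]);
        close(to_child[1]);
        int fd = open(path, O_WRONLY | O_CREAT, 0666);
        char status = (fd != -1 && try_write_lock(fd) == 0) ? 'y' : 'n';
        if (write(to_parent[1], &status, 1) != 1) _exit(1);
        char dummy;
        while (read(to_child[0], &dummy, 1) > 0) {}
        _exit(0);  // exiting closes fd and releases the lock
    }

    // Parent: wait for the child's report, then try to lock the same file.
    close(to_parent[1]);
    close(to_child[0]);
    char status = 'n';
    bool child_locked = read(to_parent[0], &status, 1) == 1 && status == 'y';

    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    bool blocked = false;
    if (child_locked && fd != -1) {
        int err = try_write_lock(fd);
        blocked = (err == EACCES || err == EAGAIN);
    }

    close(to_child[1]);        // tell the child to exit
    waitpid(pid, nullptr, 0);  // lock is gone once the child has exited

    bool free_again = fd != -1 && try_write_lock(fd) == 0;
    if (fd != -1) close(fd);
    close(to_parent[0]);
    return blocked && free_again;
}
```

Calling `child_lock_blocks_parent("/tmp/demo.lock")` should return true where fcntl() locking behaves as expected; a false result would suggest the lock is either not being taken or not being held by the child.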
Eric J <eric <at> deptj.eu> writes:
> >> eric <at> bruno [ ~ ]$ tclsh
> >> % info patchlevel
> >> 8.6.1
> >> % package require Tcl 8
> >> 8.6.1
> >> % package require xapian 1.0.0
> >> 1.2.18

Does the problem occur with both Tcl 8.6.1 and the 8.5 series?

On Wed, Feb 24, 2016 at 04:30:55PM +0100, Eric J wrote:
> On Wed, 24 Feb 2016 03:17:35 +0000, Olly Betts <olly at survex.com> wrote:
> >On Mon, Feb 22, 2016 at 12:26:27PM +0100, Eric wrote:
> >> % package require xapian 1.0.0
> >> 1.2.18
> >
> > I've tested with 1.2.18 and can't reproduce this with that version
> > either (is that also the version of xapian-core you're running? The
> > 1.2.18 above is the bindings version I think).

You didn't answer this...

> > What FS are you running this on?
>
> ext4

Pretty standard then, and what I tested with.

> > Is use of Tcl actually a factor here, or can you reproduce it with
> > just C++ code?
> >
> > E.g. using the "simpleindex" example from the xapian-core sources:
> >
> > examples/simpleindex tmp.db &
> > examples/simpleindex tmp.db
>
> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db &
> [1] 26157
> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db
> DatabaseLockError: Unable to get write lock on tmp.db: already locked
>
> [1]+  Stopped                 examples/simpleindex tmp.db
>
> so it is presumably not anything to do with the FS or the OS. I am
> hoping that the right Tcl person (whoever that is) may pick something up
> in an strace.

It's clearly not as simple as execl() always releasing the lock, but I
don't think we've ruled out the OS entirely yet - the above isn't
exactly equivalent to the Tcl code, as the two databases are created by
the same process in Tcl but different processes with simpleindex.

Could you try this C++ version:

    #include <xapian.h>
    int main() {
        Xapian::WritableDatabase db("tmp.db", Xapian::DB_CREATE_OR_OPEN);
        Xapian::WritableDatabase db2("tmp.db", Xapian::DB_CREATE_OR_OPEN);
    }

Compile with:

    g++ -O2 `xapian-config --cxxflags --libs` doubleopen.cc

And then run:

    ./a.out

If locking is working, this should fail (and does for me) like so:

    terminate called after throwing an instance of 'Xapian::DatabaseLockError'
    Aborted

> > More recent Xapian versions will try to use the new OFD locks and avoid
> > the need to fork() and execl(), so will presumably avoid whatever is
> > going on here. But the OFD locks were added in Linux 3.15, so your
> > kernel isn't quite new enough.
>
> Yes, I saw that, and it is good, but my chances of moving up soon are
> not good. And I would like to get to the bottom of this anyway.

Indeed - I was noting it more as something to be aware of when testing
newer versions.

Cheers,
    Olly

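The same-process versus different-process distinction matters because classic fcntl() locks belong to the process: a second lock attempt through another descriptor in the same process does not conflict with the first. OFD locks (Linux 3.15 and later) belong to the open file description instead, so the second attempt is refused. The sketch below illustrates the difference, assuming Linux with a glibc that exposes `F_OFD_SETLK`. The helper names `lock_whole_file` and `second_lock_result` are hypothetical and nothing here is taken from Xapian's source.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for F_OFD_SETLK (g++ normally defines it already)
#endif
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Attempt a non-blocking whole-file write lock with the given fcntl command:
// F_SETLK for classic per-process locks, F_OFD_SETLK for OFD locks.
// Returns 0 on success, or the errno value on failure.
static int lock_whole_file(int fd, int cmd) {
    struct flock fl = {};  // zero-initialised; OFD locks require l_pid == 0
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;          // 0 means the whole file
    return fcntl(fd, cmd, &fl) == -1 ? errno : 0;
}

// Open `path` twice in this one process, lock via the first descriptor,
// then try to lock via the second.
// Returns the outcome of the second attempt (0 = it succeeded, otherwise
// the errno value), or -1 if setup or the first lock failed.
int second_lock_result(const char* path, int cmd) {
    int fd1 = open(path, O_WRONLY | O_CREAT, 0666);
    int fd2 = open(path, O_WRONLY | O_CREAT, 0666);
    int result = -1;
    if (fd1 != -1 && fd2 != -1 && lock_whole_file(fd1, cmd) == 0) {
        result = lock_whole_file(fd2, cmd);
    }
    if (fd2 != -1) close(fd2);
    if (fd1 != -1) close(fd1);
    return result;
}
```

With `F_SETLK` the second attempt is expected to succeed, which is why a per-process lock cannot by itself detect a double open within one process. With `F_OFD_SETLK` the second attempt is expected to fail with `EAGAIN` (POSIX also allows `EACCES` for lock conflicts).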
On Wed, 24 Feb 2016 23:17:38 +0000 (UTC), Eric Lindblad <GeirfuglApS at yahoo.com> wrote:
> Eric J <eric <at> deptj.eu> writes:
>
>>>> eric <at> bruno [ ~ ]$ tclsh
>>>> % info patchlevel
>>>> 8.6.1
>>>> % package require Tcl 8
>>>> 8.6.1
>>>> % package require xapian 1.0.0
>>>> 1.2.18
>
> Does the problem occur with both Tcl 8.6.1 and the 8.5 series?
>

Darn, I didn't do that because I don't have an 8.5. However I got the
8.5.17 tclkit, and that works, where my real 8.6.1 install, an 8.6.1
tclkit, and an 8.6.3 tclkit do not work! All are loading the same .so
file containing the test code!

Thanx,
Eric
-- 
ms fnd in a lbry

On Thu, 25 Feb 2016 02:24:51 +0000, Olly Betts <olly at survex.com> wrote:
> On Wed, Feb 24, 2016 at 04:30:55PM +0100, Eric J wrote:
>> On Wed, 24 Feb 2016 03:17:35 +0000, Olly Betts <olly at survex.com> wrote:
>>>On Mon, Feb 22, 2016 at 12:26:27PM +0100, Eric wrote:
>>>> % package require xapian 1.0.0
>>>> 1.2.18
>>>
>>> I've tested with 1.2.18 and can't reproduce this with that version
>>> either (is that also the version of xapian-core you're running? The
>>> 1.2.18 above is the bindings version I think).
>
> You didn't answer this...

Sorry, core is 1.2.18 as well.

>>> What FS are you running this on?
>>
>> ext4
>
> Pretty standard then, and what I tested with.
>
>>> Is use of Tcl actually a factor here, or can you reproduce it with
>>> just C++ code?
>>>
>>> E.g. using the "simpleindex" example from the xapian-core sources:
>>>
>>> examples/simpleindex tmp.db &
>>> examples/simpleindex tmp.db
>>
>> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db &
>> [1] 26157
>> lfs at bruno [ /usr/src/sources-deptj/xapian-core-1.2.18 ]$ examples/simpleindex tmp.db
>> DatabaseLockError: Unable to get write lock on tmp.db: already locked
>>
>> [1]+  Stopped                 examples/simpleindex tmp.db
>>
>> so it is presumably not anything to do with the FS or the OS. I am
>> hoping that the right Tcl person (whoever that is) may pick something up
>> in an strace.
>
> It's clearly not as simple as execl() always releasing the lock, but I
> don't think we've ruled out the OS entirely yet - the above isn't
> exactly equivalent to the Tcl code, as the two databases are created by
> the same process in Tcl but different processes with simpleindex.

but the same problem happens from two different Tcl processes - both
succeed because there is no lock.

> Could you try this C++ version:
>
>     #include <xapian.h>
>     int main() {
>         Xapian::WritableDatabase db("tmp.db", Xapian::DB_CREATE_OR_OPEN);
>         Xapian::WritableDatabase db2("tmp.db", Xapian::DB_CREATE_OR_OPEN);
>     }
>
> Compile with:
>
>     g++ -O2 `xapian-config --cxxflags --libs` doubleopen.cc
>
> And then run:
>
>     ./a.out
>
> If locking is working, this should fail (and does for me) like so:
>
>     terminate called after throwing an instance of 'Xapian::DatabaseLockError'
>     Aborted

Got exactly that.

Finally, it appears that it does work with Tcl 8.5 (actually a tclkit),
but does not work with an 8.6 tclkit.

Thanx,
Eric
-- 
ms fnd in a lbry