I am running some custom indexing code. I have a process that all other processes communicate with to insert documents (and to perform other updates such as deletes, but for right now just inserts). I build each document, hand the text over to TermGenerator and all the other stuff, and add it. This works. However, it runs really slowly: roughly one document every several seconds, with input documents of about 2k-40k in size.

When I run "ps -ef" from the command line, I see a task belonging to my daemon whose command is "/bin/cat". Looking in the Xapian source code, I found that this comes from the flint backend's locking code.

Since I am serializing my updates (one after another) and only from a single process, why am I seeing what appears to be long-term locks?

This indexing code ran very fast in pre-1.0 versions. I upgraded to 1.0.0, then 1.0.1, etc., but I didn't need to index until recently. There are currently only 10,000 documents in the database.

Thanks,
-Michael Lewis
On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
[flint indexing process apparently blocking on /bin/cat]
> Since I am serializing my updates (one after another) and only from
> a single process, why am I seeing what appears to be long-term
> locks?

Which OS? We had problems on Mac OS X, which I don't remember ever actually getting to the bottom of, that involved its sitting on /bin/cat. (I noticed it while running the test suite.) IIRM that was using the remotetcp backend, with flint behind it; the tests were failing to get the remote backend up into a listening state, because the locking was getting stuck.

AFAIK we haven't seen the same thing on other operating systems. (Mac OS X still fails these tests in HEAD for me, and there has been discussion of writing our own /bin/cat replacement in an effort, amongst other things, to fix this.)

J

--
James Aylett
xapian.org
james@tartarus.org
uncertaintydivision.org
On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
> When I do a "ps -ef" command from the command line I see a task
> belonging to my daemon that shows the command being run as "/bin/cat".
> Looking in the xapian source code I have found that to be in the flint
> backend locking code.

The semantics of fcntl() locking within a process are rather unhelpful, so we fork a child process to take and hold the lock for us. To minimise VM use, we just exec /bin/cat once the lock is obtained.

> Since I am serializing my updates (one after another) and only from a
> single process, why am I seeing what appears to be long-term locks?

The lock is held (and so the /bin/cat child process exists) for as long as you have the WritableDatabase open. So unless you're closing and reopening the database for each addition (which generally is probably not a good idea) then this sounds like what I'd expect.

> This index code ran very fast in pre-1.0 versions of the indexer. I
> upgraded to 1.0.0, then 1.0.1, etc. But I didn't need to index until
> recently.

It's hard to know what's going on from the information given. You said you're using TermGenerator, which is new in 1.0.0, so that may be indexing significantly differently to whatever you were using before. Though several seconds per document for a 10,000 document database really is excessively slow anyway.

Could you show us what the indexing code looks like?

Cheers,
Olly
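A minimal POSIX sketch of the lock-holding approach described in Olly's reply (a forked child takes the fcntl() lock, then execs /bin/cat and sits on it) might look like the following. This is not Xapian's flint code: the helper names `lock_via_child`, `release_child_lock` and `lock_holder` are hypothetical, and using a socketpair both to report whether the lock was obtained and as cat's stdin/stdout is an assumption of this sketch. Closing the parent's end of the socket gives cat EOF, so cat exits and the kernel drops the lock; the /bin/cat process therefore lives exactly as long as the writer holds the lock.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not Xapian's implementation): fork a child that takes
// an exclusive fcntl() lock on `path`, then execs /bin/cat so the lock holder
// stays small. Returns the child's pid and stores our end of a socketpair in
// *release_fd, or returns -1 if the lock could not be obtained.
pid_t lock_via_child(const char* path, int* release_fd) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        // Child process.
        close(sv[0]);
        int fd = open(path, O_RDWR | O_CREAT, 0666);
        struct flock fl = {};
        fl.l_type = F_WRLCK;    // exclusive (write) lock
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;           // 0 means "to end of file": lock the whole file
        if (fd == -1 || fcntl(fd, F_SETLK, &fl) == -1) {
            (void)write(sv[1], "1", 1);  // tell the parent we failed
            _exit(1);
        }
        (void)write(sv[1], "0", 1);      // tell the parent we hold the lock
        // cat copies stdin to stdout until EOF; wire both to the socket.
        // `fd` has no FD_CLOEXEC, so it (and the lock) survives the exec.
        dup2(sv[1], 0);
        dup2(sv[1], 1);
        if (sv[1] > 1) close(sv[1]);
        execl("/bin/cat", "/bin/cat", static_cast<char*>(nullptr));
        _exit(1);  // only reached if exec failed
    }

    // Parent process: wait for the child's one-byte status report.
    close(sv[1]);
    char status = '1';
    if (read(sv[0], &status, 1) != 1 || status != '0') {
        close(sv[0]);
        waitpid(pid, nullptr, 0);
        return -1;
    }
    *release_fd = sv[0];
    return pid;
}

// Closing our end gives cat EOF; it exits and the kernel releases the lock.
void release_child_lock(pid_t pid, int release_fd) {
    close(release_fd);
    waitpid(pid, nullptr, 0);
}

// Report which other process holds a conflicting lock on `path` (0 if none).
// F_GETLK never reports the caller's own locks, and if this process held an
// fcntl() lock on `path`, the close() below would silently release it: the
// awkward in-process semantics that the child process sidesteps.
pid_t lock_holder(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1) return -1;
    struct flock fl = {};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    pid_t holder = 0;
    if (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK) holder = fl.l_pid;
    close(fd);
    return holder;
}
```

Because POSIX record locks belong to a process, a second `lock_via_child()` on the same file fails while the first child is alive, and nothing the parent does with its own descriptors for the lock file can drop the child's lock. The sketch also fits Olly's explanation: as long as the WritableDatabase stays open, a /bin/cat child is expected to be present, so seeing it in `ps -ef` does not by itself indicate lock contention.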