Matei Pavel
2009-Oct-21 15:24 UTC
[Xapian-discuss] get_document loop really slow after index update (PHP bindings)
Hi,

(Sorry if this email goes out more than once: Yahoo keeps putting it back into my drafts folder and I haven't received it in the daily digest either.)

I posted this question before, but since my other email sometimes goes into spam folders and my latest updates to the issue got no replies, I'm re-posting with all the details here.

I'm using the PHP bindings to run a search over about 30,000-40,000 indexed documents that returns 500-1000 results. The actual search takes under 200 milliseconds, but the MSet iterator loop (which calls get_document) takes anywhere from 5 to 21 seconds. This only happens for searches made after the index has had some updates:

step 1: search for "word". Loop takes 5+ seconds.
step 2: search for "word". Loop takes under 200 milliseconds.
step 3: add a document to the index.
step 4: search for "word". Loop takes 5+ seconds.

Here is the oprofile callgraph: promotii-reviste.ro/oprofile-callgraph.txt
Here is the PHP code I'm using: promotii-reviste.ro/php-code.txt

Can anyone shed some light on why this is happening?

Thank you.
Matt
James Aylett
2009-Oct-21 17:52 UTC
[Xapian-discuss] get_document loop really slow after index update (PHP bindings)
On Wed, Oct 21, 2009 at 06:24:04PM +0300, Matei Pavel wrote:

> I'm using the PHP bindings to do a search on about 30,000-40,000 indexed
> documents that returns 500-1000 results. The actual search takes
> under 200 milliseconds but the mset iterator loop (that calls
> get_document) takes anywhere from 5 to 21 seconds. This only happens
> for searches made after the index has had some updates.

I'm guessing that on update you're losing enough of the Xapian database from cache that it needs to be reloaded, and that additionally something is killing your disk performance. You may be able to probe more with iostat and vmstat.

> Here is the oprofile callgraph: promotii-reviste.ro/oprofile-callgraph.txt

I'm not great at reading oprofile, but time spent in the Xapian library itself is around 25k samples. As a comparison, zend_hash_find seems to have recorded 60k samples. I may be completely misreading this. (And then there are calls from Xapian down into other libraries, which, if I'm getting things right, is perhaps another 30k samples.)

J

-- 
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org
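
One way to narrow this down is to time the search and the document-fetch loop separately on each run, so that a cold-cache run (steps 1 and 4) can be compared directly with a warm one (step 2). The following is only a minimal sketch, written in Python rather than PHP and using no Xapian calls at all: `run_search` and `fetch_document` are hypothetical stand-ins for the real query and for the get_document call inside the MSet loop, and `time_phases` is a name made up for this illustration.

```python
import time


def time_phases(run_search, fetch_document):
    """Time the search and the per-result fetch loop separately.

    run_search and fetch_document are hypothetical stand-ins supplied by
    the caller; neither is part of any Xapian binding.
    """
    t0 = time.perf_counter()
    results = list(run_search())          # phase 1: the search itself
    t1 = time.perf_counter()

    slowest = 0.0
    for doc_id in results:                # phase 2: one fetch per result
        start = time.perf_counter()
        fetch_document(doc_id)
        slowest = max(slowest, time.perf_counter() - start)
    t2 = time.perf_counter()

    return {
        "search_s": t1 - t0,
        "fetch_loop_s": t2 - t1,
        "fetched": len(results),
        "slowest_fetch_s": slowest,
    }


# Placeholder callables so the sketch runs on its own.
def fake_search():
    return range(1000)


def fake_fetch(doc_id):
    return {"id": doc_id}


if __name__ == "__main__":
    for label in ("first run", "second run"):
        print(label, time_phases(fake_search, fake_fetch))
```

If the fetch loop dominates only on the first run after an update, and the slowest single fetch accounts for a large share of it, that would be consistent with the disk-cache explanation above; running iostat or vmstat alongside would show whether the disk is busy during that window.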