thr3ads.net - Xapian discuss - [Xapian-discuss] Tuning Phrase Searching [Nov 2005]

tech@dbx.co.uk

2005-Nov-03 12:32 UTC

[Xapian-discuss] Tuning Phrase Searching

Is there anything that can be done to speed up phrase searching? It is
currently a show stopper for our CV search system with queries for common
terms taking several minutes to execute. Simply ANDing the terms together
will return in 1-3 seconds.

I keep thinking that I must be missing something in either the way I index
or the way I (or rather the QueryParser) constructs the queries.

Thanks

Jeremy

Olly Betts

2005-Nov-05 20:18 UTC

On Thu, Nov 03, 2005 at 07:32:09AM -0500, tech@dbx.co.uk wrote:
> Is there anything that can be done to speed up phrase searching? It is
> currently a show stopper for our CV search system with queries for common
> terms taking several minutes to execute. Simply ANDing the terms together
> will return in 1-3 seconds.
Even 1-3 seconds for an AND query sounds rather slow.  More RAM could
well help a lot.  What spec is the machine?

And what do slow queries look like?

The flint backend is already noticeably faster for phrase searches (and
other searches) than quartz, so you could try that.  It's "in
development": the aim is to have it stable in each release, but with a
database format that can evolve without worrying about compatibility.
The version of flint in 0.9.2 is in use by at least 3 large installations.

Just set environment variable XAPIAN_PREFER_FLINT to a non-empty value
to get Xapian to default to creating flint databases.
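
As a minimal sketch of the above, the variable can be added to the
environment passed to whatever process creates the database. The helper
name flint_env is hypothetical, and "1" is just one example of a
non-empty value:

```python
import os

def flint_env(base=None):
    """Return a copy of the environment with XAPIAN_PREFER_FLINT set.

    flint_env is a hypothetical helper name. Any non-empty value
    should do; "1" is only an example.
    """
    env = dict(os.environ if base is None else base)
    env["XAPIAN_PREFER_FLINT"] = "1"
    return env

env = flint_env()
print(env["XAPIAN_PREFER_FLINT"])
```

The resulting dictionary could be passed as the env argument of
subprocess.run() when launching an indexer, so that only that process
sees the setting.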

For more information see:

http://wiki.xapian.org/FlintBackend

I'm currently writing a new Btree manager which will produce
substantially more compact Btrees.  The differences with the current one
should particularly benefit the positionlist table which has a lot of
keys, many of them fairly long.
> I keep thinking that I must be missing something in either the way I index
> or the way I (or rather the QueryParser) constructs the queries.
Probably not.

Cheers,
    Olly

Arjen van der Meijden

2005-Nov-05 21:40 UTC

On 3-11-2005 13:32, tech@dbx.co.uk wrote:
> Is there anything that can be done to speed up phrase searching? It is
> currently a show stopper for our CV search system with queries for common
> terms taking several minutes to execute. Simply ANDing the terms together
> will return in 1-3 seconds.
If you know beforehand what your phrases will look like and how you'll
search for them, you may be able to. E.g. if you have system paths and
look through them in "tree-order", you can just build up the subpaths
and index them as normal terms (/usr/local/bin/omega can be /usr,
/usr/local, /usr/local/bin).
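
A small sketch of how such subpath terms could be generated. The
function name ancestor_terms is hypothetical, and how the terms are then
attached to a document is left out:

```python
def ancestor_terms(path):
    """Return every proper leading subpath of an absolute path.

    Each subpath can be indexed as an ordinary term, so "everything
    below /usr/local" becomes a single-term lookup instead of a phrase
    search. (Hypothetical helper, not part of Xapian.)
    """
    parts = [p for p in path.split("/") if p]
    # Stop before the last component: only the ancestors are returned.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

print(ancestor_terms("/usr/local/bin/omega"))
# ['/usr', '/usr/local', '/usr/local/bin']
```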
But if it's just plain text and you want normal sentences to be
retrievable... you're probably just stuck with finding each document
containing the terms and checking whether those terms are in the correct
order. There are search engines which only use word-pairs and can
therefore not correctly identify hits (they also see a document
containing "foo bar" and "bar test" separately as a match for
"foo bar test").
It may be faster to combine such word-pairs with normal phrase
searching: build a query that checks for both the correct word-pairs
and the phrase.
The drawback is of course that you'll increase the size of your postlist
quite a bit (you don't need the pairs in the position table, however).
But the advantage should be that you can narrow down the list of
documents a lot better than with the normal "and search" which is the
basis for the phrase search.
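
To illustrate the idea, here is a pure-Python, in-memory sketch. It
does not use the Xapian API, and all names in it are hypothetical.
Word-pairs narrow the candidate set cheaply, and the positional check
then removes the false hits that pairs alone would let through:

```python
from collections import defaultdict

def index_documents(docs):
    """Build a word-pair index and per-document word positions."""
    pair_index = defaultdict(set)   # (w1, w2) -> ids of docs with that pair
    positions = {}                  # doc id -> {word: [positions]}
    for doc_id, text in docs.items():
        words = text.lower().split()
        pos = defaultdict(list)
        for i, word in enumerate(words):
            pos[word].append(i)
        positions[doc_id] = pos
        for pair in zip(words, words[1:]):
            pair_index[pair].add(doc_id)
    return pair_index, positions

def pair_candidates(pair_index, phrase):
    """Docs containing every adjacent pair of the phrase (2+ words).

    This is the cheap filter; it can still return false hits.
    """
    words = phrase.lower().split()
    sets = [pair_index.get(p, set()) for p in zip(words, words[1:])]
    return set.intersection(*sets) if sets else set()

def phrase_matches(positions, candidates, phrase):
    """Keep only candidates where the words occur consecutively."""
    words = phrase.lower().split()
    hits = set()
    for doc_id in candidates:
        pos = positions[doc_id]
        for start in pos.get(words[0], ()):
            if all(start + k in pos.get(w, ()) for k, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "foo bar test",
    2: "foo bar and later bar test",   # has both pairs, but not the phrase
    3: "test bar foo",
}
pair_index, positions = index_documents(docs)
candidates = pair_candidates(pair_index, "foo bar test")
print(sorted(candidates))                                            # [1, 2]
print(sorted(phrase_matches(positions, candidates, "foo bar test")))  # [1]
```

Document 2 shows the false hit that a pairs-only engine would report.
Document 3 contains all three words, so a plain AND would pass it on to
the positional check, whereas the pair filter has already dropped it.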
> I keep thinking that I must be missing something in either the way I index
> or the way I (or rather the QueryParser) constructs the queries.
In the general case, I don't think there really is a better way. But if 
space is no problem and the speed of the position table is the most 
important part, you may be able to increase the size of the indexes to 
decrease the number of documents to look through.
Olly already mentioned using Flint. Using xapian-compact to further
decrease the size of the database may also help a lot for searches. You
may want to keep two versions of your database: the non-compacted one
for updating and the fully compacted one for searches.
For Flint the compaction is a bit less dramatic than for Quartz: with
Flint our 14G non-compacted database decreases to 12G compacted (which
uses zlib-compression as well). The drawback of compaction is of course
the time it takes: one hour on our machine.

Best regards,

Arjen
