Josef Novak
2007-Apr-12 04:33 UTC
[Xapian-discuss] Reasonable Time Expectation for Long Queries?
Hi,

I am rather new to Xapian, and am using it to index some FAQ material. I am using Xapian 0.9.10 with the default database backend, which according to the documentation on the website appears to be 'quartz'. As a first experiment I have indexed about 710,000 questions, and I am now testing retrieval times with queries of varying length. Things seem to work OK as long as the queries stay small (1-4 terms), but my test set has a large number of queries containing 20-60+ terms, and these take upwards of 7-8 seconds to parse. I have tested my text processing code, and it does not seem to be the root of the problem.

It looks like, after reading the db documentation, that perhaps my first move should be to reindex everything in a flint db, as the documentation says that this will be 'appreciably faster'.

My current query code, taken from one of the examples, looks like:

  Xapian::Query query(Xapian::Query::OP_OR, &string_tokens[0], &string_tokens[string_tokens.size()]);

Is there anything else I can do to optimize these simple OP_OR queries? Are there any other suggestions for optimization, or pointers to places in the lists where this has been discussed, with fruitful results?

Many thanks in advance,
Joe
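The range constructor above only needs a pair of iterators over the term strings. A minimal sketch of producing that range, assuming a simple ASCII tokenizer: `tokenize` is a hypothetical helper for illustration, not Xapian's TermGenerator or QueryParser, and it does no stemming or prefixing. One detail worth noting: `&string_tokens[string_tokens.size()]` indexes one past the end of a `std::vector`, which is undefined behavior for `operator[]` (as is `&string_tokens[0]` on an empty vector). `string_tokens.begin()` and `string_tokens.end()` describe the same half-open range safely, assuming the constructor accepts any iterator type, as its use with raw pointers suggests.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer (not part of Xapian): lowercases ASCII letters and
// splits on any character that is not a letter or digit.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            // Accumulate the current token in lowercase.
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            // A separator ends the current token.
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);  // trailing token
    return tokens;
}
```

Usage, under the same assumptions: `std::vector<std::string> string_tokens = tokenize(text);` followed by `Xapian::Query query(Xapian::Query::OP_OR, string_tokens.begin(), string_tokens.end());`. Timing the `tokenize` call separately from the `Xapian::Query` construction is one way to confirm which step is slow.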
Olly Betts
2007-Apr-12 10:10 UTC
[Xapian-discuss] Reasonable Time Expectation for Long Queries?
On Thu, Apr 12, 2007 at 12:33:14PM +0900, Josef Novak wrote:
> It looks like, after reading the db documentation, that perhaps my first
> move should be to reindex everything in a flint db, as the documentation
> says that this will be 'appreciably faster'.

It's unlikely to make building Query objects faster though, as they don't touch the database.

> My current query code, taken
> from one of the examples, looks like:
> Xapian::Query query(Xapian::Query::OP_OR, &string_tokens[0],
> &string_tokens[string_tokens.size()]);
>
> Is there anything else I can do to optimize these simple OP_OR queries? Are
> there any other suggestions for optimization, or pointers to places in the
> lists where this has been discussed, with fruitful results?

It sounds like the same issue as this one, except that was building the query up pairwise, and using the "in one go" constructor was the workaround in that case:

http://thread.gmane.org/gmane.comp.search.xapian.general/3974

I thought I'd worked on a fix for that, but I don't seem to have checked anything in. I probably reverted the patch locally to work on something else; I'll dig it out.

Do you have a self-contained (except for Xapian!) small program which shows this? Failing that, some example lists of terms which build into slow queries?

Cheers,
Olly
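The thread does not say what inside Xapian made pairwise construction slow, so the following is only a toy model of why building an OR query pairwise can cost more than building it "in one go". `ToyQuery`, `CopyCounter`, `build_pairwise` and `build_in_one_go` are hypothetical names, not Xapian classes, and the copying behavior is an assumption: if each pairwise step copies every leaf already collected into a new node, n terms cost n(n+1)/2 copies (1830 for 60 terms), while a single node built from the whole range costs n. Joe's code already uses the range form, so this illustrates the workaround from the linked thread rather than diagnosing his case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a flattened OR node; NOT Xapian's internal representation.
struct ToyQuery {
    std::vector<std::string> terms;  // leaves of the OR node
};

// Counts how many leaf copies a construction strategy performs.
struct CopyCounter {
    std::size_t copies = 0;
};

// Pairwise: each step builds a new node by copying every leaf collected so
// far plus the new term (an assumed eager-flattening combine), so the total
// work grows as O(n^2).
ToyQuery build_pairwise(const std::vector<std::string>& terms, CopyCounter& c) {
    ToyQuery q;
    for (const std::string& t : terms) {
        ToyQuery next;
        next.terms.reserve(q.terms.size() + 1);
        for (const std::string& existing : q.terms) {
            next.terms.push_back(existing);
            ++c.copies;
        }
        next.terms.push_back(t);
        ++c.copies;
        q = std::move(next);
    }
    assert(q.terms.size() == terms.size());
    return q;
}

// In one go: a single node built from an iterator range, one copy per term.
template <class Iterator>
ToyQuery build_in_one_go(Iterator first, Iterator last, CopyCounter& c) {
    ToyQuery q;
    for (; first != last; ++first) {
        q.terms.push_back(*first);
        ++c.copies;
    }
    return q;
}
```

Both functions produce the same leaves; only the amount of copying differs, which is the kind of gap a small self-contained timing program like the one Olly asks for would expose.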