Markus Wörle
2007-Feb-07 17:00 UTC
[Xapian-discuss] timing behaviour for creating large queries
Hello I am using Xapian with perl-bindings, and a self-written queryparser, which is mandatory for my application. My queryparser builds a query object by connecting two objects by an operator using the Search::Xapian::Query constructor, e.g. Search::Xapian::Query->new(OP_OR, $left, $right) and so on. The queries it'll get, are mostly probabilistic and could become very large - about 200 up to 2400 OR'ed terms. To estimate the amount of time for a growing amount of terms inside a query, i did some benchmarks. A result can be found here: http://5nord.org/~mrks/query-performance.gif (this is tested on a _very_ slow machine, so do not be concerned about the concrete amount of time mentioned there) The yellow area is the amount of time which is needed by xapian to process the query. The red area is the time needed by my queryparser to parse the query and build up the query object. You see, its runtime behaves quadratic. I did some research on this (since my parsing algorithm has a linear complexity), and found out that the Search::Xapian::Query constructor, which joins the subtrees behaves quadratic. I think this is, because each joined three will be validated by libxapian, independent of if its subtrees got already validated or not. To get more internal: Every call of the constrator gets the query validated by Xapian::Query::Internal::prevalidate_query (), and prevalidate_query() calls validate_query() which calls prevalidate_query() recursively on each subtree. Do you have any suggestions for my problem? As a final solution, and if the recursive prevalidate_query() is really the reason to my weak performance (it badly looks so), I would have to patch my copy and remove those checks. Regards, mrks
Richard Boulton
2007-Feb-07 17:28 UTC
[Xapian-discuss] timing behaviour for creating large queries
Markus W?rle wrote:> My queryparser builds a query object by connecting two objects by an > operator using the Search::Xapian::Query constructor, e.g. > > Search::Xapian::Query->new(OP_OR, $left, $right)The C++ interface supports a constructor which takes a sequence of terms to build an OR query out of - this should get around the quadratic complexity you're seeing (though I haven't tested to check). I'm not belive that construction method is available from Perl, having taken a quick peek at the XS code. A very interesting performance issue, this. -- Richard
Kevin Duraj
2007-Feb-08 19:23 UTC
[Xapian-discuss] timing behaviour for creating large queries
Markus, I am doing similar research with Xapian Perl bindings on Linux using Boolean OR and AND queries and my numbers are below. I will do same measurement with same data on same machine with Lucene .NET on Windows. I would like to know if Xapian performs better then Lucene and if yes, how much better? - 100 Boolean (AND / OR) queries of 500 terms against 20 million docs avg: 139 ms per query. real 0m13.915s user 0m9.513s sys 0m4.397s - 100 Boolean (AND / OR) queries of 1,000 terms against 20 million docs avg: 275 ms per query. real 0m27.503s user 0m20.302s sys 0m7.193s -Kevin Duraj On 2/7/07, Markus W?rle <mrks@mrks.de> wrote:> > Hello > > I am using Xapian with perl-bindings, and a self-written queryparser, > which is mandatory for my application. > > My queryparser builds a query object by connecting two objects by an > operator using the Search::Xapian::Query constructor, e.g. > > Search::Xapian::Query->new(OP_OR, $left, $right) > > and so on. The queries it'll get, are mostly probabilistic and could > become very large - about 200 up to 2400 OR'ed terms. To estimate > the amount of time for a growing amount of terms inside a query, i > did some benchmarks. A result can be found here: > > http://5nord.org/~mrks/query-performance.gif > (this is tested on a _very_ slow machine, so do not be concerned > about the concrete amount of time mentioned there) > > The yellow area is the amount of time which is needed by xapian to > process the query. The red area is the time needed by my queryparser > to parse the query and build up the query object. You see, its > runtime behaves quadratic. I did some research on this (since my > parsing algorithm has a linear complexity), and found out that the > Search::Xapian::Query constructor, which joins the subtrees behaves > quadratic. I think this is, because each joined three will be > validated by libxapian, independent of if its subtrees got already > validated or not. To get more internal: Every call of the constrator > gets the query validated by Xapian::Query::Internal::prevalidate_query > (), and prevalidate_query() calls validate_query() which calls > prevalidate_query() recursively on each subtree. > > Do you have any suggestions for my problem? > > As a final solution, and if the recursive prevalidate_query() is > really the reason to my weak performance (it badly looks so), I would > have to patch my copy and remove those checks. > > Regards, > mrks > > _______________________________________________ > Xapian-discuss mailing list > Xapian-discuss@lists.xapian.org > http://lists.xapian.org/mailman/listinfo/xapian-discuss >