程苏珺
2017-Dec-05 03:01 UTC
How to improve query performance when filtering on many boolean attributes
Hi all,

I am a new user of Xapian, and we have run into the following problem. In our case, a document has many attributes with boolean values, for example (A, B, C), and our search query combines certain filter logic (A == true and B == false ...) with other search logic.

We use a MatchDecider to implement the filter logic, and we are now seeing a performance problem, because our self-defined scoring method is very complicated and takes a lot of time. From our analysis, the boolean attribute filter (A == true and B == false ...) can filter out a large number of documents, but it seems that the MatchDecider runs after scoring, so it does little to improve performance.

Could you please give us some suggestions for our case?

Thanks,
Aimee
Olly Betts
2017-Dec-07 04:35 UTC
On Tue, Dec 05, 2017 at 11:01:27AM +0800, 程苏珺 wrote:
> I am a new user of Xapian, and we have run into the following problem.
> In our case, a document has many attributes with boolean values, for
> example (A, B, C), and our search query combines certain filter logic
> (A == true and B == false ...) with other search logic.
>
> We use a MatchDecider to implement the filter logic, and we are now
> seeing a performance problem, because our self-defined scoring method
> is very complicated and takes a lot of time. From our analysis, the
> boolean attribute filter (A == true and B == false ...) can filter out
> a large number of documents, but it seems that the MatchDecider runs
> after scoring, so it does little to improve performance.
>
> Could you please give us some suggestions for our case?

I would add a boolean term to documents where a particular attribute is true (e.g. XA1 if attribute A is true), and then you can express your boolean filter logic as a Query object. For example, A == true and B == false is:

    Xapian::Query(Xapian::Query::OP_AND_NOT, Xapian::Query("XA1"), Xapian::Query("XB1"))

If you're using Xapian 1.4 and writing in C++, there are operator overloads which allow you to write that as:

    Xapian::Query("XA1") &~ Xapian::Query("XB1")

(There's no need to explicitly index a term when a boolean attribute is false, as you can just filter out those documents where it is true.)

You can use this approach to filter a parsed user query if you want: just combine them using OP_FILTER.

Cheers,
    Olly