I wrote the following test script to demonstrate what looks like a possible
bug in the query parser:
#!/usr/bin/perl
use strict;
use warnings;
use Search::Xapian qw/:standard/;

# Parser setup: AND by default, English stemming, and "Title:" as a
# boolean prefix mapped to the term prefix "T".
my $QueryParser = Search::Xapian::QueryParser->new();
$QueryParser->set_default_op(OP_AND);
$QueryParser->set_stemmer(Search::Xapian::Stem->new("english"));
$QueryParser->set_stemming_strategy(STEM_SOME);
$QueryParser->add_boolean_prefix("Title", "T");
print "this script is to test the LoveHate feature in conjunction with a "
    . "single boolean prefixes.\n"
    . "Notice that when using boolean prefixes, the -notallowed translates "
    . "to a regular AND search rather than a AND_NOT as it should be.\n"
    . "Also note, brackets, or order of the terms does not make a "
    . "difference.\n\n"
    . "However, it seems that if at least one of the terms is not a boolean "
    . "prefix, the parser parses the query correctly, regardless of order. "
    . "Not 100% verified this bit, but seems so.\n\n";

# The same flag set is used for every query.
my $flags = FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE | FLAG_WILDCARD;

print "right: ".$QueryParser->parse_query(qq{word -notallowed}, $flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{(Title:word) -notallowed}, $flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{Title:word -notallowed}, $flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{-notallowed Title:word}, $flags)."\n";
print "right: ".$QueryParser->parse_query(qq{term Title:word -notallowed}, $flags)."\n";
print "right: ".$QueryParser->parse_query(qq{Title:first term Title:word -notallowed}, $flags)."\n";

This is the output:
this script is to test the LoveHate feature in conjunction with a
single boolean prefixes.
Notice that when using boolean prefixes, the -notallowed translates
to a regular AND search rather than a AND_NOT as it should be.
Also note, brackets, or order of the terms does not make a difference.
However, it seems that if at least one of the terms is not a boolean
prefix, the parser parses the query correctly, regardless of order.
Not 100% verified this bit, but seems so.
right: Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2))
FILTER Tword))
right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2))
FILTER (Tfirst OR Tword)))
Notice that the third search (like the other two marked "wrong") has
[Znotallow:(pos=1)] rather than [AND_NOT Znotallow:(pos=1)], and the term
is not placed in the FILTER section either.
It seems that once the query contains at least one non-prefixed term, the
parser handles it correctly, regardless of where that term appears.
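
The right/wrong labels could also be checked mechanically rather than by eye. What follows is only a minimal sketch: the helper name negates_term is made up for illustration, and it assumes the description strings keep the format shown in the output above (copied here, with the wrapped line rejoined). It needs no Xapian installation because it only inspects the strings.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the parsed-query description applies AND_NOT
# to the given stemmed term. It relies only on the textual format of the
# descriptions printed above.
sub negates_term {
    my ($description, $term) = @_;
    return $description =~ /AND_NOT \(?\Q$term\E:/ ? 1 : 0;
}

# Query strings and descriptions copied from the output above.
my @cases = (
    [ 'word -notallowed',
      'Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))' ],
    [ 'Title:word -notallowed',
      'Xapian::Query((Znotallow:(pos=1) FILTER Tword))' ],
    [ 'term Title:word -notallowed',
      'Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER Tword))' ],
);

for my $case (@cases) {
    my ($query, $description) = @$case;
    my $label = negates_term($description, 'Znotallow') ? 'right' : 'wrong';
    printf "%-6s %s\n", $label, $query;
}
```

A check like this only confirms that the hate term ended up under an AND_NOT; it says nothing about whether the rest of the query tree is what was intended.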
Your thoughts?
And one last question regarding the parser in this case:
Should (or could) there be any performance difference between the following
three parsed queries? (FILTER vs AND_NOT, and AND_NOT*2 vs AND_NOT/OR)
1. Xapian::Query(((Zterm:(pos=1) Znotallow:(pos=2)) FILTER (Tfirst OR
Tword)))
2. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2) AND_NOT
Tfirst:(pos=3)) FILTER Tword))
3. Xapian::Query(((Zterm:(pos=1) AND_NOT (Znotallow:(pos=2) OR
Tfirst:(pos=3))) FILTER Tword))
Ron
On Tue, Oct 23, 2007 at 04:35:04PM +0200, Ron Kass wrote:
> print "wrong: ".$QueryParser->parse_query(qq{Title:word
> -notallowed},(FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE |
> FLAG_WILDCARD))."\n";

I think this is related to a problem I noticed earlier this week - we
fail to parse filter-type operations in the middle of a query:

  foo site:example.org bar
  foo -site:example.org bar
  foo -ignore bar

People tend to specify filters at the end, which I guess is why nobody
noticed this before.

I looked into those cases and it's down to the grammar rules not
allowing it, which is a bug, but a bit more involved to fix than your
previous one. I'll add your testcases to mine and check they all work
when I fix this.

> And one last question regarding the parser in this case..
> Should/Could there be any performance difference between the following
> three parsed queries? (FILTER vs AND_NOT and AND_NOT*2 vs AND_NOT/OR)
> 1. Xapian::Query(((Zterm:(pos=1) Znotallow:(pos=2)) FILTER (Tfirst OR
> Tword)))

There seems to be an operator (AND_NOT?) missing before Znotallow.

> 2. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2) AND_NOT
> Tfirst:(pos=3)) FILTER Tword))
> 3. Xapian::Query(((Zterm:(pos=1) AND_NOT (Znotallow:(pos=2) OR
> Tfirst:(pos=3))) FILTER Tword))

I can see that (2) and (3) are essentially the same query represented in
two different ways. But (1) seems to be a different query (no matter
what the missing operator is). If that's correct, then (1) clearly can
(and often will) perform differently to (2) and (3).

Currently, (2) and (3) will actually be executed in different ways. I'm
not certain which would be more efficient (and it may depend on the
data). I suspect there's not much in it unless there are a lot of filter
terms, in which case my hunch is that (3) might have the edge because of
the balancing we do for OrPostList trees. If you have, or can easily
produce, some benchmark data, it would be interesting to know.

I've implemented an internal "QueryOptimiser" class for 1.0.4 which
provides a much improved framework for building optimal postlist trees
from queries, so it's now much easier to do these sort of things.

Cheers,
Olly
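
To make the point about (2) and (3) concrete, here is a minimal sketch that evaluates the three queries as plain set operations in Perl. The posting lists (document ids per term) are invented for illustration, the helper names are hypothetical, and the missing operator in (1) is assumed to be AND_NOT. It says nothing about how Xapian executes the queries or how fast; it only shows which documents each form selects.

```perl
use strict;
use warnings;

# Hypothetical posting lists: term => matching document ids.
# These ids are made up; they do not come from a real index.
my %postings = (
    Zterm     => [1, 2, 3, 4, 5, 6],
    Znotallow => [2, 5],
    Tfirst    => [3, 5, 7],
    Tword     => [1, 2, 3, 4, 7],
);

# Each helper takes and returns a hash ref used as a set of document ids.
sub docs    { my ($t) = @_;     return +{ map { $_ => 1 } @{ $postings{$t} } } }
sub and_not { my ($l, $r) = @_; return +{ map { $_ => 1 } grep { !$r->{$_} } keys %$l } }
sub or_     { my ($l, $r) = @_; return +{ %$l, %$r } }
sub filter  { my ($l, $r) = @_; return +{ map { $_ => 1 } grep { $r->{$_} } keys %$l } }
sub ids     { my ($set) = @_;   return join ",", sort { $a <=> $b } keys %$set }

# (1), assuming AND_NOT: (Zterm AND_NOT Znotallow) FILTER (Tfirst OR Tword)
my $q1 = filter(and_not(docs("Zterm"), docs("Znotallow")),
                or_(docs("Tfirst"), docs("Tword")));

# (2): (Zterm AND_NOT Znotallow AND_NOT Tfirst) FILTER Tword
my $q2 = filter(and_not(and_not(docs("Zterm"), docs("Znotallow")), docs("Tfirst")),
                docs("Tword"));

# (3): (Zterm AND_NOT (Znotallow OR Tfirst)) FILTER Tword
my $q3 = filter(and_not(docs("Zterm"), or_(docs("Znotallow"), docs("Tfirst"))),
                docs("Tword"));

print "(1): ", ids($q1), "\n";
print "(2): ", ids($q2), "\n";
print "(3): ", ids($q3), "\n";
```

With these made-up lists, (2) and (3) both select documents 1,4, while (1) selects 1,3,4. Removing two sets one after the other is the same as removing their union, which is why (2) and (3) agree, whereas (1) uses Tfirst to widen the filter instead of to exclude documents.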
How do you think it relates to the test I produced, if at all?
Different bug?

Best regards,
Ron

Olly Betts wrote:
> On Tue, Oct 23, 2007 at 07:17:05PM +0200, Ron Kass wrote:
>> Or did you mean something else when talking about filters at the end?
>
> I mean for my test cases.
>
> Cheers,
> Olly
On Tue, Oct 23, 2007 at 07:42:20PM +0200, Ron Kass wrote:
> How do you think it relates to the test I produced, if at all?

I think the root cause is the same for both.

Cheers,
Olly
> print "wrong: ".$QueryParser->parse_query(qq{(Title:word)
> -notallowed},(FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE |
> FLAG_WILDCARD))."\n";
> print "wrong: ".$QueryParser->parse_query(qq{Title:word
> -notallowed},(FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE |
> FLAG_WILDCARD))."\n";
> print "wrong: ".$QueryParser->parse_query(qq{-notallowed
> Title:word},(FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE |
> FLAG_WILDCARD))."\n";

These should all now work in SVN HEAD. The patch overlaps other changes
so won't apply cleanly to 1.0.3, but you can get bootstrapped snapshots
of SVN HEAD from here:

http://www.xapian.org/bleeding.php

Cheers,
Olly