Doctor Munchkin
2011-Jul-14 09:58 UTC
[Xapian-discuss] 'phrase' default-op mixed with hyphenated words
Hi all,

I've come across an issue caused when I try to set the query parser's default op to OP_PHRASE: Xapian raises an Unimplemented Error if the query contains hyphenated words or other terms that implicitly generate a phrase. This can be shown with the following Python extract:

>>> from xapian import *
>>> qp = QueryParser()
>>> qp.set_default_op(Query.OP_PHRASE)
>>> print qp.parse_query('John Smith-Jones')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xapian.UnimplementedError: Can't use NEAR/PHRASE with a subexpression containing NEAR or PHRASE

I'm using the latest release (1.2.6).

Are there any plans to implement this functionality, or does anyone have a patch for the query parser that would fix this particular issue? If not, I guess a good workaround on my side would be to convert the hyphen in the query to a space if OP_PHRASE is used, which is essentially what I imagine the query parser should be doing.

Thanks,
Munchkin.
Olly Betts
2011-Jul-18 06:22 UTC
[Xapian-discuss] 'phrase' default-op mixed with hyphenated words
On Thu, Jul 14, 2011 at 10:58:38AM +0100, Doctor Munchkin wrote:
> >>> from xapian import *
> >>> qp = QueryParser()
> >>> qp.set_default_op(Query.OP_PHRASE)
> >>> print qp.parse_query('John Smith-Jones')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> xapian.UnimplementedError: Can't use NEAR/PHRASE with a subexpression
> containing NEAR or PHRASE
>
> I'm using the latest release (1.2.6).
>
> Are there any plans to implement this functionality

I think we should support arbitrary subqueries for NEAR and PHRASE, but at least personally I don't have particular plans to work on it in the near future.

> or does anyone
> have a patch for the query parser that would fix this particular
> issue? If not, I guess a good workaround on my side would be to
> convert the hyphen in the query to a space if OP_PHRASE is used, which
> is essentially what I imagine the query parser should be doing.

That wouldn't mean the same thing though. If you remove the hyphen, the above would parse as requiring "john", "smith", and "jones" in that order within a 12 word window. With the hyphen, "smith" and "jones" must be in order and *adjacent*.

I'm not aware of any such patch anyway.

Cheers,
Olly
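
To make the difference between the two readings concrete, here is a small pure-Python sketch. It does not use Xapian at all: `in_order_within` is a hypothetical helper that models "terms in order within a window of N word positions", and the window rule it uses (last position minus first position is less than the window) is an assumption made for illustration, not Xapian's actual matcher.

```python
def in_order_within(words, terms, window):
    """Return True if `terms` occur in `words` in the given order, with the
    first and last matched terms spanning fewer than `window` positions.

    Simplified model for illustration only (assumed semantics, not Xapian code).
    """
    if not terms:
        return False
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos = start
        matched = True
        # Greedily take the earliest later occurrence of each following term;
        # for a fixed start this gives the smallest possible span.
        for term in terms[1:]:
            try:
                pos = words.index(term, pos + 1)
            except ValueError:
                matched = False
                break
        if matched and pos - start < window:
            return True
    return False


doc = "john smith and mary jones".split()

# Hyphen replaced by a space: three terms, in order, within a 12 word window.
loose = in_order_within(doc, ["john", "smith", "jones"], 12)   # True

# Hyphenated "smith-jones": the two terms must be adjacent (window of 2).
adjacent = in_order_within(doc, ["smith", "jones"], 2)         # False
```

In this model the document "john smith and mary jones" satisfies the space-separated reading but not the hyphenated one, which is why replacing the hyphen with a space can match documents the original query would not.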