Hi,

Xapian 1.2 supports a query like:

(A OR B) NEAR (C OR D)

and distributes the factors to create something like:

(A NEAR 2 C) OR (A NEAR 2 D) OR (B NEAR 2 C) OR (B NEAR 2 D)

Xapian 1.4 rejects such a query with the error message:

OP_NEAR and OP_PHRASE only currently support leaf subqueries

Because Recoll expands the terms to their stem siblings at query time, its NEAR queries are affected by the change (no stemming is used with PHRASE queries, so these are unaffected).

Of course, it would be possible to do the distribution at the application level, but before I get into this, I would like to know whether there is a plan to restore the 1.2 behaviour, or whether the new behaviour is permanent.

I saw https://trac.xapian.org/ticket/508, but it is rather inconclusive as to the future plans.

Cheers,
jf
On Thu, Dec 29, 2016 at 07:21:41PM +0100, Jean-Francois Dockes wrote:
> [...]
> I would like to know if there is a plan to restore the 1.2 behaviour, or if
> the new one is permanent ?
>
> I saw https://trac.xapian.org/ticket/508, but it is rather inconclusive as
> to the future plans.

The plan is that this should be supported (see the title of the ticket, and also note the "currently" in the exception message).

The query internals were completely rewritten between 1.2 and 1.4, which is why the old support is gone.

The old approach is excessively inefficient so personally I'm not keen to spend time recreating that - I'd rather we implement this "properly", and also make sure that it works in a non-surprising way (which blindly distributing operators doesn't always achieve, as noted in the ticket comments).

The ticket has a patch which attempts to handle the OR case (which seems to be the part you actually care about) but this suffers from issues with object lifetimes which get a bit involved in the details. Since there wasn't a working patch when we got to making the hard decisions about which tickets to bump to get 1.4.0 out, and since addressing this shouldn't require ABI changes, it got bumped.

Cheers,
Olly
Olly Betts writes:
> [...]
> The ticket has a patch which attempts to handle the OR case (which seems
> to be the part you actually care about) but this suffers from issues with
> object lifetimes which get a bit involved in the details. Since there
> wasn't a working patch when we got to making the hard decisions about
> which tickets to bump to get 1.4.0 out, and since addressing this
> shouldn't require ABI changes, it got bumped.

Thank you for this answer.
I need to choose between three approaches:

- Implement support at the application level.
- Shift back to 1.2.
- Just wait for 1.4.x.

I'd rather go back to 1.2 than use a patched 1.4, by the way.

This all depends on your expected schedule (I guess that this would have been a better term than 'plan', which is indeed described in the ticket). I am not asking for anything beyond information here. Do you have any idea of the very approximate time when the change might be implemented?

Cheers,
jf
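For reference, the application-level option discussed in this thread could look roughly like the sketch below. It only illustrates the distribution step: it takes the OR-groups of a NEAR query and produces one leaf-only NEAR clause per combination of terms. The function names distribute_near and render_query are hypothetical, the output is plain text in the notation used in this thread, and nothing here calls the Xapian API or reflects how Xapian 1.2 or Recoll actually implement it.

```python
from itertools import product


def distribute_near(groups, window):
    """Expand NEAR over OR-groups into leaf-only NEAR clauses.

    groups: list of lists of terms, e.g. [["A", "B"], ["C", "D"]]
            for (A OR B) NEAR (C OR D).
    Returns a list of (window, terms) pairs, one per term combination.
    """
    clauses = []
    seen = set()
    for combo in product(*groups):
        if combo not in seen:  # skip repeats caused by duplicate terms
            seen.add(combo)
            clauses.append((window, combo))
    return clauses


def render_query(clauses):
    """Render clauses in this thread's notation (not Xapian query syntax)."""
    parts = []
    for window, terms in clauses:
        parts.append("(" + f" NEAR {window} ".join(terms) + ")")
    return " OR ".join(parts)


if __name__ == "__main__":
    clauses = distribute_near([["A", "B"], ["C", "D"]], 2)
    print(render_query(clauses))
```

The number of clauses is the product of the group sizes, so stem expansion with many siblings per term grows quickly, and as noted in the reply above, blind distribution does not always behave in a non-surprising way for operators other than OR.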