Olly Betts writes:
> On Thu, Dec 29, 2016 at 07:21:41PM +0100, Jean-Francois Dockes wrote:
> > Xapian 1.2 supports a query like:
> >
> >     (A OR B) NEAR (C OR D)
> >
> > and distributes the factors to create something like:
> >
> >     (A NEAR 2 C) OR (A NEAR 2 D) OR (B NEAR 2 C) OR (B NEAR 2 D)
> >
> > Xapian 1.4 rejects such a query with the error message:
> >
> >     OP_NEAR and OP_PHRASE only currently support leaf subqueries
> >
> > Because Recoll expands the terms to their stem siblings at query time, its
> > NEAR queries are affected by the change (no stemming is used with PHRASE
> > queries, so these are unaffected).
> >
> > Of course, it would be possible to effect the distribution at the
> > application level, but, before I get into this, I would like to know if
> > there is a plan to restore the 1.2 behaviour, or if the new one is
> > permanent?
> >
> > I saw https://trac.xapian.org/ticket/508, but it is rather inconclusive as
> > to the future plans.
>
> The plan is that this should be supported (see the title of the ticket,
> and also note the "currently" in the exception message).
>
> The query internals were completely rewritten between 1.2 and 1.4, which
> is why the old support is gone.
>
> The old approach is excessively inefficient so personally I'm not keen to
> spend time recreating that - I'd rather we implement this "properly", and
> also make sure that it works in a non-surprising way (which blindly
> distributing operators doesn't always achieve, as noted in the ticket
> comments).
>
> The ticket has a patch which attempts to handle the OR case (which seems
> to be the part you actually care about) but this suffers from issues with
> object lifetimes which get a bit involved in the details. Since there
> wasn't a working patch when we got to making the hard decisions about
> which tickets to bump to get 1.4.0 out, and since addressing this
> shouldn't require ABI changes, it got bumped.

Thank you for this answer.

I need to choose between three approaches:

 - Implement support at the application level.
 - Shift back to 1.2
 - Just wait for 1.4.x

I'd rather go back to 1.2 than use a patched 1.4, by the way.

This all depends on your expected schedule (I guess that this would have
been a better term than 'plan', which is indeed described in the ticket). I
am not asking for anything beyond information here. Do you have any idea of
the very approximate time when the change might be implemented?

Cheers,

jf
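As an illustration of the application-level option listed above, the
distribution of NEAR over OR can be sketched without touching Xapian at all.
The Python below is a minimal sketch, not Recoll or Xapian code:
`distribute_near` and `render` are hypothetical helper names, and the
`NEAR 2` text is just the notation used in this thread. It also shows why the
approach scales poorly: the number of clauses is the product of the sizes of
the OR groups.

```python
from itertools import product


def distribute_near(groups, window):
    """Expand (a1 OR a2 ...) NEAR (b1 OR b2 ...) into leaf-only NEAR clauses.

    groups: one list of alternative terms per NEAR operand.
    Returns a list of (terms, window) pairs that are meant to be OR-ed.
    """
    # One clause per way of picking a single term from every group.
    return [(combo, window) for combo in product(*groups)]


def render(clauses):
    """Render the clauses in the informal notation used in this thread."""
    return " OR ".join(
        "(" + f" NEAR {window} ".join(terms) + ")" for terms, window in clauses
    )


if __name__ == "__main__":
    clauses = distribute_near([["A", "B"], ["C", "D"]], 2)
    print(render(clauses))
```

With stem expansion producing, say, 10 siblings per term, a two-term NEAR
would already become 100 leaf-only NEAR clauses under this scheme.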
On Wed, Jan 04, 2017 at 07:29:58AM +0100, Jean-Francois Dockes wrote:
> Olly Betts writes:
> > The ticket has a patch which attempts to handle the OR case (which seems
> > to be the part you actually care about) but this suffers from issues with
> > object lifetimes which get a bit involved in the details. Since there
> > wasn't a working patch when we got to making the hard decisions about
> > which tickets to bump to get 1.4.0 out, and since addressing this
> > shouldn't require ABI changes, it got bumped.
>
> Thank you for this answer.
>
> I need to choose between three approaches:
>
> - Implement support at the application level.
> - Shift back to 1.2
> - Just wait for 1.4.x

Or help fix up the patch in the ticket?

> I'd rather go back to 1.2 than use a patched 1.4 by the way.

Once we have a working patch, it should be mergeable into 1.4.x (I can't
see why any ABI changes would be needed) so using a patched 1.4 shouldn't
be an issue.

> This all depends on your expected schedule (I guess that this would have
> been a better term than 'plan', which is indeed described in the ticket). I
> am not asking for anything beyond information here. Do you have any idea of
> the very approximate time when the change might be implemented?

I had another poke at the patch and have a reworked version which solves the
object lifetime issue and works for some simple tests. Can you try it out
and see if it works for you?

https://trac.xapian.org/ticket/508#comment:13

There are two limitations:

* Only OP_OR subqueries are handled. I think supporting these would be a
  useful step forward by itself, and AIUI it's all you actually need.

* Currently the OP_OR subqueries can only have two subqueries of their own.
  Lifting this restriction needs a bit of work on the new OrPositionList
  class - the old patch used a series of pairwise OrPositionList objects,
  but the new patch needs a single one instead - the class needs reworking
  to handle that.

So I think the second limitation needs addressing, and of course any bugs
resolving.

I can't promise anything re schedule, but hopefully we can sort this out
fairly soon. At least the solution for what's missing now is fairly clear -
we probably want to put the sub-positionlists into a min heap.

Cheers,
    Olly
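The min-heap idea mentioned above can be illustrated in isolation. The Python
below is a minimal sketch and not the OrPositionList class from the patch:
`or_positions` is a hypothetical name, and it assumes each sub-position list
is already sorted in ascending order. It merges any number of such lists into
one ascending stream of positions without duplicates, which is roughly what a
single N-way OR position list would need to provide.

```python
import heapq


def or_positions(position_lists):
    """Yield the ascending union of several ascending position lists."""
    heap = []
    for idx, plist in enumerate(position_lists):
        it = iter(plist)
        first = next(it, None)
        if first is not None:
            # idx breaks ties so the iterators themselves are never compared.
            heap.append((first, idx, it))
    heapq.heapify(heap)

    last = None
    while heap:
        pos, idx, it = heap[0]  # smallest current position over all sublists
        if pos != last:
            yield pos           # emit each distinct position only once
            last = pos
        nxt = next(it, None)
        if nxt is None:
            heapq.heappop(heap)                      # sublist exhausted
        else:
            heapq.heapreplace(heap, (nxt, idx, it))  # advance that sublist


if __name__ == "__main__":
    print(list(or_positions([[1, 5, 9], [2, 5], [3, 9, 12]])))
```

Each step costs O(log N) for N sublists, instead of chaining N-1 pairwise
merges one on top of another.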
Olly Betts writes:
> On Wed, Jan 04, 2017 at 07:29:58AM +0100, Jean-Francois Dockes wrote:
> > Olly Betts writes:
> > > The ticket has a patch which attempts to handle the OR case (which seems
> > > to be the part you actually care about) but this suffers from issues with
> > > object lifetimes which get a bit involved in the details. Since there
> > > wasn't a working patch when we got to making the hard decisions about
> > > which tickets to bump to get 1.4.0 out, and since addressing this
> > > shouldn't require ABI changes, it got bumped.
> >
> > Thank you for this answer.
> >
> > I need to choose between three approaches:
> >
> > - Implement support at the application level.
> > - Shift back to 1.2
> > - Just wait for 1.4.x
>
> Or help fix up the patch in the ticket?

Yep. But I earnestly believe that I'm not up to the task of fiddling with
Xapian internals. You may remember that I gave it a try quite a long time
ago (it was the very same issue, actually), and that, if I remember
correctly, my change did not quite do what it was supposed to do...

> > I'd rather go back to 1.2 than use a patched 1.4 by the way.
>
> Once we have a working patch, it should be mergeable into 1.4.x (I can't
> see why any ABI changes would be needed) so using a patched 1.4
> shouldn't be an issue.

My sentence was unclear. To explain: I could use a patched 1.4 on Windows,
where libxapian is bundled with recoll, but I was thinking ahead to a
situation where I'd have a 1.2/1.4 choice on Linux, where bundling a patched
1.4 would not be acceptable. In the latter case, I'd rather use 1.2 because
of the NEAR issue.

> > This all depends on your expected schedule (I guess that this would have
> > been a better term than 'plan', which is indeed described in the ticket). I
> > am not asking for anything beyond information here. Do you have any idea of
> > the very approximate time when the change might be implemented?
>
> I had another poke at the patch and have a reworked version which solves the
> object lifetime issue and works for some simple tests. Can you try it out
> and see if it works for you?
>
> https://trac.xapian.org/ticket/508#comment:13
>
> There are two limitations:
>
> * Only OP_OR subqueries are handled. I think supporting these would be a
>   useful step forward by itself, and AIUI it's all you actually need.

Yes, my need arises from stem or synonym expansions occurring inside a NEAR
query. This happens without the user doing anything special, so it's a
problem when it causes an error.

Recoll also supports multi-word synonyms, which could potentially generate
PHRASE subqueries inside NEAR queries, but this understandably already did
not work with 1.2, so the multi-word expansions are only used when proximity
is not involved (by the way, proximity of phrases does make sense in this
case, if there is a wishlist somewhere, but it's admittedly not an issue
that most users will be concerned with...).

> * Currently the OP_OR subqueries can only have two subqueries of their own.
>   Lifting this restriction needs a bit of work on the new OrPositionList
>   class - the old patch used a series of pairwise OrPositionList objects,
>   but the new patch needs a single one instead - the class needs reworking
>   to handle that.
>
> So I think the second limitation needs addressing, and of course any bugs
> resolving.

I am not sure that I completely understand this paragraph, but, anyway,
although I have a bit of trouble reading my own code, I think that recoll
will only add flat OP_OR queries as subqueries of the NEAR one.

I tested the patch and it does seem to answer my selfish needs...

> I can't promise anything re schedule, but hopefully we can sort this out
> fairly soon. At least the solution for what's missing now is fairly clear -
> we probably want to put the sub-positionlists into a min heap.

See, you lost me with the last sentence, and that's why it's better that I
don't get into Xapian-core internals :)

Anyway, it's good enough to know that a patch exists which will hopefully
make its way into 1.4.x, meaning that I have no need to work on a bad
application-level solution.

Thanks!

Cheers,

jf