Hi Everybody, Here is the first version of my match for an ExternalPostList, it should apply cleanly to 0.9.5 and 0.9.6. You can use it by first implementing an ExternalPostingSource, then creating a new Query object passing a reference an instance of your implementation to the constructor, see query.h. The ExternalPostingSource implementation is reference counted, so when its no longer needed it can clean itself up. It works well for filtering results but it has a problem that once it filters all of the results are returned with 100% relevancy. I'm not quite sure how to fix that but I think its because I just used the default weight implementations in matcher/externalsourcepostlist.h. Any hints or patches to fix that so it works correctly would be greatly appreciated. Also the use of an ExternalPostList query breaks the network method of transmitting queries, I haven't written code to serialize the post list or even decide if that's worth doing. I realize there has been some stylistic license taken with the formatting of this patch. I didn't pay much attention to following existing indentation and bracket rules and what not. If you're up for including this into the release branch, I'd be glad to take the time and clean it up even further. Thanks for all of your help, Rusty -- Rusty Conover InfoGears Inc. Web: http://www.infogears.com ? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20060602/ad66cf25/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: external-post-list-1.diff Type: application/octet-stream Size: 11870 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20060602/ad66cf25/attachment.obj> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20060602/ad66cf25/attachment-0001.html>
On Fri, Jun 02, 2006 at 06:24:12PM -0600, Rusty Conover wrote:> You can use it by first implementing an ExternalPostingSource, then > creating a new Query object passing a reference an instance of your > implementation to the constructor, see query.h. The > ExternalPostingSource implementation is reference counted, so when > its no longer needed it can clean itself up.If you're going to reference count it, it would be better to use the existing reference counting machinery (include/xapian/base.h) unless there's a good reason not to.> It works well for filtering results but it has a problem that once it > filters all of the results are returned with 100% relevancy. I'm not > quite sure how to fix that but I think its because I just used the > default weight implementations in matcher/externalsourcepostlist.h. > Any hints or patches to fix that so it works correctly would be > greatly appreciated.I think the weights look OK, but get_termfreq_*() shouldn't return 0 but rather bounds/estimates of the number of entries that the ExternalPostingSource would return if you called next() until at_end() returned true. I suspect returning 0 might confuse the matcher.> Also the use of an ExternalPostList query breaks the network method > of transmitting queries, I haven't written code to serialize the post > list or even decide if that's worth doing.I think this is probably a concept which doesn't fit well with the remote backend. You could perhaps use a registering scheme similar to that provided for user-defined weighting schemes, but a weighting scheme is just an algorithm plus some parameters, whereas this class will typically be talking to some other system like an SQL database.> I realize there has been some stylistic license taken with the > formatting of this patch. I didn't pay much attention to following > existing indentation and bracket rules and what not. If you're up > for including this into the release branch, I'd be glad to take the > time and clean it up even further.I think it's probably suitable for inclusion with a bit more work. Similar ideas have been discussed before, and I can see scope for lots of neat things this enables. I've had a quick read through - here are some comments: * I don't think you need to call start_construction(), etc and have the set_external_source method - just have a Query::Internal::Internal(ExternalPostingSource *) ctor for the internal class. * I think OP_EXTERNAL_POST_LIST should be internal like OP_LEAF. * There's no point trying to encode OP_EXTERNAL_POST_LIST in the query serialisation if this isn't supported by the remote backend (especially as I've just rewritten query serialisation but not checked it in yet!) Better to just throw UnimplementedError if we get there. * ExternalPostingSource should derive from RefCntBase and then you can use the same reference counting mechanism the other API classes do. * Don't renumber OP_ELITE_SET - the java bindings currently have these values hard-wired (ick) so they'll break! That's why it's explicitly numbered currently (the enum with value 9 is no longer used). * In this case, skip_to and next should always return NULL. You only need to return something if you want to replace the current postlist with a different one (e.g. an OR turning into an AND because of optimisations based on the minimum weight required) and we don't want to do that here. Phew! That might seem like a lot, but they're all pretty minor really. Generally this is looking pretty plausible. Cheers, Olly -------------- next part -------------- A non-text attachment was scrubbed... Name: external-post-list-1.diff Type: text/patch Size: 11870 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20060603/09b703f4/attachment.bin>
Pardon my ignorance, but can you explain a little of what one can do with your addition here? I mean what kind of functionality does it add to Xapian? I have read your posts about it, but don't feel like I've wrapped my head around it entirely :-) Alec Rusty Conover wrote:> Hi Everybody, > > Here is the first version of my match for an ExternalPostList, it > should apply cleanly to 0.9.5 and 0.9.6. > > You can use it by first implementing an ExternalPostingSource, then > creating a new Query object passing a reference an instance of > your implementation to the constructor, see query.h. The > ExternalPostingSource implementation is reference counted, so when its > no longer needed it can clean itself up. > > It works well for filtering results but it has a problem that once it > filters all of the results are returned with 100% relevancy. I'm not > quite sure how to fix that but I think its because I just used the > default weight implementations in matcher/externalsourcepostlist.h. > Any hints or patches to fix that so it works correctly would be > greatly appreciated. > > Also the use of an ExternalPostList query breaks the network method of > transmitting queries, I haven't written code to serialize the post > list or even decide if that's worth doing. > > I realize there has been some stylistic license taken with the > formatting of this patch. I didn't pay much attention to following > existing indentation and bracket rules and what not. If you're up for > including this into the release branch, I'd be glad to take the time > and clean it up even further. > > Thanks for all of your help, > > Rusty > -- > Rusty Conover > InfoGears Inc. > Web: http://www.infogears.com > > > ------------------------------------------------------------------------ > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Xapian-devel mailing list > Xapian-devel at lists.xapian.org > http://lists.xapian.org/mailman/listinfo/xapian-devel >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20060602/6965dfb2/attachment-0001.html>