jarrod roberson
2006-Jun-04 21:02 UTC
[Xapian-discuss] adding postings vs matchdecider on a value
I have my basic queries working now thanks to suggestions from the list. Now I am trying to craft a "Depth 1" style query and have come up with 2 possible solutions. In WebDAV (RFC 2518) a "Depth 1" request is a listing of all the resources that are CHILDREN of a COLLECTION, similar to what a normal ls does in Unix.

The 2 ideas I have come up with are:

1. Using positional posting terms to be able to refine the query.

Since I am already adding positional postings for the complete logical path, I was thinking about adding all the path parts with different prefixes and doing the same thing I am doing with the logical path.

2. Using a MatchDecider and another value().

I already implemented this, but I don't think this is the optimal way. Since the basic query is like a Depth Infinity (which means it brings back EVERYTHING), I am filtering on a value() that contains the entire parent path (there is no limit on the path sizes), so this could be really bad in degenerate cases.

I think #1 is probably the best way. Since all the terms will be used by lots of documents, it shouldn't cause too much data bloat, and it should cut down on the btree accesses as well.

What I can't find in any examples or documentation is what happens when you have multiple terms with the same posting position. I have tried it and Xapian lets you do it, but I can't find out what side effects, if any, there are from doing it.

I am trying to think of some way I can just check for the existence of a positional term, to exclude everything but the direct children. That would not require any additional values or terms!

Any ideas are appreciated.
Olly Betts
2006-Jun-05 01:52 UTC
[Xapian-discuss] adding postings vs matchdecider on a value
On Sun, Jun 04, 2006 at 04:02:52PM -0400, jarrod roberson wrote:
> The 2 ideas I have come up with are.
>
> 1. Using positional posting terms to be able to refine the query.
>
> Since I am already adding positional postings for the complete logical path,
> I was thinking about adding all the path parts with different prefixes and
> doing the same thing I am with the logical path.
>
> 2. Using a MatchDecider and another value().

I don't really follow what you're trying to do, but at least at present, filtering on terms is likely to be faster than a MatchDecider unless the MatchDecider is doing something that's tricky to implement using filter terms. If you're using positional information in the filtering, it'll probably be slower than purely term-based filtering.

If you've already implemented it using a MatchDecider, you could try the other approach and see how they compare for speed and database size.

I'm planning to change how values are stored, and also to implement an idea to reduce the number of documents a MatchDecider will typically need to consider, which should make using a MatchDecider for this sort of thing more competitive. It's common to want to be able to sort and set range limits on the same quantities (e.g. sort by date, return documents in a date range), so at least in these cases it's wasteful to create filtering terms since you have to store the value anyway.

> What I can't find in any examples or documentation, is what happens when you
> have multiple terms with the same posting position? I have tried it, xapian
> lets you do it, but I can't find out what, if any side effects there are from
> doing it.

It should work - for example, you might want to store stemmed and unstemmed forms of a word at the same position.

Cheers,
Olly
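
To illustrate the purely term-based filtering suggested in the reply, here is a minimal sketch in plain Python of how one filter term per document could encode its parent collection, so that a Depth 1 listing becomes a lookup on a single term with no positional data. Nothing here comes from the thread itself: the `XPARENT` prefix, the `MAX_TERM_BYTES` cap, and the fallback of hashing very long paths are all assumptions. The hashing is there only because the original post says paths are unbounded, and Xapian backends limit how long a term can be. The dictionary at the end merely stands in for an index. With Xapian the term would presumably be added with `Document::add_term()` and combined with the main query using `Query::OP_FILTER`.

```python
import hashlib
import posixpath

PARENT_PREFIX = "XPARENT"  # hypothetical term prefix, not from the thread
MAX_TERM_BYTES = 200       # assumed conservative cap on term length


def collection_term(collection):
    """Return the filter term that identifies a collection path."""
    normalized = collection.rstrip("/") or "/"
    term = PARENT_PREFIX + normalized
    if len(term.encode("utf-8")) > MAX_TERM_BYTES:
        # Assumed fallback: replace an over-long path with a fixed-size digest.
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        term = PARENT_PREFIX + "#" + digest
    return term


def parent_term(path):
    """Return the filter term to index for a resource, or None for the root."""
    normalized = path.rstrip("/")
    if not normalized:
        return None  # the root resource has no parent collection
    return collection_term(posixpath.dirname(normalized))


# A dictionary standing in for the index: filter term -> resource paths.
paths = ["/docs/a.txt", "/docs/sub/", "/docs/sub/b.txt", "/top"]
index = {}
for p in paths:
    index.setdefault(parent_term(p), []).append(p)

# Depth 1 listing of /docs: only its direct children match the term.
children = index[collection_term("/docs")]
```

Because every resource in a collection shares the same parent term, each document carries only one extra term, and long posting lists for such terms tend to compress well. Comparing speed and database size against the MatchDecider version, as the reply proposes, would still be the way to decide between the two.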