Hurricane Tong
2014-Mar-17 06:53 UTC
[Xapian-devel] questions about move_to_chunk_containing
Hi,

Thanks for commenting on my proposal. While trying to update it, I got confused by move_to_chunk_containing.

In void BrassPostList::move_to_chunk_containing(Xapian::docid desired_did) there is this call:

(void)cursor->find_entry(BrassPostListTable::make_key(term, desired_did))

Does this call make the cursor point to the chunk whose first docid is less than desired_did, where the first docid of the next chunk is bigger than desired_did?

If did1 and did2 are in the same chunk, make_key returns a different key for each. But how can find_entry land on the same chunk with different keys?

------------------
Shangtong Zhang, Second Year Undergraduate, School of Computer Science, Fudan University, China.
On Mon, Mar 17, 2014 at 02:53:47PM +0800, Hurricane Tong wrote:
> (void)cursor->find_entry(BrassPostListTable::make_key(term, desired_did))
> Does this function make cursor point to the chunk where the first id
> in the chunk is less than desired_did and the first id in next chunk
> is bigger than desired_did ?

It is documented in the header here:

http://trac.xapian.org/browser/trunk/xapian-core/backends/brass/brass_cursor.h#L283

So if there's a chunk with exactly the desired key (which is a chunk for term starting with desired_did), the cursor will point to that. Otherwise the cursor points to the last chunk with a key < the desired key.

You might notice that this isn't actually quite ideal - if we have a chunk with docids 10-1000 and one with docids 2000-3000, then looking for 1500 will land us on the first chunk, whereas the next chunk we're interested in is actually the 2000-3000 one.

I can't see how to avoid this without an incompatible change to the format, and in practice, the loss of efficiency from this is probably not dramatic (the majority of the time the chunk before will be in the same block as the chunk we actually want, and seeking to a gap doesn't happen every time). But it's something I've had in mind to look at one day. I think you'd probably have to make the key use the *last* docid in the chunk instead of the first, which is a bit awkward as that changes when we append to a chunk.

> If did1 and did2 is in the same chunk, make_key returns different key.
> But how can find_entry turn to same chunk with different key ?

Since did1 and did2 are in the same chunk, there can't be a chunk which starts with any docid between did1 and did2, so the cursor must end up on the same chunk when you search for the keys built for the same term plus did1 or did2.

Cheers,
Olly
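
The lookup rule described in the reply above (exact key if present, otherwise the last key less than the desired one) can be modelled with a small sketch. This is not Xapian's B-tree code: it assumes a single term, stands in a std::map keyed by each chunk's first docid for the posting list table, and the names ChunkTable and find_chunk are hypothetical.

```cpp
#include <iterator>
#include <map>
#include <string>

// Hypothetical stand-in for the posting list table of ONE term: each chunk
// is keyed by the first docid it holds. The real table uses B-tree keys
// built by make_key(term, did); this only models the ordering.
using ChunkTable = std::map<unsigned, std::string>;

// Model of the behaviour described in the thread: land on the chunk whose
// key equals desired_did if there is one, otherwise on the last chunk with
// a smaller key. Returns table.end() if every key is greater.
ChunkTable::const_iterator
find_chunk(const ChunkTable& table, unsigned desired_did)
{
    // upper_bound gives the first key strictly greater than desired_did,
    // so the entry just before it has the largest key <= desired_did.
    auto it = table.upper_bound(desired_did);
    if (it == table.begin()) return table.end();
    return std::prev(it);
}

// The two chunks used as the example in the thread.
ChunkTable example_table()
{
    return ChunkTable{{10, "docids 10-1000"}, {2000, "docids 2000-3000"}};
}
```

With the example table, looking up 50 and 900 both reach the chunk keyed 10, because no chunk key lies between them. Looking up 1500 also reaches the chunk keyed 10, which is the "gap" case where the chunk of interest is really the next one.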