Just curious: How does Xapian clean up postings/words from deleted documents? Does it just remove them whenever a posting node is COWed in the Btree? Or is there some kind of periodic reaper function?

Thanks!

Matt
The termlists in Xapian contain a list of all the terms in a given document, so when a document is deleted these are used to update the postlists for all the relevant terms to remove the document id. The posting list lengths, etc, are all updated immediately (well, at the next commit). This contrasts with Lucene-style systems where the documents are just marked as deleted and garbage collected later.

--
Richard

On 1 March 2014 19:31, Matt Chaput <matt at whoosh.ca> wrote:
> Just curious: How does Xapian clean up postings/words from deleted
> documents? Does it just remove them whenever a posting node is COWed in the
> Btree? Or is there some kind of periodic reaper function?
>
> Thanks!
>
> Matt
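To illustrate the eager update described in this reply, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class `ToyIndex` and its methods are hypothetical names made up for illustration, plain dicts and sets stand in for the on-disk Btree tables, and changes are applied immediately rather than at the next commit. It is not Xapian's actual code or API.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (hypothetical; NOT Xapian's real data structures)."""

    def __init__(self):
        self.termlists = {}                # docid -> set of terms in that document
        self.postlists = defaultdict(set)  # term -> set of docids containing it

    def add_document(self, docid, terms):
        self.termlists[docid] = set(terms)
        for term in self.termlists[docid]:
            self.postlists[term].add(docid)

    def delete_document(self, docid):
        # The termlist says exactly which postlists mention this docid,
        # so only those postlists need to be touched.
        for term in self.termlists.pop(docid):
            postings = self.postlists[term]
            postings.discard(docid)
            if not postings:
                # No document uses this term any more: drop the postlist.
                del self.postlists[term]

    def termfreq(self, term):
        # Number of documents currently indexed by this term.
        return len(self.postlists.get(term, ()))


index = ToyIndex()
index.add_document(1, ["xapian", "btree"])
index.add_document(2, ["xapian", "lucene"])
index.delete_document(1)
print(index.termfreq("xapian"), index.termfreq("btree"))
```

After the delete, the statistics for "xapian" and "btree" already reflect the removal, with no separate cleanup pass.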
On Sat, Mar 01, 2014 at 08:33:49PM +0000, Richard Boulton wrote:
> The termlists in Xapian contain a list of all the terms in a given
> document, so when a document is deleted these are used to update the
> postlists for all the relevant terms to remove the document id. The posting
> list lengths, etc, are all updated immediately (well, at the next commit).
> This contrasts with Lucene-style systems where the documents are just
> marked as deleted and garbage collected later.

Perhaps also worth noting that at some point we'll probably implement the ability to choose either approach:

http://trac.xapian.org/ticket/368

Cheers,
Olly
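For contrast, the "mark as deleted, garbage collect later" approach mentioned in the quoted text might look roughly like the following Python sketch. All names here (`LazyIndex`, `deleted`, `compact`) are hypothetical and chosen for illustration; this is not how Lucene or Xapian implement it, just the general shape of the technique.

```python
class LazyIndex:
    """Toy index with tombstone-style deletion (hypothetical illustration)."""

    def __init__(self):
        self.postlists = {}   # term -> list of docids
        self.deleted = set()  # tombstones: docids marked as deleted

    def add_document(self, docid, terms):
        for term in set(terms):
            self.postlists.setdefault(term, []).append(docid)

    def delete_document(self, docid):
        # Cheap: just record a tombstone; the postlists are left untouched.
        self.deleted.add(docid)

    def postings(self, term):
        # Readers must filter out tombstoned documents on every lookup.
        return [d for d in self.postlists.get(term, []) if d not in self.deleted]

    def compact(self):
        # The deferred cleanup pass: physically remove dead postings.
        for term in list(self.postlists):
            live = [d for d in self.postlists[term] if d not in self.deleted]
            if live:
                self.postlists[term] = live
            else:
                del self.postlists[term]
        self.deleted.clear()


lazy = LazyIndex()
lazy.add_document(1, ["xapian", "btree"])
lazy.add_document(2, ["xapian", "lucene"])
lazy.delete_document(1)
stale_before_compact = list(lazy.postlists["btree"])  # dead posting still stored
visible = lazy.postings("xapian")                     # filtered at read time
lazy.compact()
```

The trade-off in this sketch is that deletion is cheap, but stored list lengths are stale until `compact()` runs, whereas the eager approach pays the cost at delete time and keeps the statistics exact.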