Is there any difference between using document.add_posting (adding terms one by one) and TermGenerator().index_text for creating the Xapian database? Does TermGenerator also take into account the positions of the terms it adds? Which method is faster, or are they both the same? Thanks a lot!

    indexer = xapian.TermGenerator()
    indexer.set_document(doc)
    indexer.index_text(text)

VS

    WORD_RE = re.compile(r"\w{1,32}", re.U)
    for index, term in enumerate(WORD_RE.finditer(text)):
        doc.add_posting(stemmer.stem_word(term.group()), index)
James Aylett
2008-Aug-24 13:26 UTC
[Xapian-discuss] doc.add_posting vs TermGenerator().index_text
On Sat, Aug 16, 2008 at 08:10:54AM -0700, mark wrote:
> is there any difference between using document.add_posting (in which
> term is added one by one)
> and TermGenerator().index_text for creating the xapian database?

TermGenerator will generate a complete set of Omega-compatible terms (both unstemmed and stemmed words, using our standard word-splitting algorithm). Adding things one by one may be compatible if your manual term generation works in the same way.

TermGenerator should be at least as fast as doing it manually.

J

-- 
James Aylett   xapian.org
james at tartarus.org   uncertaintydivision.org
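The reply doesn't spell out how positions are handled, so here is a rough, pure-Python model of the difference. As far as I know, `index_text` records positions for the lowercased, unstemmed terms (there is a separate `index_text_without_positions` method if you don't want them), while the `Z`-prefixed stemmed terms it adds under Omega conventions carry only a within-document frequency. Stemmed terms also only appear if a stemmer has been set with `set_stemmer()`, which the snippet in the question doesn't do. The sketch uses plain dicts in place of `xapian.Document`, a simplified regex in place of Xapian's word-splitting rules, and a hypothetical `toy_stem` function in place of `xapian.Stem`; it illustrates the term layout and is not Xapian's actual algorithm.

```python
import re
from collections import defaultdict

# Simplified word splitter: runs of up to 32 word characters.
# Xapian's real TermGenerator uses its own Unicode-aware rules,
# so this is only an approximation for illustration.
WORD_RE = re.compile(r"\w{1,32}", re.U)


def toy_stem(word):
    """Hypothetical stand-in for xapian.Stem: strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def manual_postings(text, stem=toy_stem):
    """Mirror of the loop in the question: stemmed terms only,
    case preserved, positions counted from 0 by enumerate()."""
    postings = defaultdict(list)
    for index, match in enumerate(WORD_RE.finditer(text)):
        postings[stem(match.group())].append(index)
    return dict(postings)


def omega_style_postings(text, stem=toy_stem):
    """Toy model of an Omega-style term list (assumed layout):
    - lowercased unstemmed terms, each with its positions (from 1)
    - 'Z' + stem terms with a count (wdf) but no positions.
    Unlike the real TermGenerator, this stems every word."""
    positional = defaultdict(list)
    stemmed_wdf = defaultdict(int)
    for pos, match in enumerate(WORD_RE.finditer(text), start=1):
        word = match.group().lower()
        positional[word].append(pos)
        stemmed_wdf["Z" + stem(word)] += 1
    return dict(positional), dict(stemmed_wdf)


if __name__ == "__main__":
    sample = "Cats chase cats"
    print(manual_postings(sample))
    # {'Cat': [0], 'chase': [1], 'cat': [2]}
    print(omega_style_postings(sample))
    # ({'cats': [1, 3], 'chase': [2]}, {'Zcat': 2, 'Zchase': 1})
```

On the sample text the hand-rolled loop splits "Cats" and "cats" into two different terms and keeps no unstemmed forms at all, so phrase searches and Omega-style queries would likely not line up with a database built that way.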