Kevin Duraj
2007-Apr-12 19:44 UTC
[Xapian-discuss] How to ignore many occurrence of the same term in one document for relevance computation?
Hello there,

I would like to ask how I can make Xapian ignore, in its relevance computation, how many times the same term occurs in a document. To put it differently, I would like Xapian's relevance computation not to depend on how many times a term appears in the document.

Search example, term "Kevin":

1. Document_A contains: Kevin Kevin Kevin Kevin (Relevance 100%)
2. Document_B contains: Kevin (Relevance 100%)

I want both documents to have the same relevance. Is that possible?

-Kevin
James Aylett
2007-Apr-12 21:15 UTC
On Thu, Apr 12, 2007 at 11:44:35AM -0700, Kevin Duraj wrote:
> I would like to ask how can I make Xapian to ignore relevance computation
> for documents that has many time occurrence of the same term. Or differently
> to say I would like to have Xapian ignore relevance computation based on how
> many times terms is in document.

You can do this by fiddling with the Weight mechanism. The key here is to drop the wdf (within-document frequency) of each term from the weight calculation. I think you can just set k1_ to 0 in BM25Weight, but I've never tried it. Use Xapian::Enquire::set_weighting_scheme() to replace the default weighting scheme (you can construct a BM25Weight object with the relevant parameters and pass it to this method).

J

-- 
 James Aylett                                   xapian.org
 james@tartarus.org                             uncertaintydivision.org
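To see why k1 = 0 should do what Kevin asks, here is a minimal sketch of the textbook BM25 term weight written as a standalone Python function, so it can be checked without a Xapian install. It is not Xapian's actual implementation: Xapian's BM25Weight has extra parameters (k2, k3, min_normlen) that are left out here, and the function name `bm25_term_weight` and the collection numbers are hypothetical. In this formula the wdf enters only through (k1 + 1) * wdf / (K + wdf), and with k1 = 0 the constant K is 0 too, so that factor is exactly 1 for any wdf > 0 and document-length normalisation (b) drops out as well. In Xapian itself, assuming the Python bindings mirror the C++ constructor BM25Weight(k1, k2, k3, b, min_normlen), the corresponding call would presumably be something like `enquire.set_weighting_scheme(xapian.BM25Weight(0, 0, 1, 0.5, 0.5))`; as James says, this is untested.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_doc_len, n_docs, term_freq, k1=1.0, b=0.5):
    """Textbook BM25 weight of one query term in one document.

    Hypothetical helper for illustration only: no relevance feedback,
    no query-side (k3) factor, and none of Xapian's extra terms.
    """
    if wdf == 0:
        # The term does not occur in this document, so it contributes nothing.
        return 0.0
    # IDF part: depends only on collection statistics, never on wdf.
    idf = math.log((n_docs - term_freq + 0.5) / (term_freq + 0.5))
    # Length-normalised saturation constant; it is 0 when k1 == 0.
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # wdf part: with k1 == 0 this is wdf / wdf, i.e. exactly 1.0 for wdf > 0.
    tf_part = (k1 + 1) * wdf / (K + wdf)
    return idf * tf_part


# Made-up collection: 1000 documents, 10 contain "Kevin", average length 2.5.
N, n, avg_len = 1000, 10, 2.5

# Document_A: "Kevin Kevin Kevin Kevin" (wdf 4, length 4)
# Document_B: "Kevin"                   (wdf 1, length 1)
default_a = bm25_term_weight(4, 4, avg_len, N, n)
default_b = bm25_term_weight(1, 1, avg_len, N, n)
flat_a = bm25_term_weight(4, 4, avg_len, N, n, k1=0.0)
flat_b = bm25_term_weight(1, 1, avg_len, N, n, k1=0.0)

print(default_a > default_b, flat_a == flat_b)  # True True
```

With the default k1 the repeated term lifts Document_A above Document_B; with k1 = 0 the two get identical weights, which is the behaviour asked for in the original question.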