(1) From the documentation I know that Xapian uses BM25 to estimate the weights of query terms and documents. But how does it ensure that the final weight for a record scales from 0 to 1? It seems to me that Xapian::BM25Weight::get_sumpart could return a value larger than 1. Did I misunderstand anything?

(2) Also, for a quick search with "class OR list", I expected the following three records to get the same weight, "100%". But I was wrong. What factors can influence this?

Thanks,
Sabrina

Xapian: Full source documentation: xapian-core: Class List
Main Page | Namespace List | Class Hierarchy | Alphabetical List | Class List | File List | Namespace Members | Class Members | File Members | Related Pages xapian-core Class List Here are the classes, structs, unions and interfaces with brief descriptions: ...
/docs/sourcedoc/html/annotated.html
100% relevant, matching: class and list

Xapian: API documentation: xapian-core: Xapian Namespace Reference
Main Page | Namespace List | Class Hierarchy | Alphabetical List | Class List | File List | Namespace Members | Class Members | File Members The Xapian library lives in the Xapian namespace. More... Classes class Xapian::Database This class is used to...
/docs/apidoc/html/namespaceXapian.html
99% relevant, matching: class and list

Xapian: Full source documentation: xapian-core: Xapian Namespace Reference
Main Page | Namespace List | Class Hierarchy | Alphabetical List | Class List | File List | Namespace Members | Class Members | File Members | Related Pages The Xapian library lives in the Xapian namespace. More... Classes class Xapian::ByQueryIndexCmp ...
/docs/sourcedoc/html/namespaceXapian.html
99% relevant, matching: class and list
On Tue, May 24, 2005 at 03:34:11AM +0000, Sabrina Shen wrote:
> (1) From documentation I know Xapian employs BM25 to estimate weights
> of query terms and documents. But how does it ensure that the final
> weight for a record scales from 0 to 1? It seems to me that
> Xapian::BM25Weight::get_sumpart could become larger than 1? Did I
> misunderstand anything?

The BM25 weight *can* be larger than 1. However, that doesn't mean we can't produce a percentage score between 0 and 100. If the highest-ranking document matches all the terms in the query, we simply divide every document's weight by the top document's weight and multiply by 100% to give the percentage score. If the highest-ranking document doesn't match all the terms, we multiply by less than 100%; that factor is determined by looking at which terms match.

> (2) Also for a quick search with "class OR list", I thought I would get the
> following three records the same weight "100%". But I was wrong. What
> can be the factors influencing this?

Those factors are the within-document frequencies (wdfs) of the two terms and the document lengths. And it seems to be working here: the top-matching document is the class list for the full source documentation.

Cheers,
    Olly
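To illustrate why a per-term BM25 contribution can exceed 1, and why documents matching the same terms still get different weights, here is a minimal sketch of the textbook Okapi BM25 formula. It is not Xapian's BM25Weight code: the parameter values k1=1.2 and b=0.75 are common textbook defaults (assumed, not necessarily Xapian's), and the collection statistics in the example are invented.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_doc_len, n_docs, term_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one term to one document.

    Illustrative sketch only: k1 and b are assumed textbook defaults,
    not necessarily the parameters Xapian's BM25Weight uses.
    """
    if wdf == 0:
        return 0.0
    # Robertson/Sparck Jones style idf: large when the term is rare.
    idf = math.log((n_docs - term_freq + 0.5) / (term_freq + 0.5))
    # Length normalisation: documents longer than average are penalised.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # Saturating wdf component, bounded above by (k1 + 1), times idf.
    return idf * (k1 + 1) * wdf / (norm + wdf)


def bm25_doc_weight(query_terms, doc_wdfs, doc_len, avg_doc_len, n_docs, term_freqs):
    # For an OR query the document weight is the sum of per-term contributions.
    return sum(
        bm25_term_weight(doc_wdfs.get(t, 0), doc_len, avg_doc_len, n_docs, term_freqs[t])
        for t in query_terms
    )


if __name__ == "__main__":
    # Hypothetical collection statistics, invented for illustration.
    n_docs, avg_len = 1000, 500
    tf = {"class": 10, "list": 10}
    q = ["class", "list"]
    # A single rare term with wdf 5 already contributes well over 1.
    print(bm25_term_weight(5, 500, avg_len, n_docs, 10))
    # Same matching terms, but different wdfs or document lengths.
    print(bm25_doc_weight(q, {"class": 5, "list": 5}, 500, avg_len, n_docs, tf))
    print(bm25_doc_weight(q, {"class": 2, "list": 1}, 500, avg_len, n_docs, tf))
    print(bm25_doc_weight(q, {"class": 5, "list": 5}, 2000, avg_len, n_docs, tf))
```

All three documents match both "class" and "list", yet lower wdfs or a longer document yield a lower weight, which is why matching every query term does not by itself guarantee a 100% score.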
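The percentage scaling described in the reply can be sketched as follows. This is a simplified, hypothetical model, not Xapian's actual code: it assumes every query term counts equally when deciding what fraction of the query the top document matches, whereas Xapian may weight terms differently. The function name `percentages` and the document weights in the example are invented for illustration.

```python
def percentages(results, query_terms):
    """Map raw (unbounded) weights to 0..100 percentages.

    results: list of (doc_id, weight, set_of_matching_terms), in any order.
    Simplifying assumption: the top document's percentage is the fraction of
    query terms it matches, with every term counted equally.
    """
    if not results:
        return {}
    query = set(query_terms)
    # The highest-weighted document anchors the scale.
    _, top_weight, top_terms = max(results, key=lambda r: r[1])
    # 100% only if the top document matches every query term.
    fraction = len(top_terms & query) / len(query)
    scale = 100.0 * fraction / top_weight
    return {doc_id: round(weight * scale) for doc_id, weight, _ in results}


# Invented weights, for illustration only.
results = [
    ("annotated.html", 8.2, {"class", "list"}),
    ("apidoc/namespaceXapian.html", 8.1, {"class", "list"}),
    ("other.html", 4.0, {"class"}),
]
print(percentages(results, ["class", "list"]))
```

Dividing by the top weight keeps every score at or below the top document's percentage, so raw BM25 weights larger than 1 are no obstacle to a 0 to 100 scale.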