Eric Schulte
2007-Jul-10 14:05 UTC
[Ferret-talk] Article score calculations for Boolean and MultiTerm Queries, and customization options
Hi,

I have some questions about how documents are scored by Boolean and MultiTerm queries, and about options for customizing how articles are scored. I am working on a project that experiments with different methods of automatically generating queries, and the scoring mechanisms behind Lucene and Ferret have been perplexing us.

From looking at the Lucene explanation at (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html#formula_coord) and from using the explain function in Ferret, it seems that the score calculation for a Boolean query is (in LaTeX)

    score = (querynorm \times fieldnorm) \sum_{term \in query} idf_{term}^{2} \, tf_{term} \, boost_{term}

and the score of a document matching a MultiTerm query is

    score = (querynorm \times fieldnorm) \, idf_{terms \in query}^{2} \sum_{term \in query} tf_{term} \, boost_{term}

I would like to implement something much simpler, like

    score = \sum_{term \in query} tf_{term} \, boost_{term}

However, I'm not incredibly familiar with C, and frankly, looking at the scoring calculation in C inside Ferret terrified me. Would the pure Ruby version of Ferret be a good place to try to make these changes? The latest version of that code that I can find is 0.9.4 or so. What would you recommend?

Also, do you know why Lucene (and Ferret) use idf squared instead of just idf? That seems like a weird choice to me.

Another sticking point is the method of calculating idf for MultiTerm queries (the idf of the sum of the df for every term in the query), which doesn't seem to make sense. For example, for a query with many common words, the sum of the dfs could be greater than the number of documents in the index.

Many thanks!
Eric

P.S. Let me know if the LaTeX equations are too obtuse, and I will try to find another way to express sums in email.
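
For illustration, the three formulas in the message can be sketched in plain Ruby without touching Ferret at all. This is only a sketch under stated assumptions, not Ferret's or Lucene's actual implementation: the `Term` struct, the method names, and all of the numbers are hypothetical, tf is taken as an already-computed factor, and the idf form `1 + ln(N / (df + 1))` is assumed from Lucene's default similarity and may not match what Ferret does internally.

```ruby
# Hypothetical per-term statistics for a single document; the values are made up.
Term = Struct.new(:tf, :df, :boost)

# Assumed idf form (Lucene default-similarity style): 1 + ln(N / (df + 1)).
def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Boolean-style score: every term contributes its own idf squared.
def boolean_score(terms, num_docs, query_norm: 1.0, field_norm: 1.0)
  sum = terms.sum { |t| idf(t.df, num_docs)**2 * t.tf * t.boost }
  query_norm * field_norm * sum
end

# MultiTerm-style score: a single idf computed from the summed df of all terms.
def multi_term_score(terms, num_docs, query_norm: 1.0, field_norm: 1.0)
  shared_idf = idf(terms.sum(&:df), num_docs)
  query_norm * field_norm * shared_idf**2 * terms.sum { |t| t.tf * t.boost }
end

# Simplified score from the message: tf times boost only, no idf or norms.
def simple_score(terms)
  terms.sum { |t| t.tf * t.boost }
end

# Two common terms in a 100-document index: the summed df (130) exceeds
# the number of documents, which is the case the message worries about.
terms = [Term.new(2.0, 60, 1.0), Term.new(1.0, 70, 2.0)]
num_docs = 100

puts "boolean:    #{boolean_score(terms, num_docs)}"
puts "multi-term: #{multi_term_score(terms, num_docs)}"
puts "simple:     #{simple_score(terms)}"
puts "shared idf: #{idf(terms.sum(&:df), num_docs)}"
```

Under the assumed idf form, a summed df larger than the document count makes the logarithm negative, so the shared idf drops below 1 and scales the MultiTerm score down, whereas the per-term idf values in the Boolean sum each stay above 1 for these numbers.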