We're using Ferret (but not acts_as_ferret) on a project I'm working on, and I ran into a problem with the document scores returned from searches.

I consider myself a Ferret noob... I know a little about its API, having read the O'Reilly shortcut, but I couldn't find a solution to this problem there. Please allow me to explain:

It started when I noticed that the relevance scores for all results were exactly the same. From the shortcut, I learned that this happened because a range query (with start and end dates) was always included in the queries passed to Ferret, and Ferret's RangeQuery always returns results with identical scores because it uses a ConstantScoreQuery internally.

So far, so good. As an experiment, I removed this range query from the application code and passed in a simple string that translates into a TermQuery. From what I know of Ferret, it should return normal scores, but all of them came back as 0.

Is this a known behavior/bug? Or did I do something wrong with the search or the indexing? I know the latter is more likely, and if needed I can try to provide some trimmed-down example code.

-- 
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com
Hi Bira,

This just sounds like your search is getting no hits. The ConstantScoreQuery was giving everything a minimum score, but no other hits increased the score. Now you've removed the only thing that was providing a score, so it has dropped to 0.

Make sure your indexing and searching are working correctly. Try the ferret-browser tool to review your index and see if it's what you expect (i.e. that it has the terms you're searching for).

If all this is working as expected, try posting a snip of your code where you define the index and where you do a search, and we should be able to help.

John.
-- 
http://www.brightbox.co.uk - UK/EU Ruby on Rails Hosting
http://johnleach.co.uk

On Fri, 2008-02-22 at 16:18 -0300, Bira wrote:
> Is this a known behavior/bug? Or did I do something wrong with the
> search or the indexing?
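John's reasoning can be illustrated with a toy model in plain Ruby. This is not Ferret code: the constant 1.0, the simple summing of clause scores (no coord or query normalisation) and the "YYYYMMDD" date strings are all assumptions made for illustration. It shows that when the term clause contributes nothing, every hit ties at the range clause's constant, and that removing the range clause leaves only the zero term score.

```ruby
# Toy model only, not Ferret's implementation: a required range clause that
# scores like a ConstantScoreQuery, plus a term clause whose score depends on
# the document. Clause scores are simply summed.

RANGE_SCORE = 1.0 # hypothetical constant given to every in-range document

Doc = Struct.new(:id, :date, :term_score)

# Score for a query "term AND date in [from, to]"; nil means no match.
def combined_score(doc, from, to)
  return nil unless (from..to).cover?(doc.date) # dates as "YYYYMMDD" strings
  RANGE_SCORE + doc.term_score
end

# If the term clause contributes nothing, every hit ties at RANGE_SCORE.
flat = [Doc.new(1, "20080110", 0.0), Doc.new(2, "20080215", 0.0)]
p flat.map { |d| combined_score(d, "20080101", "20081231") }

# With non-zero term scores, a ranking reappears on top of the constant.
ranked = [Doc.new(1, "20080110", 0.8), Doc.new(2, "20080215", 0.3)]
p ranked.map { |d| combined_score(d, "20080101", "20081231") }

# Dropping the range clause leaves only the term score: 0.0 if that is zero.
p flat.map(&:term_score)
```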
On Fri, Feb 22, 2008 at 4:29 PM, John Leach <john at johnleach.co.uk> wrote:
> Hi Bira,
> If all this is working as expected, try posting a snip of your code where
> you define the index, and where you do a search and we should be able to
> help.
>
> John.

I've managed to reduce it to a simple example, which I've packed into an 11KB zip file, most of which is sample text for indexing (an e-mail message from the publicly available Enron archive). Does the list accept attachments?

-- 
Bira
Hi!

On Mon, Feb 25, 2008 at 04:15:00PM -0300, Bira wrote:
> I've managed to reduce it to a simple example, which I've packed in an
> 11KB zip file, most of which is a sample text for indexing (an e-mail
> message from the publicly available Enron archive). Does the list
> accept attachments?

Not sure, just try it out :-) or upload it somewhere on the Ferret wiki.

cheers,
Jens

-- 
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
On Mon, Feb 25, 2008 at 4:56 PM, Jens Kraemer <jk at jkraemer.net> wrote:
> not sure, just try it out :-) or upload it somewhere on the ferret wiki.

OK :). I'm sending the example attached to this message.

There are two Ruby files (indexer.rb and searcher.rb), along with a text file containing an e-mail from the Enron archive, which is the sample to be indexed.

After extracting it to a directory, running indexer.rb will index that single message. Running searcher.rb will perform a predefined search on the index and print out the result and its score.

In my local environment (Ferret 0.11.6 on Linux), a single result is returned, as expected, and it's properly highlighted and everything. Its score is 0. The search is a simple term query for "earnings".

-- 
Bira

-------------- next part --------------
A non-text attachment was scrubbed...
Name: minimal.tar.gz
Type: application/x-gzip
Size: 11705 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/ferret-talk/attachments/20080226/32518256/attachment.gz
From my experience with scores, I found that you *have* to establish boosts for each field; otherwise you'll always get scores that are too low.

Try:

- configuring boosts for, say, 3 fields, e.g. tags => 20, title => 10, description => 15;
- adding entries to the index;
- performing searches that hit each of these fields separately, so you can compare.

Then check the score in the output.

On Wed, Feb 27, 2008 at 12:24 AM, Bira <u.alberton at gmail.com> wrote:
> In my local environment (Ferret 0.11.6 on Linux), a single result is
> returned, as expected, and it's properly highlighted and everything.
> Its score is 0. The search is a simple term query for "earnings".
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
On Tue, Feb 26, 2008 at 8:12 PM, Julio Cesar Ody <julioody at gmail.com> wrote:
> From my experience with scores, I found that you *have* to establish
> boosts for each field, otherwise you'll always get scores that are too
> low.
>
> Try:
>
> - configuring boost for, say 3 fields. E.g.: tags => 20, title => 10,
>   description => 15.
> - Adding entries to the index.
> - performing searches that hit each of these fields in separate so you
>   can compare.

I tried again, setting :default_boost to 1000 in the example, and the score still came up as zero.

By the way, did the message containing the example arrive?

-- 
Bira
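One possible reason the boost made no difference: index-time boosts are stored in the norms file, which is why the :omit_norms API docs quoted in Jens's reply say that file can only be dropped when no fields are boosted. The plain-Ruby sketch below models this under loudly stated assumptions: a Lucene-style length norm of 1/sqrt(num_terms), the boost multiplied into the stored norm, no lossy norm encoding, a hypothetical tf * idf value of 0.97, and the 0.0 norm that this thread observed with :omit_norms. It is not Ferret code.

```ruby
# Simplified model of how an index-time boost reaches the score (not Ferret code).
# Assumptions: Lucene-style length norm 1/sqrt(num_terms), the boost
# multiplied into the stored norm, and no lossy norm encoding.

def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# The per-field norm the scorer sees for one document.
def stored_norm(boost:, num_terms:, omit_norms:)
  # Without a norms file there is nowhere to keep the boost. This thread
  # observed Ferret 0.11.6 reporting field_norm = 0.0 in that case.
  return 0.0 if omit_norms

  boost * length_norm(num_terms)
end

TF_IDF = 0.97 # hypothetical tf * idf for the matching term

[1.0, 1000.0].each do |boost|
  kept    = TF_IDF * stored_norm(boost: boost, num_terms: 100, omit_norms: false)
  omitted = TF_IDF * stored_norm(boost: boost, num_terms: 100, omit_norms: true)
  puts format("boost %7.1f  norms kept: %8.3f  omit_norms: %.1f", boost, kept, omitted)
end
```

Under these assumptions a boost scales the score linearly while norms are kept, and has no effect at all once they are omitted, which would be consistent with the :default_boost experiment if that option is applied at index time.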
Hi!

On Wed, Feb 27, 2008 at 03:04:27PM -0300, Bira wrote:
[..]
> By the way, did the message containing the example arrive?

Yes, it did. I tried it out and got the same result as you: a score of 0.0. Removing the :index => :omit_norms option from the FieldInfos declaration leads to the expected result, a non-zero score.

It's not clear from the API docs whether this is the expected behaviour:

  :omit_norms | Same as :yes except omit the
              | norms file. The norms file can
              | be omitted if you don't boost
              | any fields and you don't need
              | scoring based on field length.

Here's Ferret's explanation of the score computation:

  0.0 = field_weight(message:earnings in 0), product of:
    3.162278 = tf(term_freq(message:earnings)=10)
    0.3068528 = idf(doc_freq=1)
    0.0 = field_norm(field=message, doc=0)

It looks like Ferret should not take the zero field_norm into account when computing the score in this case.

Cheers,
Jens
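The numbers in that explanation can be reproduced in plain Ruby, assuming Ferret's default similarity uses Lucene's DefaultSimilarity formulas (tf = sqrt(freq) and idf = 1 + ln(num_docs / (doc_freq + 1))); with term_freq 10 and a one-document index these give exactly the 3.162278 and 0.3068528 shown. This is a sketch of the arithmetic only, not Ferret code: it shows the zero field_norm wiping out an otherwise non-zero tf * idf product.

```ruby
# Minimal model of the field_weight line in the explanation above.
# Assumption: Lucene-style DefaultSimilarity formulas for tf and idf.

def tf(freq)
  Math.sqrt(freq)
end

def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# field_weight = tf * idf * field_norm
def field_weight(freq:, doc_freq:, num_docs:, field_norm:)
  tf(freq) * idf(doc_freq, num_docs) * field_norm
end

# Values from the explanation: term_freq=10, doc_freq=1, one-document index.
puts format("tf  = %.6f", tf(10))    # 3.162278
puts format("idf = %.7f", idf(1, 1)) # 0.3068528

# With :omit_norms the explanation reports field_norm = 0.0, zeroing the score.
puts field_weight(freq: 10, doc_freq: 1, num_docs: 1, field_norm: 0.0) # 0.0

# Ignoring the zero norm (treating it as a neutral 1.0), as suggested above,
# would leave a non-zero score instead.
puts format("%.6f", field_weight(freq: 10, doc_freq: 1, num_docs: 1, field_norm: 1.0))
```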