thr3ads.net - Ferret talk - [Ferret-talk] Very small scores for search results [Oct 2006]

Peter Royle

2006-Oct-15 23:53 UTC

[Ferret-talk] Very small scores for search results

Hi Everyone,

I'm using Ferret 0.10.11 with acts_as_ferret from SVN (I get the same
results with 0.10.10 and 0.10.9, though).

I'm running into an odd problem where the scores of my top-ranking
search results are ridiculously small, even when the query is one that
should match at least one document with a decent score.

To give an example, I have just the names of 5 businesses indexed using
the standard analyzer. (The same happens with thousands of records
indexed by many fields, but I've simplified for this example.) One of
those businesses is called "ABC Master Building Designers". When I do a
query for "building", I get "ABC Master Building Designers" as the top
result, but with the following explanation (via code I added to
acts_as_ferret for debugging):

QUERY: id:building name:building

EXPLANATION of building: 8.438619e-42 = product of:
  1.687724e-41 = weight(name:building in 3), product of:
    0.6125279 = query_weight(name:building), product of:
      2.386294 = idf(doc_freq=1)
      0.2566858 = query_norm
    2.755373e-41 = field_weight(name:building in 3), product of:
      1.0 = tf(term_freq(name:building)=1)
      2.386294 = idf(doc_freq=1)
      1.15467e-41 = field_norm(field=name, doc=3)
  0.5 = coord(1/2)

Note the tiny field_norm value, which is throwing the whole score out.
The net result is that the records are barely differentiated from one
another, so the ordering of the results rarely makes much sense. I
sometimes get restaurants in the search results!
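
To see how one factor can swamp the result, the explanation above can be multiplied out by hand. The sketch below is a hypothetical helper, not a Ferret API; it assumes the Lucene-style formula visible in the explanation (score = coord * query_weight * field_weight, with query_weight = idf * query_norm and field_weight = tf * idf * field_norm). The "healthy" norm of 0.5 used for comparison is also an assumption: it is 1/sqrt(4) for a four-token name under Lucene's default length norm, which Ferret may not match exactly.

```c
#include <assert.h>

/* Hypothetical helper (not a Ferret API): multiplies out the factors
 * printed by the explanation. Doubles are used so the tiny
 * field_norm from the thread survives the arithmetic. */
double explain_score(double tf, double idf, double query_norm,
                     double field_norm, double coord)
{
    double query_weight = idf * query_norm;      /* query_weight(name:building) */
    double field_weight = tf * idf * field_norm; /* field_weight(name:building in 3) */
    return coord * query_weight * field_weight;  /* coord(1/2) * weight */
}

/* The factors exactly as printed in the thread. */
double thread_score(void)
{
    return explain_score(1.0, 2.386294, 0.2566858, 1.15467e-41, 0.5);
}

/* Same factors with an assumed field_norm of 0.5 (1/sqrt(4) for a
 * four-token name under a Lucene-style length norm). */
double assumed_healthy_score(void)
{
    return explain_score(1.0, 2.386294, 0.2566858, 0.5, 0.5);
}

/* The recomputed score should match the printed 8.438619e-42 up to
 * the rounding of the printed factors. */
void check_thread_score(void)
{
    double diff = thread_score() - 8.438619e-42;
    if (diff < 0)
        diff = -diff;
    assert(diff / 8.438619e-42 < 1e-3);
}
```

Under these assumptions, the same single-term match that scores about 8.4e-42 in the thread would score roughly 0.37 with an ordinary norm, so a single corrupted factor is enough to push every score toward zero.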

I haven't used any boost or anything on the name field. My Business
class calls AaF like this:

class Business < ActiveRecord::Base

  acts_as_ferret(
      :fields => { :name => {  } },
      :or_default => true
      )

  ...

end

Does anyone have any ideas as to what might be causing this? Any help
would be greatly appreciated.

Thanks,

Pete.




-- 
Posted via http://www.ruby-forum.com/.

Johnny Cussen

2006-Oct-16 06:21 UTC

Pete,

I noticed the same thing over the weekend. Haven't started
investigating yet, though.

Johnny

David Balmain

2006-Oct-16 06:54 UTC

Hi Pete,

Are you on a Mac by any chance? There are problems with the scoring on
OS X but I'm not sure why.

Cheers,
Dave

Peter Royle

2006-Oct-16 09:49 UTC

> Hi Pete,
> 
> Are you on a Mac by any chance? There are problems with the scoring on
> OS X but I'm not sure why.
> 
> Cheers,
> Dave
Hi Dave.

Yes, I am! I've deployed on my Linux box and reindexed, and everything
seems to be going fine. Thanks for the tip.

Johnny, does this solve it for you too?

Pete.

Johnny Cussen

2006-Oct-16 10:24 UTC

Yep. Weird huh.


David Balmain

2006-Oct-16 11:00 UTC

On 10/16/06, Johnny Cussen <cussen at gmail.com> wrote:
> Yep. Weird huh.

Not as weird as you might think. OS X on PowerPC hardware has a
different endianness from Windows and Linux on x86. Unfortunately I
don't have a Mac to test on. I'm waiting for someone to donate a Mac
or enough money for me to buy one. ;-) Alternatively, I'm sure I could
fix the problem if someone could offer me an ssh login to an OS X
server. Or better yet, someone could send me a patch. If any Mac users
are reading this and they'd like to have a go at fixing this
themselves, the problem has something to do with the way floats are
compressed into bytes in c/src/helper.c. The C unit tests probably
won't pass so if you can fix them the problem should be fixed. Let me
know if anyone wants to have a go at fixing this.
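
The symptom fits a byte-order bug. The sketch below is a minimal illustration, not Ferret's helper.c: it assumes a float is reinterpreted as a 32-bit integer and that a conversion meant for little-endian data is applied on a big-endian host, so the bytes get swapped once too often. All function names are hypothetical. Swapping the bytes of an ordinary value like 0.5f yields a positive subnormal on the order of 1e-43, the same kind of vanishingly small number as the field_norm in Pete's explanation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's IEEE 754 bits as an unsigned 32-bit integer. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Unconditional byte swap, standing in for a "to little-endian"
 * conversion applied on a big-endian host. */
uint32_t swap_u32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical buggy path: bits are swapped, then read back as a float. */
float mangled(float f)
{
    return bits_to_float(swap_u32(float_bits(f)));
}

void check_mangled(void)
{
    float m = mangled(0.5f);              /* 0x3F000000 becomes 0x0000003F */
    assert(float_bits(0.5f) == 0x3F000000u);
    assert(m > 0.0f && m < 1e-40f);       /* a tiny positive subnormal */
}
```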

Cheers,
Dave

Marvin Humphrey

2006-Oct-16 12:24 UTC

On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
> If any Mac users
> are reading this and they'd like to have a go at fixing this
> themselves, the problem has something to do with the way floats are
> compressed into bytes in c/src/helper.c. The C unit tests probably
> won't pass so if you can fix them the problem should be fixed. Let me
> know if anyone wants to have a go at fixing this.

/me raises hand.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

David Balmain

2006-Oct-16 12:51 UTC

On 10/16/06, Marvin Humphrey <marvin at rectangular.com> wrote:
>
> On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
>
> > If any Mac users
> > are reading this and they'd like to have a go at fixing this
> > themselves, the problem has something to do with the way floats are
> > compressed into bytes in c/src/helper.c. The C unit tests probably
> > won't pass so if you can fix them the problem should be fixed. Let me
> > know if anyone wants to have a go at fixing this.
>
> /me raises hand.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
You have a Mac, Marvin? I didn't realize. I'm guessing you already know
how to run the C unit tests. Let me know if there is anything else I
can do to help.

Dave

Marvin Humphrey

2006-Oct-16 13:16 UTC

On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
>  The C unit tests probably
> won't pass so if you can fix them the problem should be fixed.

Test output for subversion repository revision 653 on my G4 PowerBook  
below...

I don't see anything failing specifically relating to how Similarity
encodes/decodes norms.  Is there a test for that?  Have a look at...

<http://www.rectangular.com/svn/kinosearch/trunk/t/504-similarity.t>

The important test is the one that just takes 0 .. 255, transforms  
those to 256 floats, transforms them back again and checks that we  
get 0 .. 255.  You have something like that?
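
A minimal sketch of the 0 .. 255 round trip Marvin describes. It assumes a Lucene-style one-byte norm with 3 mantissa bits and a zero exponent of 15 (Lucene's "315" scheme); whether Ferret's helper.c uses exactly these constants is an assumption, and encode_norm/decode_norm are hypothetical names. Every byte is decoded to a float and re-encoded, and the original byte must come back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MANTISSA_BITS 3
#define ZERO_EXP 15
/* Offset of the smallest encodable exponent (as in Lucene's SmallFloat). */
#define FZERO ((uint32_t)(63 - ZERO_EXP) << MANTISSA_BITS)

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Compress a float into one byte (lossy). */
unsigned char encode_norm(float f)
{
    uint32_t bits = float_bits(f);
    uint32_t small;
    if (bits == 0 || (bits & 0x80000000u))
        return 0;                       /* zero or negative */
    small = bits >> (24 - MANTISSA_BITS);
    if (small <= FZERO)
        return 1;                       /* too small: smallest nonzero code */
    if (small >= FZERO + 0x100)
        return 255;                     /* too large: largest code */
    return (unsigned char)(small - FZERO);
}

/* Expand a byte back into a float. */
float decode_norm(unsigned char b)
{
    uint32_t bits;
    if (b == 0)
        return 0.0f;
    bits = ((uint32_t)b << (24 - MANTISSA_BITS))
         + ((uint32_t)(63 - ZERO_EXP) << 24);
    return bits_float(bits);
}

/* The 0 .. 255 round trip: every byte must survive decode + encode. */
void check_roundtrip(void)
{
    int b;
    for (b = 0; b < 256; b++)
        assert(encode_norm(decode_norm((unsigned char)b)) == b);
}
```

One plausible way such a test can pass on a big-endian host while scoring is broken: if both directions apply the same unwanted byte swap, the swaps cancel in a byte to float to byte round trip even though the intermediate float is garbage. Pinning a known value, such as decode_norm(120) == 0.5f under these assumed constants, would catch that.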

/me investigates ...

Ah, I don't see anything like that in test_similarity.c.  Probably I
can add that and send you a patch.  Think that's the right direction,
based on the test results?

PS: I have a Mac Mini that just sits there as a backup in case the  
PowerBook has to go into the shop.  If you write a script that emails  
you results in case of test failures, I can set up a cron to do  
nightly smokes.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

slothbear:~/projects/ferret010/ruby marvin$ ruby setup.rb test
Running tests...
Loading once
Loaded suite test
Started
........................................................................ 
...........................................FF.............F............. 
...............
Finished in 8.037183 seconds.

   1) Failure:
test_sorts(SearchAndSortTest)
     [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:40:in `do_test_top_docs'
      ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in `times'
      ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in `do_test_top_docs'
      ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:113:in `test_sorts']:
<8> expected but was
<1>.

   2) Failure:
test_boolean_query(SearcherTest)
     [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
      ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:98:in `test_boolean_query']:
<14> expected but was
<2>.

   3) Failure:
test_boolean_query(SimpleMultiSearcherTest)
     [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
      ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:98:in `test_boolean_query']:
<14> expected but was
<2>.

159 tests, 11469 assertions, 3 failures, 0 errors
slothbear:~/projects/ferret010/ruby marvin$

David Balmain

2006-Oct-16 14:16 UTC

On 10/16/06, Marvin Humphrey <marvin at rectangular.com> wrote:
>
> On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
>
> >  The C unit tests probably
> > won't pass so if you can fix them the problem should be fixed.
>
> Test output for subversion repository revision 653 on my G4 PowerBook
> below...

Hmmm. These failures are related to float/byte encoding, as they are
occurring because the scoring is wrong. But the float/byte conversion
test doesn't seem to be failing.

> I don't see anything failing specifically relating to how Similarity
> encodes/decodes norms.  Is there a test for that?  Have a look at...
>
> <http://www.rectangular.com/svn/kinosearch/trunk/t/504-similarity.t>
>
> The important test is the one that just takes 0 .. 255, transforms
> those to 256 floats, transforms them back again and checks that we
> get 0 .. 255.  You have something like that?
>
> /me investigates ...
>
> Ah, I don't see anything like that in test_similarity.c.  Probably I
> can add that and send you a patch.  Think that's the right direction,
> based on the test results?

Yeah, it is in test/test_helper.c. I guess I should put a comment in
test_similarity about that, since that is where most people would
expect to find such a test. Anyway, since it is passing, the error must
be occurring somewhere else. I can't think why, though, as it definitely
seems to have something to do with the norms.

> PS: I have a Mac Mini that just sits there as a backup in case the
> PowerBook has to go into the shop.  If you write a script that emails
> you results in case of test failures, I can set up a cron to do
> nightly smokes.
><snip>test results</snip>

That'd be great, thanks. I'll probably take a while to get around to
it. Anyway, don't spend too much time on this. I think it is better
for both of us if you concentrate on Lucy. I've seen a lot of action
recently on the commits list. :D

Cheers,
Dave

Marvin Humphrey

2006-Oct-16 17:40 UTC

On Oct 16, 2006, at 7:16 AM, David Balmain wrote:
> Yeah, it is in test/test_helper.c.

helper.c was the culprit, all right...

>> I can set up a cron to do nightly smokes.
> I'll probably take a while to get around to it.

Weirdo.

:D

I'd LUV to have regular smoke tests done for me on systems I don't
have access to!  The big one for me is Windows.  Fortunately, there's
a bunch of people on PerlMonks who'll run tests for me on their
Windows boxes when I ask.

I'll probably whip up a Perl script that smokes Ferret.  If I don't
generalize it (assume availability of svn, etc), that's cake -- 50
lines, including the email message.  The only reason I didn't
volunteer at first is that I figured you could write one in Ruby and
then you might get some other smokers besides me.

> Anyway, don't spend too much time on this.

I didn't.  But it wasn't hard to find something which made the
failing tests go away.  Patch below.

The patch might not be 100% optimal -- I didn't bother looking at how
POSH implements those functions.  I'll leave that to you.

Meanwhile, I'll go implement the same functionality for Charmonizer.
Funny how I've been working on this very issue!

> I think it is better
> for both of us if you concentrate on Lucy. I've seen a lot of action
> recently on the commits list. :D

Yeah, it's nice when a concept works out and stuff just flows...  :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Index: c/src/helper.c
===================================================================
--- c/src/helper.c      (revision 653)
+++ c/src/helper.c      (working copy)
@@ -14,13 +14,21 @@
{
      union { f_i32 i; float f; } tmp;
      tmp.f = f;
+#ifdef POSH_LITTLE_ENDIAN
      return POSH_LittleU32(tmp.i);
+#else
+    return POSH_BigU32(tmp.i);
+#endif
}
float int2float(f_i32 i32)
{
      union { f_i32 i; float f; } tmp;
+#ifdef POSH_LITTLE_ENDIAN
      tmp.i = POSH_LittleU32(i32);
+#else
+    tmp.i = POSH_BigU32(i32);
+#endif
      return tmp.f;
}

David Balmain

2006-Oct-17 01:33 UTC

On 10/17/06, Marvin Humphrey <marvin at rectangular.com> wrote:
>
> On Oct 16, 2006, at 7:16 AM, David Balmain wrote:
>
> > Yeah, it is in test/test_helper.c.
>
> helper.c was the culprit, all right...
>
> >> I can set up a cron to do nightly smokes.
>
> > I'll probably take a while to get around to it.
>
> Weirdo.
>
> :D
>
> I'd LUV to have regular smoke tests done for me on systems I don't
> have access to!  The big one for me is Windows.  Fortunately, there's
> a bunch of people on PerlMonks who'll run tests for me on their
> Windows boxes when I ask.

You're right. I'm a fool to pass up such an offer so lightly. I guess
I just really want a Mac user within the Ferret community to take
ownership of this.

> I'll probably whip up a Perl script that smokes Ferret.  If I don't
> generalize it (assume availability of svn, etc), that's cake -- 50
> lines, including the email message.  The only reason I didn't
> volunteer at first is that I figured you could write one in Ruby and
> then you might get some other smokers besides me.

You're right. I'll do this.

> > Anyway, don't spend too much time on this.
>
> I didn't.  But it wasn't hard to find something which made the
> failing tests go away.  Patch below.
>
> The patch might not be 100% optimal -- I didn't bother looking at how
> POSH implements those functions.  I'll leave that to you.

Funnily enough, the patch reduces the operation to a no-op. I guess I
don't need to worry about endianness here, since floats have the same
endianness as integers. I should have thought about that a little more
and I could have saved you the trouble of having to look at it. :P
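
Why the patch collapses to a no-op can be sketched with stand-ins for the two POSH macros. This assumes POSH_LittleU32 converts between host order and little-endian (swapping only on a big-endian host) and POSH_BigU32 does the same for big-endian; the functions below are hypothetical re-implementations, not POSH itself, and they detect the host order at run time instead of using POSH_LITTLE_ENDIAN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Run-time stand-in for POSH's compile-time POSH_LITTLE_ENDIAN check. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

uint32_t swap_u32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Assumed behaviour of POSH_LittleU32: swap only on a big-endian host. */
uint32_t little_u32(uint32_t x)
{
    return host_is_little_endian() ? x : swap_u32(x);
}

/* Assumed behaviour of POSH_BigU32: swap only on a little-endian host. */
uint32_t big_u32(uint32_t x)
{
    return host_is_little_endian() ? swap_u32(x) : x;
}

/* Before the patch: always little_u32, which swaps on PowerPC. */
uint32_t original_convert(uint32_t x)
{
    return little_u32(x);
}

/* After the patch: each branch picks the conversion matching the host,
 * so neither branch swaps and the result is always x. */
uint32_t patched_convert(uint32_t x)
{
    return host_is_little_endian() ? little_u32(x) : big_u32(x);
}
```

Because patched_convert is the identity on every host, the union alone does the job, which matches David's conclusion that the conversion does not need to care about endianness.
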
> Meanwhile, I'll go implement the same functionality for Charmonizer.
> Funny how I've been working on this very issue!

Great. I'm going to swap out POSH for Charmonizer in Ferret ASAP.

Thanks again, Marvin. I'll check smoke_test.rb into the base directory
of the Ferret repo when I'm done.

Cheers,
Dave

Marvin Humphrey

2006-Oct-17 02:45 UTC

On Oct 16, 2006, at 6:33 PM, David Balmain wrote:
> Funnily enough, the patch reduces the operation to a no-op.

Ah.  Makes sense.

> I guess I
> don't need to worry about endianness here, since floats have the same
> endianness as integers.

I believe that the representation is IEEE 754 both on little-endian  
chips like the Intels, and big-endian chips like the PowerPC.  The  
sign bit is indeed the "leftmost" bit in that representation,  
regardless of chip architecture.
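
A small sketch of Marvin's point: once a float's bits are copied into a 32-bit integer on the same host, the IEEE 754 fields sit at the same bit positions on little- and big-endian machines, with the sign in the most significant bit. It assumes the host's float is IEEE 754 binary32, which is exactly the assumption that breaks on non-IEEE machines such as VAX. The helper names are illustrative and not taken from Ferret, KinoSearch, or CLucene.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Guards the width only; a VAX-style 32-bit float would still pass. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit floats");

uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* same idea as the f_i32/float union */
    return u;
}

/* Field positions are defined on the integer value, not on byte order. */
unsigned sign_of(float f)     { return (unsigned)(float_bits(f) >> 31); }
unsigned exponent_of(float f) { return (unsigned)((float_bits(f) >> 23) & 0xFFu); }
uint32_t mantissa_of(float f) { return float_bits(f) & 0x7FFFFFu; }
```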

Where the float-int union technique (which is also used by KinoSearch
and CLucene) will fall down is on architectures that don't use IEEE
754, like VAX.  Then the encode/decode will get all screwed up.

http://www.codeproject.com/tools/libnumber.asp

Fortunately, the 0 .. 255 test will fail, so we'll know about the
problem when it occurs.  Non-IEEE floats are rare these days,
anyhow.  POSH doesn't even support 'em.

> Great. I'm going to swap out POSH for Charmonizer in Ferret ASAP.

That will be very helpful.  We'll see how soon ASAP is.  :)

> I'll check smoke_test.rb into the base directory
> of the Ferret repo when I'm done.

Grooves.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
