Hi Everyone,

I'm using Ferret 0.10.11 with acts_as_ferret from SVN (same results with
0.10.10 and 0.10.9 though).

I'm running into an odd problem where the scores of my top-ranking
search results are ridiculously small - even when the query is one that
should match at least one document with a decent score.

To give an example, I have just the names of 5 businesses indexed using
the standard analyzer. (The same happens with thousands of records
indexed by many fields, but I've simplified for this example.) One of
those businesses is called "ABC Master Building Designers". When I do a
query for "building" I get "ABC Master Building Designers" as the top
result, but with the following explanation (via code I added to
acts_as_ferret for debugging):

QUERY: id:building name:building

EXPLANATION of building: 8.438619e-42 = product of:
  1.687724e-41 = weight(name:building in 3), product of:
    0.6125279 = query_weight(name:building), product of:
      2.386294 = idf(doc_freq=1)
      0.2566858 = query_norm
    2.755373e-41 = field_weight(name:building in 3), product of:
      1.0 = tf(term_freq(name:building)=1)
      2.386294 = idf(doc_freq=1)
      1.15467e-41 = field_norm(field=name, doc=3)
  0.5 = coord(1/2)

Note the tiny value of field_norm, which is throwing the whole score out.
The net result is that the records are barely differentiated, so the
ordering of the results rarely makes much sense. I sometimes get
restaurants in the search results!

I haven't used any boost or anything on the name field. My Business
class calls AaF like this:

  class Business < ActiveRecord::Base

    acts_as_ferret(
      :fields => { :name => { } },
      :or_default => true
    )

    ...

  end

Does anyone have any ideas as to what might be causing this? Any help
would be greatly appreciated.

Thanks,

Pete.

--
Posted via http://www.ruby-forum.com/.
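For anyone following the explanation above: each "product of" line is the product of the lines nested beneath it. Here is a minimal sketch in plain Ruby (no Ferret calls; the `score` helper and variable names are made up for illustration) that redoes the arithmetic with the numbers from the post. The comparison value of 0.5 for field_norm is an assumption: it presumes a Lucene-style length norm of 1/sqrt(number of terms) for the four-term name, which nothing in this thread confirms.

```ruby
# Numbers copied from the explanation in the post above.
idf        = 2.386294
query_norm = 0.2566858
tf         = 1.0
field_norm = 1.15467e-41   # the suspicious value
coord      = 0.5           # 1 of the 2 query clauses matched

# Hypothetical helper: multiplies the tree out the same way the
# explanation nests it.
def score(idf, query_norm, tf, field_norm, coord)
  query_weight = idf * query_norm
  field_weight = tf * idf * field_norm
  coord * query_weight * field_weight
end

observed = score(idf, query_norm, tf, field_norm, coord)

# ASSUMPTION: Lucene-style length norm, 1/sqrt(4 terms) = 0.5.
expected_norm = 1.0 / Math.sqrt(4)
plausible     = score(idf, query_norm, tf, expected_norm, coord)

printf("score with reported field_norm: %e\n", observed)
printf("score with field_norm of 0.5:   %f\n", plausible)
```

The first figure should agree with the reported 8.438619e-42 to about four significant digits (the explanation was computed in single precision, this sketch in double), which suggests field_norm is the only factor that is off.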
Pete,

I noticed the same thing over the weekend. Haven't started investigating
yet though.

Johnny

On 16/10/2006, at 9:53 AM, Peter Royle wrote:
> I'm running into an odd problem where the scores of my top-ranking
> search results are ridiculously small - even when the query is one
> that should match at least one document with a decent score.
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
On 10/16/06, Peter Royle <howardmoon at hitcity.com.au> wrote:
> I'm running into an odd problem where the scores of my top-ranking
> search results are ridiculously small - even when the query is one that
> should match at least one document with a decent score.

Hi Pete,

Are you on a Mac by any chance? There are problems with the scoring on
OS X but I'm not sure why.

Cheers,
Dave

> Are you on a Mac by any chance? There are problems with the scoring on
> OS X but I'm not sure why.

Hi Dave.

Yes, I am! I've deployed on my Linux box and reindexed and everything
seems to be going fine. Thanks for the tip.

Johnny, does this solve it for you too?

Pete.
Yep. Weird huh.

On 16/10/2006, at 7:49 PM, Peter Royle wrote:
> Johnny, does this solve it for you too?
On 10/16/06, Johnny Cussen <cussen at gmail.com> wrote:
> Yep. Weird huh.

Not as weird as you might think. PowerPC Macs running OS X have a
different endianness to the x86 machines that most Windows and Linux
installs run on. Unfortunately I don't have a Mac to test on. I'm
waiting for someone to donate a Mac or enough money for me to buy one.
;-) Alternatively, I'm sure I could fix the problem if someone could
offer me an ssh login to an OS X server. Or better yet, someone could
send me a patch.

If any Mac users are reading this and they'd like to have a go at
fixing this themselves, the problem has something to do with the way
floats are compressed into bytes in c/src/helper.c. The C unit tests
probably won't pass, so if you can get them passing the problem should
be fixed. Let me know if anyone wants to have a go at fixing this.

Cheers,
Dave
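To see why a byte-order mix-up would produce scores this small, here is a minimal sketch in Ruby using only `Array#pack` and `String#unpack1`. It is not Ferret code, and `misread_as_little_endian` is a hypothetical helper. It serializes a float in one byte order and reads the same four bytes back in the other.

```ruby
# Hypothetical helper: serialize a value as a big-endian IEEE 754 single
# ('g'), then misread the same four bytes as a little-endian single ('e').
def misread_as_little_endian(value)
  [value].pack('g').unpack1('e')
end

[1.0, 0.5].each do |value|
  hex = [value].pack('g').unpack1('H*')
  printf("%.1f  big-endian bytes=%s  misread=%e\n",
         value, hex, misread_as_little_endian(value))
end
```

The exponent bits of an ordinary float sit in the most significant byte. Once the bytes are reversed that byte lands at the bottom of the mantissa, the exponent field becomes zero, and the value collapses into the denormal range, around 1e-41 and smaller.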
On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
> If any Mac users
> are reading this and they'd like to have a go at fixing this
> themselves, the problem has something to do with the way floats are
> compressed into bytes in c/src/helper.c. The C unit tests probably
> won't pass so if you can fix them the problem should be fixed. Let me
> know if anyone wants to have a go at fixing this.

/me raises hand.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
On 10/16/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> /me raises hand.

You have a Mac, Marvin? I didn't realize. I'm guessing you already know
how to run the C unit tests. Let me know if there is anything else I can
do to help.

Dave
On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
> The C unit tests probably
> won't pass so if you can fix them the problem should be fixed.

Test output for subversion repository revision 653 on my G4 PowerBook
below...

I don't see anything failing specifically relating to how Similarity
encodes/decodes norms. Is there a test for that? Have a look at...

    <http://www.rectangular.com/svn/kinosearch/trunk/t/504-similarity.t>

The important test is the one that just takes 0 .. 255, transforms
those to 256 floats, transforms them back again and checks that we
get 0 .. 255. You have something like that?

/me investigates ...

Ah, don't see something like that in test_similarity.c. Probably I
can add that and send you a patch. Think that's the right direction,
based on the test results?

PS: I have a Mac Mini that just sits there as a backup in case the
PowerBook has to go into the shop. If you write a script that emails
you results in case of test failures, I can set up a cron to do
nightly smokes.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

slothbear:~/projects/ferret010/ruby marvin$ ruby setup.rb test
Running tests...
Loading once
Loaded suite test
Started
........................................................................
...........................................FF.............F.............
...............
Finished in 8.037183 seconds.

  1) Failure:
test_sorts(SearchAndSortTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:40:in `do_test_top_docs'
     ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in `times'
     ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in `do_test_top_docs'
     ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:113:in `test_sorts']:
<8> expected but was <1>.

  2) Failure:
test_boolean_query(SearcherTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
     ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:98:in `test_boolean_query']:
<14> expected but was <2>.

  3) Failure:
test_boolean_query(SimpleMultiSearcherTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
     ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:98:in `test_boolean_query']:
<14> expected but was <2>.

159 tests, 11469 assertions, 3 failures, 0 errors
slothbear:~/projects/ferret010/ruby marvin$
On 10/16/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> On Oct 16, 2006, at 4:00 AM, David Balmain wrote:
>
> > The C unit tests probably
> > won't pass so if you can fix them the problem should be fixed.
>
> Test output for subversion repository revision 653 on my G4 PowerBook
> below...

Hmmm. The failures are related to float/byte encoding, as they are
occurring because the scoring is wrong. But the float/byte conversion
test doesn't seem to be failing.

> I don't see anything failing specifically relating to how Similarity
> encodes/decodes norms. Is there a test for that? Have a look at...
>
> <http://www.rectangular.com/svn/kinosearch/trunk/t/504-similarity.t>
>
> The important test is the one that just takes 0 .. 255, transforms
> those to 256 floats, transforms them back again and checks that we
> get 0 .. 255. You have something like that?
>
> /me investigates ...
>
> Ah, don't see something like that in test_similarity.c. Probably I
> can add that and send you a patch. Think that's the right direction,
> based on the test results?

Yeah, it is in test/test_helper.c. I guess I should put a comment in
test_similarity about that, since that is where most people would expect
to find such a test. Anyway, since it is passing, the error must be
occurring somewhere else. I can't think why though, as it definitely
seems to have something to do with the norms.

> PS: I have a Mac Mini that just sits there as a backup in case the
> PowerBook has to go into the shop. If you write a script that emails
> you results in case of test failures, I can set up a cron to do
> nightly smokes.
>
<snip>test results</snip>

That'd be great, thanks. I'll probably take a while to get around to it.
Anyway, don't spend too much time on this. I think it is better for both
of us if you concentrate on Lucy. I've seen a lot of action recently on
the commits list. :D

Cheers,
Dave
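One possible reason a 0 .. 255 round-trip test passes while the scores are still wrong: if the bytes are swapped in both the float-to-int and int-to-float helpers, the two swaps cancel in a round trip. The sketch below models that in Ruby. It ASSUMES a Lucene-style norm codec (3-bit mantissa, 5-bit exponent, offset 48); Ferret's real code is C and may differ, and the method names here only mirror the ones in the thread for readability.

```ruby
# ASSUMPTION: Lucene-style norm codec. Illustrative model, not Ferret code.

# Reverse the four bytes of a 32-bit pattern.
def swap32(bits)
  [bits].pack('N').unpack1('V')
end

# Model of float2int / int2float. With swap = true the bytes are reversed
# in both directions, mimicking an unconditional swap on a big-endian host.
def float2int(f, swap)
  bits = [f].pack('g').unpack1('N')
  swap ? swap32(bits) : bits
end

def int2float(bits, swap)
  bits = swap32(bits) if swap
  [bits].pack('N').unpack1('g')
end

def byte2float(b, swap)
  return 0.0 if b.zero?
  mantissa = b & 0x07
  exponent = (b >> 3) & 0x1f
  int2float(((exponent + 48) << 24) | (mantissa << 21), swap)
end

def float2byte(f, swap)
  return 0 if f <= 0.0
  bits = float2int(f, swap)
  mantissa = (bits & 0xffffff) >> 21
  exponent = ((bits >> 24) & 0x7f) - 48
  if exponent > 31
    exponent = 31
    mantissa = 7
  elsif exponent < 0
    exponent = 0
    mantissa = 1
  end
  (exponent << 3) | mantissa
end

[false, true].each do |swap|
  ok     = (0..255).all? { |b| float2byte(byte2float(b, swap), swap) == b }
  stored = float2byte(0.5, swap)
  printf("swap=%-5s round trip ok=%s  0.5 -> byte %d -> %e\n",
         swap, ok, stored, byte2float(stored, swap))
end
```

In this model the round trip succeeds either way, but with the swap a norm of 0.5 is stored as byte 1 and decodes to roughly 1.15467e-41, the field_norm from the first post. A test that checks a few decoded values against known floats (for example that 0.5 survives encode then decode) would catch what the pure round trip misses.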
On Oct 16, 2006, at 7:16 AM, David Balmain wrote:
> Yeah, it is in test/test_helper.c.

helper.c was the culprit, all right...

>> I can set up a cron to do nightly smokes.

> I'll probably take a while to get around to it.

Weirdo.

:D

I'd LUV to have regular smoke tests done for me on systems I don't
have access to! The big one for me is Windows. Fortunately, there's
a bunch of people on PerlMonks who'll run tests for me on their
Windows boxes when I ask.

I'll probably whip up a Perl script that smokes Ferret. If I don't
generalize it (assume availability of svn, etc), that's cake -- 50
lines, including the email message. The only reason I didn't
volunteer at first is that I figured you could write one in Ruby and
then you might get some other smokers besides me.

> Anyway, don't spend too much time on this.

I didn't. But it wasn't hard to find something which made the
failing tests go away. Patch below.

The patch might not be 100% optimal -- I didn't bother looking at how
POSH implements those functions. I'll leave that to you.

Meanwhile, I'll go implement the same functionality for Charmonizer.
Funny how I've been working on this very issue!

> I think it is better
> for both of us if you concentrate on Lucy. I've seen a lot of action
> recently on the commits list. :D

Yeah, it's nice when a concept works out and stuff just flows... :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Index: c/src/helper.c
==================================================================
--- c/src/helper.c      (revision 653)
+++ c/src/helper.c      (working copy)
@@ -14,13 +14,21 @@
 {
     union { f_i32 i; float f; } tmp;
     tmp.f = f;
+#ifdef POSH_LITTLE_ENDIAN
     return POSH_LittleU32(tmp.i);
+#else
+    return POSH_BigU32(tmp.i);
+#endif
 }

 float int2float(f_i32 i32)
 {
     union { f_i32 i; float f; } tmp;
+#ifdef POSH_LITTLE_ENDIAN
     tmp.i = POSH_LittleU32(i32);
+#else
+    tmp.i = POSH_BigU32(i32);
+#endif
     return tmp.f;
 }
On 10/17/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> On Oct 16, 2006, at 7:16 AM, David Balmain wrote:
>
> > Yeah, it is in test/test_helper.c.
>
> helper.c was the culprit, all right...
>
> >> I can set up a cron to do nightly smokes.
>
> > I'll probably take a while to get around to it.
>
> Weirdo.
>
> :D
>
> I'd LUV to have regular smoke tests done for me on systems I don't
> have access to! The big one for me is Windows. Fortunately, there's
> a bunch of people on PerlMonks who'll run tests for me on their
> Windows boxes when I ask.

You're right. I'm a fool to pass up such an offer so lightly. I guess I
just really want a Mac user within the Ferret community to take
ownership of this.

> I'll probably whip up a Perl script that smokes Ferret. If I don't
> generalize it (assume availability of svn, etc), that's cake -- 50
> lines, including the email message. The only reason I didn't
> volunteer at first is that I figured you could write one in Ruby and
> then you might get some other smokers besides me.

You're right. I'll do this.

> > Anyway, don't spend too much time on this.
>
> I didn't. But it wasn't hard to find something which made the
> failing tests go away. Patch below.
>
> The patch might not be 100% optimal -- I didn't bother looking at how
> POSH implements those functions. I'll leave that to you.

Funnily enough the patch reduces the operation to a no-op. I guess I
don't need to worry about endianness here since floats have the same
endianness as integers. I should have thought about that a little more
and I could have saved you the trouble of having to look at it. :P

> Meanwhile, I'll go implement the same functionality for Charmonizer.
> Funny how I've been working on this very issue!

Great. I'm going to swap out POSH for Charmonizer in Ferret ASAP.

Thanks again Marvin. I'll check smoke_test.rb into the base directory of
the Ferret repo when I'm done.

Cheers,
Dave
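Dave's point that no swap is needed can be checked from Ruby. A minimal sketch, assuming an IEEE 754 host; the three helper names are hypothetical. It reads a float's 32-bit pattern through big-endian bytes, little-endian bytes, and the machine's native order. As long as the float and the integer go through memory in the same order, the resulting number is identical.

```ruby
# Bit pattern via explicit big-endian bytes for both float and integer.
def bits_via_big_endian(f)
  [f].pack('g').unpack1('N')
end

# Bit pattern via explicit little-endian bytes for both.
def bits_via_little_endian(f)
  [f].pack('e').unpack1('V')
end

# Bit pattern via this machine's own byte order, which is roughly what
# a C union of a float and a 32-bit integer does.
def bits_via_native(f)
  [f].pack('f').unpack1('L')
end

[0.5, 1.0, 3.14].each do |f|
  printf("%-5s big=0x%08x little=0x%08x native=0x%08x\n",
         f, bits_via_big_endian(f), bits_via_little_endian(f),
         bits_via_native(f))
end
```

A swap only matters when the raw bytes leave the machine, for example when a 32-bit value is written to a file that another architecture will read. A value that only passes through a union in memory, as in float2int and int2float, never hits that case.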
On Oct 16, 2006, at 6:33 PM, David Balmain wrote:
> Funnily enough the patch reduces the operation to a no-op.

Ah. Makes sense.

> I guess I
> don't need to worry about endianness here since floats have the same
> endianness as integers.

I believe that the representation is IEEE 754 both on little-endian
chips like the Intels, and big-endian chips like the PowerPC. The
sign bit is indeed the "leftmost" bit in that representation,
regardless of chip architecture.

Where the float-int union technique (which is also used by KinoSearch
and CLucene) will fall down is on architectures that don't use IEEE
754, like VAX. Then the encode/decode will get all screwed up.

http://www.codeproject.com/tools/libnumber.asp

Fortunately, the 0 .. 255 test will fail, so we'll know about the
problem when it occurs. Non-IEEE floats are rare these days, anyhow.
POSH doesn't even support 'em.

> Great. I'm going to swap out POSH for Charmonizer in Ferret ASAP.

That will be very helpful. We'll see how soon ASAP is. :)

> I'll check smoke_test.rb into the base directory
> of the Ferret repo when I'm done.

Grooves.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
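For reference, the IEEE 754 single-precision layout Marvin mentions can be pulled apart in a few lines of Ruby. This is a sketch, not code from Ferret or KinoSearch, and `ieee754_fields` is a hypothetical helper. It relies on the 'g' pack directive, which always emits a big-endian IEEE 754 single, so the result does not depend on the host's byte order.

```ruby
# Split a single-precision float into its IEEE 754 fields:
# 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits.
def ieee754_fields(f)
  bits = [f].pack('g').unpack1('N')
  {
    sign:     bits >> 31,
    exponent: (bits >> 23) & 0xff,
    mantissa: bits & 0x7fffff
  }
end

[0.5, -0.5, 1.0].each do |f|
  puts "#{f}: #{ieee754_fields(f)}"
end
```

The sign is the top bit of the 32-bit pattern on any architecture; byte order only decides which memory address holds the byte containing it.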