Mike Mangino
2007-Jul-13 03:33 UTC
[Ferret-talk] More sorting problems with untokenized index
I'm having problems sorting on untokenized fields. One field sorts fine, but others seem to sort on a different field. Here's the index description:

acts_as_ferret :remote=>true,:fields=>{:name=>{:boost=>2},:name_for_sort=>{:index => :untokenized},
  :city=>{:boost=>2}, :city_for_sort=>{:index=>:untokenized},
  :state=>{:boost=>2}, :state_for_sort=>{:index=>:untokenized},
  :tag_list=>{:boost=>0},:tag_list_for_sort=>{:boost=>0},
  :date_summary=>{:boost=>1},
  :date_for_range=>{:boost=>0},
  :start_date=>{:boost=>0}}

When I sort on name_for_sort it works fine.

city_for_sort, however, causes problems. Here is a random offset. There are 16,000 records, so I wouldn't expect so much disparity:

>> Event.find_by_contents("marathon",:sort=>"city_for_sort",:offset => 100).map(&:city_for_sort)
=> ["laguna hills", "burlington", "buffalo", "sun valley", "ottawa", "alexandria", "green bay", "cleveland", "aurora denver lakewood", "corpus christi"]

and a later batch:

>> Event.find_by_contents("marathon",:sort=>"city_for_sort",:offset => 400).map(&:city_for_sort)
=> ["ocean shores", "austin", "boca raton", "sauvie island", "crested butte", "austin", "portland", "avery", "leadville", "houston"]

Notice that name works:

>> Event.find_by_contents("marathon",:sort=>"name_for_sort",:offset => 400).map(&:name_for_sort)
=> ["Columbus Marathon", "Columbus Marathon", "Columbus Marathon", "Columbus Marathon", "Columbus Marathon", "Columbus Marathon Relay", "Columbus Marathon Relay", "Columbus Marathon Relay", "Comcast Baltimore Marathon", "Comcast Baltimore Marathon"]

Notice, however, that it appears to be sorting on a range column, even when we ask for city_for_sort:

>> Event.find_by_contents("marathon",:sort=>"city_for_sort",:offset => 400).map(&:date_for_range)
=> ["20060709", "20060708", "20060708", "20060704", "20060704", "20060704", "20060704", "20060702", "20060701", "20060701"]

Does anyone have an idea what could cause this? I've rebuilt the index several times and it doesn't help.

I've also noticed that the default field list doesn't include the untokenized columns:

using index in script/../config/../config/../index/development/event
default field list: [:state, :start_date, :name, :tag_list, :city, :tag_list_for_sort, :date_for_range, :date_summary]

When I look at ferret-browser, it does show the city_for_sort column. I can browse the values in order, and its parameters match those of name_for_sort, which works.

I'm completely stumped.

-- 
Posted via http://www.ruby-forum.com/.
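The observation in this post (the city page is unordered while the same page read through date_for_range is ordered) can be checked mechanically. The following is a small sketch using the values pasted above; the helper names ascending? and descending? are made up for this illustration and are not part of Ferret or acts_as_ferret.

```ruby
# Hypothetical helpers: true when each value is <= (or >=) the next one.
def ascending?(values)
  values.each_cons(2).all? { |a, b| (a <=> b) <= 0 }
end

def descending?(values)
  values.each_cons(2).all? { |a, b| (a <=> b) >= 0 }
end

# Result pages copied from the post (offset 400, query "marathon").
cities = ["ocean shores", "austin", "boca raton", "sauvie island",
          "crested butte", "austin", "portland", "avery", "leadville",
          "houston"]
dates  = ["20060709", "20060708", "20060708", "20060704", "20060704",
          "20060704", "20060704", "20060702", "20060701", "20060701"]
names  = ["Columbus Marathon", "Columbus Marathon", "Columbus Marathon",
          "Columbus Marathon", "Columbus Marathon",
          "Columbus Marathon Relay", "Columbus Marathon Relay",
          "Columbus Marathon Relay", "Comcast Baltimore Marathon",
          "Comcast Baltimore Marathon"]

puts "cities ascending? #{ascending?(cities)}"
puts "dates descending? #{descending?(dates)}"
puts "names ascending?  #{ascending?(names)}"
```

Running a check like this against each candidate column of a result page is a cheap way to see which field the index actually ordered by.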
Jens Kraemer
2007-Jul-13 08:06 UTC
[Ferret-talk] More sorting problems with untokenized index
Hi!

Just to rule out the possibility of aaf (acts_as_ferret) being the culprit here, could you try your queries using Ferret directly on the index? Shut down your app and the DRb server beforehand, just to be sure ;-)

Then you could also try the sort API instead of a string sort (check out the Sort and SortField classes in Ferret's API).

The fact that your untokenized fields do not appear in the default field list is OK: the default field list shows the fields aaf searches when a query names no specific fields, and it excludes untokenized fields. (Searching those fields could return fewer results than you expect; if you are interested in why, this was discussed on this list a few months ago.)

Your date_for_range and tag_list_for_sort fields should also be untokenized, although I guess that won't solve your problem.

Jens

On Fri, Jul 13, 2007 at 05:33:47AM +0200, Mike Mangino wrote:
> [..]

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
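For reference, here is a sketch of the field options from the original post with Jens's last suggestion applied, so that date_for_range and tag_list_for_sort are untokenized too. The constant names FERRET_FIELDS and SORT_FIELDS are invented for this example, and combining :boost with :index in one option hash is an assumption about how the options would be written, not something shown in the thread.

```ruby
# Field options from the original post; the two marked entries add
# :index => :untokenized as suggested in the reply above.
FERRET_FIELDS = {
  :name              => { :boost => 2 },
  :name_for_sort     => { :index => :untokenized },
  :city              => { :boost => 2 },
  :city_for_sort     => { :index => :untokenized },
  :state             => { :boost => 2 },
  :state_for_sort    => { :index => :untokenized },
  :tag_list          => { :boost => 0 },
  :tag_list_for_sort => { :boost => 0, :index => :untokenized }, # changed
  :date_summary      => { :boost => 1 },
  :date_for_range    => { :boost => 0, :index => :untokenized }, # changed
  :start_date        => { :boost => 0 }
}

# In the model, the hash would be handed over roughly like this:
#   acts_as_ferret :remote => true, :fields => FERRET_FIELDS

# Fields that are indexed as a single term and are therefore sortable.
SORT_FIELDS = FERRET_FIELDS.select { |_, opts| opts[:index] == :untokenized }.keys
puts SORT_FIELDS.inspect
```

As Jens notes, this change is a cleanup and is not expected to fix the ordering problem by itself.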
Mike Mangino
2007-Jul-13 12:51 UTC
[Ferret-talk] More sorting problems with untokenized index
Jens Kraemer wrote:
> Just to rule out the possibility of aaf being the culprit here, could
> you try your queries using Ferret directly on the index? Shut down your
> app and the DRb server before, just to be sure ;-)

Sure.

> Then you could also try to use the sort API instead of string sort
> (check out the Sort and SortField classes in Ferret's API).

Good suggestion. Sorting with a string is still broken when I use Ferret directly. Sorting with the auto type is broken as well:

>> total_hits = fidx.search_each("marathon",:sort=>Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort)]),:offset=>400) do |hit,score|
?>   doc = fidx[hit]
>>   results << doc[:id]
>> end
=> 1887
>> Event.find(results).map(&:city_for_sort)
=> ["ocean shores", "austin", "boca raton", "sauvie island", "crested butte", "austin", "portland", "avery", "leadville", "houston"]

However, when I specify a string sort, that seems to fix it:

>> total_hits = fidx.search_each("marathon",:sort=>Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort,:type=>:string)]),:offset=>400) do |hit,score|
?>   doc = fidx[hit]
>>   results << doc[:id]
>> end
=> 1887
>> Event.find(results).map(&:city_for_sort)
=> ["bellevue", "baton rouge", "basalt", "bend", "bellevue", "bedford", "bend", "berlin", "berlin", "beijing, china"]

There were some fields with the text "0". I wonder if it was guessing the wrong type of index? I cleaned up that data and I'm rebuilding the index now.

[snip]
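The hypothesis in this message (a stray "0" makes the auto sort type guess wrong) can be illustrated in plain Ruby. This is only a model of the idea: guess_sort_type is a hypothetical function written for this sketch, and the assumption that the type is guessed from a single sample term is exactly that, an assumption, not a description of Ferret's code.

```ruby
# Hypothetical model of "auto" type detection: guess from one sample term.
def guess_sort_type(sample_term)
  case sample_term
  when /\A[+-]?\d+\z/      then :integer
  when /\A[+-]?\d*\.\d+\z/ then :float
  else                          :string
  end
end

cities = ["ocean shores", "austin", "0", "boca raton"]

# If the sample is the lexicographically first term, "0" wins because
# digits sort before letters, and it decides the type for the whole field.
guessed = guess_sort_type(cities.min)

# Compared as integers, every city name parses to 0, so all keys tie and
# the field no longer determines the order of the results.
integer_keys = cities.map { |c| c.to_i }

puts "guessed type: #{guessed}"
puts "integer keys: #{integer_keys.inspect}"
```

Passing :type=>:string to SortField, as in the second search above, sidesteps any such guessing.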
Jens Kraemer
2007-Jul-13 12:55 UTC
[Ferret-talk] More sorting problems with untokenized index
On Fri, Jul 13, 2007 at 02:51:00PM +0200, Mike Mangino wrote:
> Jens Kraemer wrote:
[..]
> There were some fields with the text "0". I wonder if it was guessing
> the wrong type of index? I cleaned up that data and I'm rebuilding the
> index now.

That sounds like a really good explanation to me.

Jens
Mike Mangino
2007-Jul-13 15:14 UTC
[Ferret-talk] More sorting problems with untokenized index
Jens Kraemer wrote:
> On Fri, Jul 13, 2007 at 02:51:00PM +0200, Mike Mangino wrote:
>> There were some fields with the text "0". I wonder if it was guessing
>> the wrong type of index? I cleaned up that data and I'm rebuilding the
>> index now.
>
> That sounds like a really good explanation to me.

If only it were true :)

>> Event.find(:all).select {|e| /^[0-9]+$/.match(e.city_for_sort)}
=> []
>> Event.find(:all).select {|e| /^[0-9\.]+$/.match(e.city_for_sort)}
=> []

No purely numeric values are left, but the problem still exists.

According to http://ferret.davebalmain.com/trac/browser/trunk/c/src/sort.c, it looks like cleaning up the data should have fixed it.

When I use Sort and SortField through acts_as_ferret, I get the DRb error I reported previously, because it can't marshal those objects:

Event.find_by_contents("marathon",:sort=>Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort)]),:offset=>400).map(&:city_for_sort)

DRb::DRbConnError: DRb::DRbServerNotFound

Is there an easy fix for this?
Jens Kraemer
2007-Jul-13 15:29 UTC
[Ferret-talk] More sorting problems with untokenized index
On Fri, Jul 13, 2007 at 05:14:22PM +0200, Mike Mangino wrote:
[..]
> When I use the Sort and SortField I get the DRB error I reported
> previously because it can't marshall those objects:
[..]
> DRb::DRbConnError: DRb::DRbServerNotFound
>
> Is there an easy fix for this?

Do you already use acts_as_ferret's trunk?

If yes and this problem still exists, your best bet is to extend local_index.rb and add a custom search method that builds your sort objects from additional parameters you pass in (plain values rather than sort objects, to avoid the DRb problems). This method will then be reachable via the ferret_index property of your model class.

Jens
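The reason plain parameters avoid the DRb problem can be sketched with Marshal alone. DRb ships arguments by value using Marshal; an argument that cannot be dumped has to be passed as a remote reference instead, which needs a DRb service running in the calling process, and that would be consistent with the DRbServerNotFound error above. In the sketch below a Proc stands in for an object that cannot be marshaled; that stand-in is an assumption for illustration and says nothing about how Ferret's classes are implemented.

```ruby
# A plain string survives a Marshal round trip, so DRb can send it by value.
plain_sort    = "city_for_sort asc string"
round_tripped = Marshal.load(Marshal.dump(plain_sort))

# Stand-in for an object that cannot be marshaled (Procs never can be).
undumpable = proc { |a, b| a <=> b }

dumpable = begin
  Marshal.dump(undumpable)
  true
rescue TypeError
  # Marshal refuses the object; DRb would fall back to a remote reference.
  false
end

puts "string round trip ok: #{round_tripped == plain_sort}"
puts "proc dumpable:        #{dumpable}"
```

Sending only strings, symbols, numbers, arrays and hashes across the DRb boundary and building the Sort objects on the server side keeps every argument in the first category.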
Mike Mangino
2007-Jul-13 16:05 UTC
[Ferret-talk] More sorting problems with untokenized index
Jens Kraemer wrote:
> Do you already use acts_as_ferret's trunk?
>
> If yes and this problem still exists, your best bet is to extend
> local_index.rb and add a custom search method that constructs your sort
> objects based on additional parameters (that are no sort objects, to
> avoid the drb probs) you hand it over. This method will be then
> reachable via the ferret_index property of your model class.

Okay. I did this previously. Here is my change:

def find_id_by_contents(query, options = {})
  if (sort=options[:sort])
    # accept a single sort string or an array of them
    sort = [sort] unless sort.is_a?(Array)
    sort_fields = sort.map do |field|
      # each entry looks like "field [asc|desc] [type]"
      term,direction,sort_type = field.split(/\s+/)
      direction ||= "asc"
      sort_type ||= "auto"
      Ferret::Search::SortField.new(term,:reverse=>direction.match(/desc/i),:type=>sort_type.to_sym)
    end
    # build the Sort object on the server side, after the DRb call
    options[:sort]=Ferret::Search::Sort.new(sort_fields)
  end
  ...

That fixes the sort ordering and lets you specify the type in the sort string, for example "city_for_sort asc string".

I can roll that up into a patch if you would like that for inclusion.

Mike
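The string format accepted by the change above ("field", "field direction" or "field direction type") can be tried out without Ferret. The following is a standalone sketch: parse_sort_specs is a hypothetical helper that returns plain hashes where the patch builds Ferret::Search::SortField objects, and it compares the direction with casecmp rather than the patch's regex match.

```ruby
# Hypothetical helper: parse "field [asc|desc] [type]" strings into hashes
# holding the same three values the patch passes to SortField.new.
def parse_sort_specs(sort)
  Array(sort).map do |spec|
    field, direction, type = spec.to_s.split(/\s+/)
    {
      :field   => field.to_sym,
      :reverse => (direction || "asc").casecmp("desc").zero?,
      :type    => (type || "auto").to_sym
    }
  end
end

# Direction defaults to ascending and type defaults to :auto.
specs = parse_sort_specs(["city_for_sort asc string", "start_date desc"])
specs.each { |s| puts s.inspect }
```

Because the caller only sends strings, nothing in the :sort option has to be marshaled as an object, and the explicit type (for example string) overrides the auto detection that misbehaved earlier in the thread.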
Jens Kraemer
2007-Jul-13 19:06 UTC
[Ferret-talk] More sorting problems with untokenized index
On Fri, Jul 13, 2007 at 06:05:30PM +0200, Mike Mangino wrote:
[..]
> I can roll that up into a patch if you would like that for inclusion.

Yes, please post it to acts_as_ferret's trac :-)

Thanks,
Jens
Mike Mangino
2007-Jul-13 19:27 UTC
[Ferret-talk] More sorting problems with untokenized index
> yes, please post it to acts_as_ferret's trac :-)

http://projects.jkraemer.net/acts_as_ferret/ticket/155

Mike