Ian Zabel
2006-Aug-30 17:51 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
I'm trying to sort my search results by date, in descending order. I've done quite a bit of reading through the forums here, and I've tried two different suggestions.

This just returns results in the same order as a search without a sort:

    sort_fields = []
    sort_fields << Ferret::Search::SortField.new("ferret_created_at", :reverse => :true)
    Comment.find_by_contents("test", :sort => sort_fields, :num_docs => 5)

This also doesn't affect the order:

    Comment.find_by_contents("test", :sort => ["ferret_created_at"], :num_docs => 5)

The following, however, DOES affect the order, but it's SUPER slow:

    Ferret::Search::SortField.new("id", :reverse => :true)

Sorting by id desc is really all I need, so if it's easier to somehow quickly sort by that, all the better.

Here's my model:

    class Comment < ActiveRecord::Base
      acts_as_paranoid
      acts_as_ferret :fields => [ 'comment', :forum_id, 'mod_type', 'user_id', 'ferret_created_at' ]
      [...]
      def ferret_created_at
        created_at.strftime("%Y%m%d%H%M")
      end
      [...]
    end

Any ideas as to what I'm doing wrong, or how to get this to work?

Thanks!
Ian.

-- 
Posted via http://www.ruby-forum.com/.
Jens Kraemer
2006-Aug-30 21:57 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
Hi Ian,

which versions of aaf and Ferret are you using?

I'd suggest you try out aaf trunk and Ferret 0.10.1; sorting seems to work there (I can't say anything about speed, besides 0.10 being faster in general). I have never used sorting with aaf myself, but AFAIR some people on this list have, so we should get this working.

I added a test case to aaf that sorts by :id (http://projects.jkraemer.net/acts_as_ferret/browser/trunk/demo/test/unit/content_test.rb line 94).

If sorting by :id works and sorting by another field doesn't, maybe it's because of different field indexing options (the docs suggest untokenized indexing for fields you want to sort by, which is the default for :id but not for other fields).

Jens

> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

-- 
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66
Ryan King
2006-Aug-31 20:21 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
To sort on a field, it *must* be indexed as untokenized.

When I'm sorting on dates, I actually convert them to epoch seconds, then sort on that integer. I'm not sure if this is really any faster than sorting on strings, but I suspect it may be.

-ryan
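The two date encodings mentioned so far (the fixed-width "%Y%m%d%H%M" string from the original post and epoch seconds) can be compared in plain Ruby. This is only a sketch of the ordering property, with no Ferret involved; the helper names string_sort_key and epoch_sort_key are made up for illustration.

```ruby
# Hypothetical helpers; neither is part of Ferret or acts_as_ferret.

# Fixed-width string key, as in the ferret_created_at method from the
# original post (converted to UTC here so all keys share one zone).
def string_sort_key(time)
  time.getutc.strftime("%Y%m%d%H%M")
end

# Integer key: seconds since the Unix epoch.
def epoch_sort_key(time)
  time.to_i
end

times = [
  Time.utc(2006, 8, 30, 17, 51),
  Time.utc(2005, 12, 31, 23, 59),
  Time.utc(2006, 9, 4, 1, 40)
]

# Strings compare character by character, integers numerically.
by_string = times.sort_by { |t| string_sort_key(t) }
by_epoch  = times.sort_by { |t| epoch_sort_key(t) }

puts by_string == by_epoch
```

Both keys give the same chronological order here because the string format is fixed-width and runs from the most significant unit (year) to the least (minute). The string key only has minute resolution, so two timestamps within the same minute would compare as equal.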
Ian Zabel
2006-Aug-31 22:02 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
Thanks for the responses, guys.

Jens, I'm using aaf trunk. I also have Ferret 0.10.1 installed, so I'm assuming that aaf will use that instead of the 0.9 version that is also installed.

I'm not sure why sorting by :id is so slow. It takes 60 seconds or more to return a query sorted by id, and only about 0.5 seconds when not sorted. Weird.

And, Ryan, it looks like I'm not indexing the ferret_created_at field as untokenized, so that must be my problem. I'll have to make that untokenized and reindex (booooo).

Thanks again,
Ian
David Balmain
2006-Sep-02 03:17 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
On 9/1/06, Ian Zabel <contact at ezabel.com> wrote:
> I'm not sure why sorting by :id is so slow. It takes like 60 seconds or
> more to return a query sorted by id, and only like 0.5 seconds when not
> sorted. Weird.

Hi Ian,

Try optimizing the index. Sorting results by a field will naturally take a little longer than sorting them by relevancy, because a sort index needs to be built for that field. Once the sort index is built it is cached for the IndexReader, so future sorts should be almost as fast as getting unsorted results. To build the sort index, Ferret needs to iterate through all the terms in the index. This takes significantly longer for unoptimized indexes. Here is a quick benchmark you can try running:

    require 'ferret'
    include Ferret

    words = %w{one two three four five six seven eight nine ten}

    # Build an in-memory index of 100,000 small documents
    i = I.new
    start_time = Time.now
    100000.times { i << {:id => rand(1000000), :content => words[rand(10)]} }
    puts "Building index took #{Time.new - start_time} seconds"

    # Sort on the unoptimized index
    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the first time"
    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the second time"

    i.__send__(:ensure_writer_open) # get rid of sort cache

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the first time"
    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the second time"

    puts "\nOPTIMIZING THE INDEX\n"
    start_time = Time.now
    i.optimize
    puts "Optimizing the index took #{Time.new - start_time} seconds"

    # Same sorts again, now on the optimized index
    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the first time"
    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the second time"

    i.__send__(:ensure_writer_open) # get rid of sort cache

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the first time"
    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the second time"

And here are the results on my system:

    Building index took 36.131648 seconds
    Sort by integer took 15.39588 seconds the first time
    Sort by integer took 0.002627 seconds the second time
    Sort by bytes took 15.889957 seconds the first time
    Sort by bytes took 0.001914 seconds the second time

    OPTIMIZING THE INDEX
    Optimizing the index took 0.639831 seconds
    Sort by integer took 0.170887 seconds the first time
    Sort by integer took 0.001423 seconds the second time
    Sort by bytes took 0.029054 seconds the first time
    Sort by bytes took 0.001424 seconds the second time

So optimizing the index before sorting should help a lot.

Cheers,
Dave
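The "slow the first time, fast afterwards" pattern in these numbers can be pictured with a small toy model in plain Ruby. This is emphatically not how Ferret is implemented; the ToyReader class and its methods are invented here purely to illustrate a per-field sort index that is built once and then reused by the same reader.

```ruby
# Toy model only: NOT Ferret's implementation. ToyReader is a made-up class.
class ToyReader
  attr_reader :builds

  def initialize(docs)
    @docs = docs        # array of hashes, one per document
    @sort_cache = {}    # field name => array of field values, by doc number
    @builds = 0         # how many times a sort index had to be built
  end

  # Return the doc numbers in `hits`, ordered by the value of `field`.
  def sort_hits(hits, field)
    values = (@sort_cache[field] ||= build_sort_index(field))
    hits.sort_by { |doc_num| values[doc_num] }
  end

  private

  # The expensive step: one pass over every document in the index.
  def build_sort_index(field)
    @builds += 1
    @docs.map { |doc| doc[field] }
  end
end

docs = [{ :id => 30 }, { :id => 10 }, { :id => 20 }]
reader = ToyReader.new(docs)

first  = reader.sort_hits([0, 1, 2], :id)  # builds the sort index for :id
second = reader.sort_hits([2, 0], :id)     # reuses the cached sort index

puts first.inspect
puts second.inspect
puts reader.builds
```

In this model the full pass over the documents happens only on the first sort by a given field, which mirrors the timings above: the cost is in building the sort index, not in ordering the hits.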
Ian Zabel
2006-Sep-04 01:40 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
Thanks for all the help, everyone.

I am now using this statement in my model:

    acts_as_ferret :fields => {
      'comment'           => {},
      :forum_id           => {:index => :untokenized},
      'mod_type'          => {:index => :untokenized},
      'user_id'           => {:index => :untokenized},
      'ferret_created_at' => {:index => :untokenized}
    }

I rebuilt the index, and sorting now seems to work properly with both "ferret_created_at" and "id", like so:

    sort_fields = []
    sort_fields << Ferret::Search::SortField.new("ferret_created_at", :reverse => :true)

or

    sort_fields << Ferret::Search::SortField.new("id", :reverse => :true)
    Comment.find_by_contents("test", :sort => sort_fields, :limit => 5)

Sorting by id is now MUCH faster, as well.

The only thing I notice now is that the index is MUCH larger. The index is now about 91MB, whereas before I changed the aaf settings for the model, it was about 20MB. I guess untokenized values take up a lot more space?

Thanks again!
Ian.
David Balmain
2006-Sep-04 04:25 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
On 9/4/06, Ian Zabel <contact at ezabel.com> wrote:
> Sorting by id is now MUCH faster, as well.

Great to hear.

> The only thing I notice now is that the index is MUCH larger. The index
> is now about 91MB, whereas before I changed the aaf settings for the
> model, it was about 20MB. I guess untokenized values take up a lot more
> space?

That can be correct, but it is surprising for your schema. For example, imagine the following six documents:

    "one two three" (13 bytes)
    "one three two"
    "two three one"
    "two one three"
    "three one two"
    "three two one"

If you tokenized the fields you'd have three terms, "one" (3 bytes), "two" (3 bytes) and "three" (5 bytes), and each term would use six bytes to store the doc_ids of the documents it occurs in. So you'd have 3 + 3 + 5 + 3*6 = 29 bytes. Storing the fields as untokenized would take 13 bytes per field plus 1 byte to signify the document each field occurs in, which would be (13 + 1) * 6 = 84 bytes.

Of course this is a simplification of what is really going on. There is a lot of compression happening, and a lot of other data is stored as well, like term positions, term frequencies and term vectors, as well as the stored data itself.

Now, if you want to save space, there are a few other parameters you can set. You can start by discarding :term_vectors. These are used for excerpts and match highlighting but are unnecessary in most cases. Also, there is no need to store all your data. Often, the only fields you'll want to store are the model IDs. If you aren't reading a field's value back from the Ferret index, don't bother storing it. So, for example, the :ferret_created_at entry could be:

    :ferret_created_at => {:index => :untokenized, :store => :no, :term_vectors => :no}

Note also that I recommend always using Symbols for your field names rather than Strings.

Cheers,
Dave
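The back-of-the-envelope size comparison above can be reproduced in a few lines of plain Ruby. This follows only the simplified model from the message (term bytes plus one byte per posting, no compression, positions or term vectors), so the numbers say nothing about real index sizes; the variable names are chosen here for illustration.

```ruby
# Simplified model from the message above; not how Ferret really lays out data.
docs = [
  "one two three", "one three two", "two three one",
  "two one three", "three one two", "three two one"
]

# Tokenized: each distinct term is stored once, plus one byte for every
# document the term occurs in.
terms = docs.flat_map { |doc| doc.split }.uniq
term_bytes = terms.inject(0) { |sum, term| sum + term.bytesize }
posting_bytes = terms.inject(0) do |sum, term|
  sum + docs.count { |doc| doc.split.include?(term) }
end
tokenized_bytes = term_bytes + posting_bytes

# Untokenized: every whole field value is its own term, plus one byte for
# the single document it occurs in.
untokenized_bytes = docs.inject(0) { |sum, doc| sum + doc.bytesize + 1 }

puts "tokenized:   #{tokenized_bytes} bytes"    # 3 + 3 + 5 + 3*6 = 29
puts "untokenized: #{untokenized_bytes} bytes"  # (13 + 1) * 6 = 84
```

The gap comes from term reuse: tokenized fields share a small vocabulary across documents, while an untokenized field contributes one term per distinct value.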
Jens Kraemer
2006-Sep-04 07:25 UTC
[Ferret-talk] AAF Sorting by date - what am I doing wrong?
On Mon, Sep 04, 2006 at 01:25:52PM +0900, David Balmain wrote:
> On 9/4/06, Ian Zabel <contact at ezabel.com> wrote:
[..]
>
> :ferret_created_at => {:index => :untokenized, :store => :no,
> :term_vectors => :no}

:store => :no is already the default used by acts_as_ferret, so there is no need to specify it explicitly.

Term vectors are stored :with_positions_offsets by default, so turning them off might help a bit.

Jens
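Pulling the suggestions from the last few messages together, the field options could be assembled as a plain Ruby hash along these lines. The option keys and values are the ones quoted in this thread; the names SORTABLE and ferret_fields are invented here, and whether dropping term vectors on these fields actually shrinks a given index is something to measure rather than assume.

```ruby
# Options for fields that are only sorted or filtered on, as suggested in this
# thread. SORTABLE and ferret_fields are illustrative names, not part of aaf.
# :store => :no is left out because, per the thread, it is already the
# acts_as_ferret default.
SORTABLE = { :index => :untokenized, :term_vectors => :no }.freeze

# Symbols are used for all field names, as recommended above.
ferret_fields = {
  :comment           => {},       # full-text field, default options
  :forum_id          => SORTABLE,
  :mod_type          => SORTABLE,
  :user_id           => SORTABLE,
  :ferret_created_at => SORTABLE
}

# In the model this would then be passed on as in the earlier messages
# (needs Rails and acts_as_ferret, so it is left as a comment here):
#   acts_as_ferret :fields => ferret_fields

ferret_fields.each { |name, options| puts "#{name}: #{options.inspect}" }
```

As noted earlier in the thread, changing these options means the index has to be rebuilt before the new settings take effect.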