Carl Youngblood
2005-Dec-16 01:05 UTC
[Ferret-talk] Ordering results by something other than relevance
Along with the contents of the documents in my index, I have stored the date they were added. I want to search for keywords in the index but have the results sorted by their date rather than by their relevance to the keywords. How would I do this in Ferret?

Thanks,
Carl
David Balmain
2005-Dec-16 03:15 UTC
On 12/16/05, Carl Youngblood <carl at youngbloods.org> wrote:
> Along with the contents of the documents in my index, I have stored
> the date they were added. I want to search for keywords in the index
> but have the results be sorted by their date rather than their
> relevance to the keywords. How would I do this in ferret?

Hi Carl,

Good question. The easiest way to do this is to index the date as a
string, year first:

    require 'ferret'
    include Ferret::Analysis   # WhiteSpaceAnalyzer
    include Ferret::Search     # SortField
    include Ferret::Index      # Index

    data = [
      {:content => "one",   :date => "20051023"},
      {:content => "two",   :date => "19530315"},
      {:content => "three", :date => "19390912"}
    ]
    index = Index.new(:analyzer => WhiteSpaceAnalyzer.new)
    data.each { |doc| index << doc }

    # Sort by the date string first, then by score when two dates are equal.
    sf_date = SortField.new("date", {:sort_type => SortField::SortType::STRING})
    top_docs = index.search("one", :sort => [sf_date, SortField::FIELD_SCORE])

SortField is from the Search module. Here we are sorting by string and
then by score if two dates are the same. If we want to reverse the sort:

    sf_date = SortField.new("date", {:sort_type => SortField::SortType::STRING,
                                     :reverse => true})

There is also a module Ferret::Utils::DateTools which you can use to
serialize your dates more efficiently, but they won't be human
readable.

Cheers,
Dave
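Why "year first" matters: a zero-padded YYYYMMDD string is compared character by character, so plain string order is the same as date order. The pure-Ruby sketch below needs no Ferret at all; `date_key` is a hypothetical helper, not part of Ferret. It checks that claim and shows what a reversed (newest-first) sort would return.

```ruby
require 'date'

# Hypothetical helper: format a Date as the fixed-width, year-first
# "YYYYMMDD" string stored in the :date field above.
def date_key(date)
  date.strftime("%Y%m%d")
end

dates = [Date.new(2005, 10, 23), Date.new(1953, 3, 15), Date.new(1939, 9, 12)]

# Sort the formatted strings lexicographically...
by_string = dates.map { |d| date_key(d) }.sort
# ...and compare with sorting the real dates, then formatting them.
by_date = dates.sort.map { |d| date_key(d) }

p by_string          # => ["19390912", "19530315", "20051023"]
p by_string.reverse  # => ["20051023", "19530315", "19390912"]
```

Both orderings agree because every key has the same width and the most significant part (the year) comes first.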
Erik Hatcher
2005-Dec-16 08:57 UTC
Dave,

Wouldn't sorting YYYYMMDD dates as an integer rather than a string
use fewer resources in the cache?

Erik
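To make Erik's point concrete, here is a small pure-Ruby sketch. It assumes, without checking Ferret's internals, that an integer sort cache holds one 32-bit value per document while a string cache holds the characters themselves. It only verifies that integer and string order agree for these keys and that every YYYYMMDD value fits in 4 bytes.

```ruby
keys = ["20051023", "19530315", "19390912"]

# Parse explicitly as base 10 so a leading zero is never read as octal.
ints = keys.map { |k| Integer(k, 10) }

# For fixed-width keys without leading zeros, integer order matches string order.
same_order = ints.sort.map(&:to_s) == keys.sort

# The largest possible YYYYMMDD value still fits in a signed 32-bit int.
max_key     = 99_991_231
fits_int32  = max_key <= 2**31 - 1
packed_size = [max_key].pack("l<").bytesize  # bytes as a 32-bit little-endian int
string_size = max_key.to_s.bytesize          # bytes as an ASCII string

p same_order                  # => true
p fits_int32                  # => true
p [packed_size, string_size]  # => [4, 8]
```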
David Balmain
2005-Dec-16 11:51 UTC
On 12/16/05, Erik Hatcher <erik at ehatchersolutions.com> wrote:
> Dave,
>
> Wouldn't sorting YYYYMMDD dates as an integer rather than a string
> use less resources in the cache?

Yes Erik, you are quite correct. Silly me. I think I need to get more
sleep. Good thing you mentioned it, because I found a bug in the integer
and float sorts: by default they sort in the opposite direction to
strings (largest first). I don't know why I did this. I assumed it in
all my unit tests too, but now it doesn't make any sense and it doesn't
seem to be that way in Lucene, so I've decided to fix it and make
another release.

So now that I've thought about it a bit more, the easiest way to sort
by date is like this:

    top_docs = index.search("one", :sort => Sort.new("date"))

This tells Ferret to sort by whatever it finds in the date field. Since
the values parse as integers, it will sort them as integers. To do it
explicitly:

    sf_date = SortField.new("date", {:sort_type => SortField::SortType::INT})
    top_docs = index.search("one", :sort => [sf_date, SortField::FIELD_SCORE])

That probably should be INTEGER. I'll change that as well.

Cheers,
Dave
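The "sort by whatever it finds" behaviour can be illustrated with a rough pure-Ruby analogue. `detect_sort_type` and `auto_sort` are hypothetical names, and the detection order (integer, then float, then string) is an assumption modelled on the description above, not Ferret's actual implementation. Every type sorts ascending, which is the corrected default Dave describes.

```ruby
# Hypothetical auto-detection: pick the most specific type that every
# value in the field parses as. Not Ferret's code, just an analogue.
def detect_sort_type(values)
  if values.all? { |v| v.match?(/\A-?\d+\z/) }
    :integer
  elsif values.all? { |v| Float(v, exception: false) }
    :float
  else
    :string
  end
end

# Ascending (smallest first) for every type, matching string sorts;
# call .reverse on the result for largest/newest first.
def auto_sort(values)
  case detect_sort_type(values)
  when :integer then values.sort_by { |v| Integer(v, 10) }
  when :float   then values.sort_by { |v| Float(v) }
  else               values.sort
  end
end

dates = ["20051023", "19530315", "19390912"]
p detect_sort_type(dates)   # => :integer
p auto_sort(dates)          # => ["19390912", "19530315", "20051023"]
p auto_sort(["10", "9.5"])  # => ["9.5", "10"]  (numeric order)
p ["10", "9.5"].sort        # => ["10", "9.5"]  (string order)
```

The last two lines show why the detected type matters: as strings, "10" sorts before "9.5", but numerically it comes after.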
Carl Youngblood
2005-Dec-23 05:16 UTC
On 12/16/05, David Balmain <dbalmain.ml at gmail.com> wrote:
> This is telling Ferret to sort by whatever it finds in the date
> column. Since it parses as an integer, it will sort by integer.
> Explicitly like this;
>
> sf_date = SortField.new("date", {:sort_type => SortField::SortType::INT})
> top_docs = index.search("one", :sort => [sf_date, SortField::FIELD_SCORE])
>
> That probably should be INTEGER. I'll change that as well.

Another question: is it possible to sort by date first but if two
documents have the same date, then sort them by relevance?

Thanks,
Carl
Erik Hatcher
2005-Dec-23 09:19 UTC
On Dec 23, 2005, at 12:16 AM, Carl Youngblood wrote:
> On 12/16/05, David Balmain <dbalmain.ml at gmail.com> wrote:
>> This is telling Ferret to sort by whatever it finds in the date
>> column. Since it parses as an integer, it will sort by integer.
>> Explicitly like this;
>>
>> sf_date = SortField.new("date", {:sort_type => SortField::SortType::INT})
>> top_docs = index.search("one", :sort => [sf_date, SortField::FIELD_SCORE])
>>
>> That probably should be INTEGER. I'll change that as well.
>
> Another question: is it possible to sort by date first but if two
> documents have the same date, then sort them by relevance?

That is exactly what Dave's example will do. Providing an array of
SortFields does multi-level sorting: if the first criterion is equal,
the next criterion is used, and so on. It is also possible to specify
ascending or descending order for each SortField.

Erik
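Multi-level sorting of the kind Erik describes can be sketched in plain Ruby. The `Hit` struct and `multi_sort` helper are hypothetical, and breaking date ties highest-score-first is an assumption about how a relevance sort orders scores. The point is only the fall-through from one criterion to the next, each with its own direction.

```ruby
# Hypothetical search hits: a date key plus a relevance score.
Hit = Struct.new(:id, :date, :score)

hits = [
  Hit.new("a", 20051023, 0.4),
  Hit.new("b", 19530315, 0.9),
  Hit.new("c", 20051023, 0.8)
]

# criteria: list of [key_lambda, reverse?] pairs. Compare on the first
# criterion; only when it is equal fall through to the next one.
def multi_sort(items, criteria)
  items.sort do |x, y|
    result = 0
    criteria.each do |key, reverse|
      result = key.call(x) <=> key.call(y)
      result = -result if reverse
      break unless result.zero?
    end
    result
  end
end

# Date ascending; ties broken by score descending (most relevant first).
ordered = multi_sort(hits, [[->(h) { h.date }, false],
                            [->(h) { h.score }, true]])

p ordered.map(&:id)  # => ["b", "c", "a"]
```

Hits "a" and "c" share a date, so the second criterion decides between them and the higher-scoring "c" comes first.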