Hello all,

I'm using Ferret for a site-wide search where I have several kinds of (similar) objects in a central index (using a "type" field containing the class name). This works great, and I can search all objects with one query.

What I'd like to do now is to limit the results so that there will be a maximum of 10 (or 5 or whatever) results for each type. I can't figure out how to do this, so I thought maybe someone brighter than me has done this before or knows how to do it? :)

Trent Steele

--
Posted via http://www.ruby-forum.com/.
On 6/23/06, Trent Steele <tbone at horetore.com> wrote:
> What I'd like to do now is to limit the results so that there will be a
> maximum of 10 (or 5 or whatever) results for each type.

Hi Trent,

The way to do this is to search for more than you need, then go through each search result and count the types in a hash, only adding a doc if its type count is under the threshold. If you fail to retrieve enough results, search again and repeat until you have the required number of results. For those of you who know the Lucene API, this is where a Hits class comes in handy. It'll be coming in a future version. For now I'll show you the easiest way: do a single search with :num_docs set to the size of the index (index.size), so that you get all search results in one go:

    # index: your Ferret index, passed in so the method doesn't rely on
    # an outer local variable (a def can't see those in Ruby)
    def get_results(index, search_str, max_type = 5, num_required = 10)
      type_counter = Hash.new(0)   # hits kept so far, per type
      results = []
      index.search_each(search_str, :num_docs => index.size) do |doc_id, score|
        doc = index[doc_id]
        # Only keep this doc if its type is still under the per-type cap.
        if type_counter[doc[:type]] < max_type
          results << doc
          type_counter[doc[:type]] += 1
        end
        # Stop as soon as we have enough results overall.
        break if results.size >= num_required
      end
      results
    end

Hope that helps,
Dave
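Dave's reply also describes an incremental alternative (fetch a batch of hits, and search again if the per-type cap leaves you short) without showing code for it. Below is a minimal plain-Ruby sketch of that loop under stated assumptions: each hit is a Hash with a :type key, and `fetch_page` is a hypothetical callable taking (offset, limit) and returning the next batch of hits in score order. How you back `fetch_page` with Ferret depends on which paging options your Ferret version's search methods accept, so that part is deliberately left open.

```ruby
# Collect up to num_required hits, keeping at most max_type hits per type.
# ASSUMPTION: fetch_page is a hypothetical callable (offset, limit) -> Array
# of hits in score order, each hit a Hash with a :type key.
def capped_results(fetch_page:, max_type: 5, num_required: 10, page_size: 50)
  type_counter = Hash.new(0) # hits kept so far, per type
  results = []
  offset = 0
  loop do
    page = fetch_page.call(offset, page_size)
    page.each do |hit|
      next if type_counter[hit[:type]] >= max_type # this type is already full
      results << hit
      type_counter[hit[:type]] += 1
      return results if results.size >= num_required
    end
    break if page.size < page_size # short page: no more hits to fetch
    offset += page_size            # otherwise search again for the next batch
  end
  results
end

# Usage with an in-memory list standing in for real search hits.
hits = (1..6).map { |i| { id: i, type: "Article" } } +
       (7..9).map { |i| { id: i, type: "Product" } }
fetch = ->(offset, limit) { hits[offset, limit] || [] }
ids = capped_results(fetch_page: fetch, max_type: 2, num_required: 4, page_size: 3)
        .map { |h| h[:id] }
p ids # => [1, 2, 7, 8]
```

The trade-off versus fetching everything at once is more search calls in exchange for holding fewer hits in memory at a time; with a small page_size and a dominant type, the loop may need several rounds before other types show up.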
David Balmain wrote:
> The way to do this is to search for more than you need and then
> actually go through each search result and count the types in a hash,
> only adding a doc if its type count is under the threshold.

Hi,

I suspected I'd have to do something like this. Thanks for putting me on the right path. Are there any scalability or speed concerns with searching the whole index like this as the index grows larger?

T
On 6/27/06, Trent Steele <tbone at horetore.com> wrote:
> Are there any concerns about scalability/speed when the
> index grows larger regarding searching the whole index like this?

As long as you're using the C-backed version of Ferret, the index would have to grow very large before speed becomes a concern here. Note that Ferret has to go through every single search result anyway to check its score, no matter what you set num_docs to. The only thing a high value of num_docs costs you is memory (approximately 12 bytes per hit).

Cheers,
Dave
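To put that memory figure in perspective, here is the back-of-the-envelope arithmetic as a tiny Ruby helper. It simply multiplies the hit count by the ~12 bytes per hit quoted above; that number is Dave's approximation, not a measurement, and the helper name is hypothetical.

```ruby
# Approximate bytes kept per hit, as quoted in the post (an estimate only).
BYTES_PER_HIT = 12

# Hypothetical helper: rough memory (in MiB) to hold num_hits hits at once.
def hit_memory_mib(num_hits, bytes_per_hit = BYTES_PER_HIT)
  num_hits * bytes_per_hit / (1024.0 * 1024.0)
end

puts format("%.1f MiB", hit_memory_mib(1_000_000)) # prints "11.4 MiB"
```

So even a query matching a million documents would, by this estimate, need only around 11 MiB for the hit list.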