thr3ads.net - Ferret talk - [Ferret-talk] Grouping with ferret [Jul 2008]


Henrik

2008-Jul-28 11:29 UTC

[Ferret-talk] Grouping with ferret

Hi list,

I have a problem grouping with Ferret.

I'm using the filter_proc from Dave's book, as seen below:

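results = {}
# Ferret calls this filter once per matching hit
group_by_proc = lambda do |doc_id, score, searcher|
  doc = searcher[doc_id]
  # group filename/path values by pk_file_id
  (results[doc[:pk_file_id]] ||= []) << doc[:filename] << doc[:path]
  next true  # accept every hit
end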


The problem is that when I use this, it ignores my limit clause.

I set the limit to 10 and I still get 5995 results, and it takes several
seconds.

How come the limit clause is ignored when using a filter_proc? How can  
I change this behaviour?


term = "wi"

bool1 = Ferret::Search::BooleanQuery.new
bool1.add_query(Ferret::Search::PrefixQuery.new(:filename, term))
bool1.add_query(Ferret::Search::PrefixQuery.new(:path, term))

index.search(bool1, :limit => 10, :filter_proc => group_by_proc)

puts results.size   # results is the hash filled by group_by_proc
# => 5995


Cheers,
Henke

Jens Kraemer

2008-Jul-29 09:04 UTC


[Ferret-talk] Grouping with ferret

Hi!

On 28.07.2008, at 13:29, Henrik wrote:
> Hi list,
>
> I have a problem grouping with ferret.
>
> I'm using the filter_proc from Dave's book as seen below
>
> results = {}
> group_by_proc = lambda do |doc_id, score, searcher|
>   doc = searcher[doc_id]
>   (results[doc[:pk_file_id]]||=[]) << doc[:filename] << doc[:path]
>   next true
> end
>
>
> The problem is that if I use this it ignores my limit clause.
>
> I set limit on 10 and I still get 5995 results and it takes several  
> seconds.
>
> How come the limit clause is ignored when using a filter_proc? How  
> can I change this behaviour?

Filters are applied by Ferret before the result is limited; that's why
your filter gets to see every matching hit, regardless of the limit
you specify. If it were implemented the other way around, first
limiting and then filtering, you could end up with fewer than :limit
results whenever your filter actually rejected any hits. Of course, in
your case this wouldn't happen, since your filter does no filtering and
always returns true.
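To make the ordering argument concrete, here is a minimal plain-Ruby model. It is not Ferret code: hits are assumed to be [doc_id, score] pairs, and filter_then_limit, limit_then_filter and the sample data are hypothetical names made up for this sketch. It shows that filtering first still returns a full page of results, that limiting first can return fewer, and that an accept-everything filter like the one in the question is called for every candidate hit.

```ruby
# Plain-Ruby model of the two orderings (NOT the Ferret API): a hit is a
# hypothetical [doc_id, score] pair and a filter is a lambda (doc_id, score).

# Ferret's order: filter every candidate hit, then keep the `limit`
# best-scoring survivors.
def filter_then_limit(hits, limit, filter)
  hits.select { |doc_id, score| filter.call(doc_id, score) }
      .sort_by { |_doc_id, score| -score }
      .first(limit)
end

# The alternative: keep the `limit` best hits first, then filter. Hits the
# filter rejects are not replaced, so fewer than `limit` may remain.
def limit_then_filter(hits, limit, filter)
  hits.sort_by { |_doc_id, score| -score }
      .first(limit)
      .select { |doc_id, score| filter.call(doc_id, score) }
end

# 100 hypothetical hits whose score falls as the doc id rises.
hits = (0...100).map { |id| [id, 1.0 / (id + 1)] }
even_only = ->(doc_id, _score) { doc_id.even? }

p filter_then_limit(hits, 10, even_only).size  # => 10
p limit_then_filter(hits, 10, even_only).size  # => 5

# An accept-everything filter, like the one in the question, is still
# called once per candidate hit, even with a limit of 10.
calls = 0
accept_all = ->(_doc_id, _score) { calls += 1; true }
filter_then_limit(hits, 10, accept_all)
p calls  # => 100
```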

If you really only want the first 10 results, why don't you just take
the results you get back and do your result collecting there, like
this?

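results = {}
# no :filter_proc needed: search_each yields only the top :limit hits
hit_count = index.search_each(query, :limit => 10) do |doc_id, score|
  doc = index[doc_id]  # search_each yields a document number, not a document
  (results[doc[:pk_file_id]] ||= []) << doc[:filename] << doc[:path]
end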
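A sketch of the grouping step on its own, runnable without a Ferret index. The top_docs array is hypothetical sample data standing in for the documents the loop above would load (as far as I know, Ferret's search_each yields document numbers, and the stored fields are fetched with index[doc_id]). As a variation on the thread's code, it appends [filename, path] pairs rather than one flat list, and checks the result against Enumerable#group_by.

```ruby
# Hypothetical stand-ins for the stored fields of the top hits; inside a
# real search_each block each would come from index[doc_id].
top_docs = [
  { pk_file_id: 1, filename: "wiki.txt",  path: "/docs/a" },
  { pk_file_id: 2, filename: "widget.rb", path: "/src" },
  { pk_file_id: 1, filename: "wiki.txt",  path: "/docs/b" },
]

# The thread's idiom, but appending [filename, path] pairs so each
# filename stays next to its own path instead of one flat array.
results = {}
top_docs.each do |doc|
  (results[doc[:pk_file_id]] ||= []) << [doc[:filename], doc[:path]]
end

# The same grouping expressed with Enumerable#group_by.
grouped = top_docs.group_by { |doc| doc[:pk_file_id] }
                  .transform_values { |docs| docs.map { |d| [d[:filename], d[:path]] } }

p results == grouped  # => true
p results.size        # => 2 (three hits, two distinct pk_file_id values)
```

Because grouping happens after the limit, ten hits can collapse into fewer than ten groups when several hits share a pk_file_id; one option, if you need ten distinct files, is to ask for a larger :limit.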

You could, of course, also make your filter_proc return false for every
hit once your results collection has reached the desired size, to save
the time spent loading and collecting all the remaining documents.
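The early-exit idea can be sketched as below. MAX_GROUPS, FakeSearcher and the generated documents are hypothetical: FakeSearcher only mimics the searcher[doc_id] lookup used in the question's filter_proc and counts how often it is called, and the loop at the end stands in for Ferret invoking the proc once per matching hit. Under those assumptions only ten documents are ever loaded, although the proc itself still runs for all 1000 hits.

```ruby
# Hypothetical stand-in for a Ferret searcher: searcher[doc_id] returns
# the stored fields of a document and counts how often it is called.
class FakeSearcher
  attr_reader :loads

  def initialize(docs)
    @docs = docs
    @loads = 0
  end

  def [](doc_id)
    @loads += 1
    @docs[doc_id]
  end
end

MAX_GROUPS = 10  # hypothetical target size

results = {}
group_by_proc = lambda do |doc_id, score, searcher|
  # Once enough groups are collected, reject every further hit without
  # loading its stored fields.
  next false if results.size >= MAX_GROUPS
  doc = searcher[doc_id]
  (results[doc[:pk_file_id]] ||= []) << doc[:filename] << doc[:path]
  true
end

# 1000 hypothetical documents, each with its own pk_file_id.
docs = (0...1000).map do |id|
  { pk_file_id: id, filename: "file#{id}.txt", path: "/dir#{id}" }
end
searcher = FakeSearcher.new(docs)

# Simulate Ferret calling the filter once per matching hit.
accepted = (0...docs.size).count { |id| group_by_proc.call(id, 1.0, searcher) }

p accepted        # => 10
p searcher.loads  # => 10
p results.size    # => 10
```

The proc is still called for every matching hit, so the search itself does not stop early; what is saved is the document loading and collecting. Also, the hits collected this way are simply the first ones the filter happens to see, which need not be the ten best-scoring ones.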

cheers,
Jens


--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database
