thr3ads.net - Ferret talk - [Ferret-talk] search results autocompletion [Oct 2006]


johan duflost

2006-Oct-05 05:58 UTC

[Ferret-talk] search results autocompletion

Dear list,

I'm using a text input field with autocompletion. The suggestions come
from a Ferret index which is created by getting all the terms belonging to
other indices. Here is the code:

class Suggestion

  attr_accessor :term

  def self.index(create)
    [Person, Project, Orgunit].each{|kl|
      terms = self.all_terms(kl)
      terms.each{|term|
        suggestion = Suggestion.new
        suggestion.term = term
        SUGGESTION_INDEX << suggestion.to_doc
      }
    }
    SUGGESTION_INDEX.optimize
  end

  def self.all_terms(klass)
    reader = Index::IndexReader.new(Object.const_get(klass.name.upcase + "_INDEX_DIR"))
    terms = []
    begin
      reader.field_names.each {|field_name|
        term_enum = reader.terms(field_name)
        begin
          term = term_enum.term()
          if !term.nil?
            if klass::SUGGESTIONABLE_FIELDS.include?(field_name)
              terms << term
            end
          end
        end while term_enum.next?
      }
    ensure
      reader.close
    end
    return terms
  end

  def to_doc
    doc = {}
    doc[:term] = self.term
    return doc
  end

end


It works very well, except that the indexing process takes a long time. Does
anybody know if there's a better way to do this?
Is there another way to get all the terms of an index?

Thank you.

Johan

Analyst Programmer
Belgian Biodiversity Platform (http://www.biodiversity.be)
Belgian Federal Science Policy Office (http://www.belspo.be)
Tel:+32 2 650 5751 Fax: +32 2 650 5124

Jens Kraemer

2006-Oct-05 08:30 UTC


[Ferret-talk] search results autocompletion

On Thu, Oct 05, 2006 at 07:58:40AM +0200, johan duflost wrote:
[..]
> It works very well except that the indexing process takes a long time. Does
> anybody knows if there's a better way to do this?
> Is there another way to get all the terms of an index?

Nothing Ferret-related, but at first glance your code seems a bit
inefficient: you check the SUGGESTIONABLE_FIELDS array for each term,
instead of checking once per field and then going ahead. You could even
just iterate over the SUGGESTIONABLE_FIELDS array and use the field names
from there:

   def self.all_terms(klass)
     reader = Index::IndexReader.new(Object.const_get(klass.name.upcase + "_INDEX_DIR"))
     terms = []
     begin
       klass::SUGGESTIONABLE_FIELDS.map { |field|
         reader.terms(field)
       }.each do |term_enum|
         # term_enum.term should not be nil, so no need to check this.
         terms << term_enum.term while term_enum.next?
       end
     ensure
       reader.close
     end
     return terms
   end
 
If your SUGGESTIONABLE_FIELDS contains fields that are not in the index
(yet), the reader.terms call might fail. In that case,
reader.terms(field) rescue nil
and compacting the result of map before calling each should work.
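
A minimal sketch of the "rescue nil, then compact" idea. It uses a hypothetical StubReader in place of a real Ferret reader, so the class, its terms method, the helper name existing_term_lists and the field names are all assumptions made for illustration:

```ruby
# Hypothetical stand-in for an index reader. Its terms(field) raises for
# fields that are not in the index; this is an assumption, not Ferret's API.
class StubReader
  def initialize(fields)
    @fields = fields
  end

  def terms(field)
    @fields.fetch(field) # raises KeyError when the field is missing
  end
end

# Returns one term list per field, skipping fields the reader rejects.
def existing_term_lists(reader, field_names)
  field_names.map { |field|
    reader.terms(field) rescue nil # a failed lookup becomes nil
  }.compact                        # drop the nils before iterating
end

reader = StubReader.new({ :name => %w[alice bob], :title => %w[atlas] })
lists = existing_term_lists(reader, [:name, :missing, :title])
lists.each { |terms| puts terms.join(", ") }
```

Note that the rescue modifier swallows every StandardError, not only "unknown field" failures, so it may hide other problems with the reader.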

You could also save one iteration over all the terms by passing a block
that adds each term to the index, like this:

all_terms(klass) do |term|
  INDEX << { :term => term }
end

all_terms should then do
yield term_enum.term while term_enum.next?
in the inner loop. For extra style points, rename all_terms to
each_term :-)
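
A minimal sketch of the yielding variant, runnable without Ferret. StubTermEnum is a hypothetical stand-in whose next?/term behaviour is assumed for illustration, and the plain array named index stands in for the real suggestion index:

```ruby
# Hypothetical stand-in for a term enumerator. The next?/term protocol
# shown here is an assumption, not Ferret's documented behaviour.
class StubTermEnum
  def initialize(terms)
    @terms = terms
    @pos = -1
  end

  # Advances to the next term; returns true while one is available.
  def next?
    @pos += 1
    @pos < @terms.size
  end

  def term
    @terms[@pos]
  end
end

# Yields every term to the caller's block instead of collecting the
# terms into an intermediate array first.
def each_term(term_enums)
  term_enums.each do |term_enum|
    yield term_enum.term while term_enum.next?
  end
end

index = [] # plain array standing in for the suggestion index
enums = [StubTermEnum.new(%w[alice bob]), StubTermEnum.new(%w[atlas])]
each_term(enums) do |term|
  index << { :term => term }
end
puts index.size
```

The caller decides what happens to each term, so the terms are walked only once and no temporary array of all terms is built.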



cheers,
Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

johan duflost

2006-Oct-05 13:56 UTC


[Ferret-talk] search results autocompletion - Checked by AntiVir DE

Jens,

You are right, my code was not efficient.

The indices from which I create the suggestions index are not very big:
80 KB, 300 KB and 2 MB.

After 20 minutes, I get a suggestions index of approximately 1400 KB.

Thank you for your help,

Johan




Jens Kraemer

2006-Oct-06 08:23 UTC


[Ferret-talk] search results autocompletion

On Thu, Oct 05, 2006 at 03:56:43PM +0200, johan duflost wrote:
[..]
>
> The indices from which I create the suggestions index are not very big: 
> 80kb, 300kb and 2 Mb.
> 
> After 20 minutes, I get a suggestions index of 1400 kb approximately.

Still looks somewhat slow to me...

Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

johan duflost

2006-Oct-11 10:03 UTC


[Ferret-talk] search results autocompletion - Checked by AntiVir DE

You're right. In fact, I remove the accents from the terms before
indexing them.
Without that piece of code, it takes 'only' 6 minutes.



