thr3ads.net - Rails - ANN: acts_as

Kasper Weibel

2005-Dec-02 18:22 UTC

ANN: acts_as_ferret

Hi all

This week I have worked with Rails and Ferret to test Ferret's (and Lucene's)
capabilities. I decided to make a mixin for ActiveRecord as it seemed the
simplest possible solution and I ended up making this into a plugin.

For more info on Ferret see:
http://ferret.davebalmain.com/trac/

The plugin is functional but could easily be refined. Anyway I want to share it
with you. Regard it as a basic solution. Most of the ideas and code are taken
from these sources:

Howtos and help on Ferret with Rails:
# http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails
# http://article.gmane.org/gmane.comp.lang.ruby.rails/26859
# http://ferret.davebalmain.com/trac
# http://aslakhellesoy.com/articles/2005/11/18/using-ferret-with-activerecord
# http://rubyforge.org/pipermail/ferret-talk/2005-November/000014.html

Howtos on creating plugins:
# http://wiki.rubyonrails.com/rails/pages/HowToWriteAnActsAsFoxPlugin
# http://www.jamis.jamisbuck.org/articles/2005/10/11/plugging-into-rails
# http://lesscode.org/2005/10/27/rails-simplest-plugin-manager/
# http://wiki.rubyonrails.com/rails/pages/HowTosPlugins


The result is the acts_as_ferret mixin for ActiveRecord.

Use it as follows:
In any model.rb add acts_as_ferret

class Foo < ActiveRecord::Base
  acts_as_ferret 
end

All CRUD operations will be performed on both ActiveRecord (as usual) and a
ferret index for further searching.

The following method is available in your controllers:

ActiveRecord::find_by_contents(query) # Query is a string representing your query

The plugin follows the usual plugin structure and consists of 2 files: 

{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb

The Ferret DB is stored in:

{RAILS_ROOT}/db/index.db

Here follows the code:

# CODE for init.rb
require 'acts_as_ferret'
# END init.rb

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin #(was: Foo)
   module Acts #:nodoc:
      module ARFerret #:nodoc:          
         
         def self.append_features(base)
            super
            base.extend(MacroMethods)
         end
         
# declare the class level helper methods
# which will load the relevant instance methods defined below when invoked

         module MacroMethods
            
            def acts_as_ferret
               extend FerretMixin::Acts::ARFerret::ClassMethods
               class_eval do
                  include FerretMixin::Acts::ARFerret::ClassMethods            
               
                  after_create :ferret_create
                  after_update :ferret_update
                  after_destroy :ferret_destroy
               end
            end
            
         end
         
         module ClassMethods
            include Ferret
            
            INDEX_DIR = "#{RAILS_ROOT}/db/index.db" 
            
            def self.reloadable?; false end
            
            # Finds instances by file contents.
            def find_by_contents(query, options = {})    
               index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
               query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
               query = query_parser.parse(query)
               
               result = [] 
               index_searcher.search_each(query) do |doc, score|
                  id = index_searcher.reader.get_document(doc)["id"]
                  res = self.find(id)
                  result << res if res
               end
               index_searcher.close()
               result
            end
            
            # private
            
            def ferret_create
               index ||= Index::Index.new(:key => :id, 
                                       :path => INDEX_DIR, 
                                       :create_if_missing => true, 
                                       :default_field => "*") 
               index << self.to_doc
               index.optimize()
               index.close()
            end
            
            def ferret_update
               #code to update index
               index ||= Index::Index.new(:key => :id, 
                                       :path => INDEX_DIR, 
                                       :create_if_missing => true, 
                                       :default_field => "*") 
               index.delete(self.id.to_s)
               index << self.to_doc              
               index.optimize
               index.close()
            end
            
            def ferret_destroy
               # code to delete from index
               index ||= Index::Index.new(:key => :id, 
                                       :path => INDEX_DIR, 
                                       :create_if_missing => true, 
                                       :default_field => "*") 
               index_writer.delete(self.id.to_s)
               index_writer.optimize()
               index_writer.close()
            end
            
            def to_doc
# Churn through the complete Active Record and add it to the Ferret document
               doc = Ferret::Document::Document.new
               self.attributes.each_pair do |key,val| 
                  doc << Ferret::Document::Field.new(key, val.to_s,
Ferret::Document::Field::Store::YES, Ferret::Document::Field::Index::TOKENIZED)
               end
               doc
            end
         end         
      end
   end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it

ActiveRecord::Base.class_eval do
   include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb

Obie Fernandez

2005-Dec-02 19:13 UTC

Re: ANN: acts_as_ferret

+1 great work

Ezra Zygmuntowicz

2005-Dec-02 19:56 UTC

Re: ANN: acts_as_ferret

Very nice Kasper-

	Thanks for sharing!

Cheers-
-Ezra
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
http://yakimaherald.com
509-577-7732
ezra-gdxLOakOTQ9oetBuM9ipNAC/G2K4zDHf@public.gmane.org

James R

2005-Dec-02 20:18 UTC

Re: ANN: acts_as_ferret

Thanks... one problem. I believe that I'm doing everything correctly,
except I keep getting this error on any CRUD operation:

undefined local variable or method `document' for #<Region:0xb7124c50>

(where Region is the name of my model)


Any ideas? The index is created and I've been able to test Ferret from a
command line script just fine.

-- 
Posted via http://www.ruby-forum.com/.

Julian 'Julik' Tarkhanov

2005-Dec-02 20:41 UTC

Re: ANN: acts_as_ferret

On 2-dec-2005, at 19:22, Kasper Weibel wrote:
> Hi all
>
> This week I have worked with Rails and Ferret to test Ferret's (and Lucene's)
> capabilities. I decided to make a mixin for ActiveRecord as it seemed the
> simplest possible solution and I ended up making this into a plugin.

I recently finished a simple search plugin, which works like this:

class Page < ActiveRecord::Base
	indexes_columns :title, :body, :into => 'somecolumn'
end

It's here: http://julik.textdriven.com/svn/tools/rails_plugins/simple_search/
(just finished the tests)

Maybe we can join the two plugins and get a nice search hook for AR
searching? Along the lines of

class Page < ActiveRecord::Base
    # if you pass a Ferret index, it gets hooked instead of a column for LIKE
    indexes_columns :title, :body, :into => MainFerretIndex
end

Or even maintain named Ferret indexes if the user has Ferret and
resort to LIKE queries if he doesn't?
--
Julian 'Julik' Tarkhanov
me at julik.nl

Kasper Weibel

2005-Dec-02 22:59 UTC

Re: ANN: acts_as_ferret

James R <adamjroth@...> writes:
> 
> Thanks... one problem. I beleive that I'm doing everything correctly
> except I keep getting this error on any CRUD operating:

The following in acts_as_ferret.rb should be one line (almost at the end of the
file):

# Churn through the complete Active Record and add it to the Ferret document

Take care with those line breaks :-)

Kasper

David Balmain

2005-Dec-03 01:06 UTC

Re: ANN: acts_as_ferret

Hi Kasper,

Nice work. Do you mind if I put this on the Ferret Wiki?

A few minor points. And a disclaimer: I haven't had time to use Rails
since I started working on Ferret, so I could be wrong about a few
things here. I noticed in ferret_destroy you have index_writer. I
think this is meant to be just index. Also, where you have the lines:

              index.optimize()
              index.close()

I would replace these with:

              index.flush()

Optimizing the index every time is not necessary and can be quite slow
for large indexes. Also, if you close the index, the next time you try
to use it you should get an error. I'm not sure why it works for you.
It might be a bug. I'll have to check it out. Better to leave the
index open. If you are optimizing every time because you are really
concerned about search speed, it is better just to set the merge
factor to 2, i.e.:

               index ||= Index::Index.new(:key => :id,
                                      :path => INDEX_DIR,
                                      :merge_factor => 2)

Remember that there is generally a payoff between indexing speed and
search speed. Also note that I removed the :default_field and
:create_if_missing options. They were set to the defaults anyway.
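
A note on "leave the index open": in the posted callbacks, index is a local variable, so index ||= ... starts from nil on every call and a fresh object is built each time. That is plain Ruby behaviour and may be why closing the index did not raise an error. The sketch below shows the difference between that and memoizing one shared handle. CountingIndex and Model are hypothetical stand-ins used only for illustration; they are not part of Ferret, and a real shared index would still need thought about multiple processes.

```ruby
# Hypothetical stand-in that counts how often an "index" gets opened.
class CountingIndex
  class << self
    attr_accessor :opened
  end
  self.opened = 0

  def initialize
    self.class.opened += 1
  end
end

class Model
  # A local variable is fresh on every call, so ||= always builds a new object.
  def local_index
    index ||= CountingIndex.new
    index
  end

  # Memoizing on the class keeps a single object across calls and instances.
  def self.shared_index
    @shared_index ||= CountingIndex.new
  end
end

2.times { Model.new.local_index }   # opens two separate indexes
2.times { Model.shared_index }      # opens one more, then reuses it
puts CountingIndex.opened
```

With a shared handle like this, calling flush after each write (as suggested above) fits better than close.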

Another thing, since you are setting the key to :id, there is no need
to do the delete when you do the update. This will happen
automatically.

Lastly, and most importantly, I think this will only work if you only
apply it to one object or you'll get conflicting ids from two
different tables. To make this available to more than one object,
there are two solutions I can think of. You could have a separate
index directory for each object. Or you can set the key like this:

               index ||= Index::Index.new(:key => [:id, :table],
                                      :path => INDEX_DIR)

And your to_doc method would need to store the name of the table in
the :table field in the document.
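
To see why the composite key matters, here is a minimal sketch in plain Ruby that does not depend on Ferret. TinyIndex is a hypothetical in-memory stand-in for a keyed index, and the sample documents are made up; only the :id and :table field names come from the suggestion above.

```ruby
# Hypothetical in-memory stand-in for a keyed index (not the Ferret API).
# Adding a document whose key fields match an existing one replaces it.
class TinyIndex
  def initialize(key_fields)
    @key_fields = Array(key_fields)
    @docs = {}
  end

  # Returns self so that additions can be chained.
  def <<(doc)
    @docs[key_for(doc)] = doc
    self
  end

  def size
    @docs.size
  end

  private

  def key_for(doc)
    @key_fields.map { |field| doc[field].to_s }
  end
end

region = { :id => 1, :table => "regions", :name => "North" }
page   = { :id => 1, :table => "pages",   :title => "Home" }

by_id = TinyIndex.new(:id)
by_id << region << page             # same id: the page replaces the region

by_id_and_table = TinyIndex.new([:id, :table])
by_id_and_table << region << page   # different tables: both are kept

puts by_id.size
puts by_id_and_table.size
```

In the plugin, this would correspond to to_doc adding a table field to every document, next to the record's own attributes.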

I hope all this information helps. When I get some time to use Rails
I'll post my own code.

Cheers,
Dave

PS: I just released Ferret 0.3.0 so gem update and enjoy. :)

Kasper Weibel

2005-Dec-03 02:21 UTC

Re: ANN: acts_as_ferret

David Balmain <dbalmain.ml@...> writes:
> 
> Hi Kasper,
> 
> Nice work. Do you mind if I put this on the Ferret Wiki?

Thanks David

This is really quality input!

It's my first week with Ferret and I'm still working my way into it. I hope I'll
get time to reflect on your comments before Monday.

Feel free to put it on the wiki!

Kasper

Erik Hatcher

2005-Dec-03 11:14 UTC

Re: ANN: acts_as_ferret

CC'ing ferret-talk also.

Nice work, Kasper!   You've beaten me to it - this was something I
was planning on tackling in the near future.

I've got some additional feedback for you inlined below.  Keep in
mind that I'm being highly detailed in my feedback, in order to help
this extension become the best it can be given Lucene best
practices.  Your work is a great start, and I want to see this
evolve.  All comments below are constructive, not even 'criticism'.
Thanks for getting this started!

On Dec 2, 2005, at 1:22 PM, Kasper Weibel wrote:
> The result is the acts_as_ferret Mixin for ActiveRecord.
>
> Use it as follows:
> In any model.rb add acts_as_ferret
>
> class Foo < ActiveRecord::Base
>   acts_as_ferret
> end

Ideally there would be more options than just enabling a table to be
indexed fully.  More on that in a moment.

> All CRUD operations will be performed on both ActiveRecord (as  
> usual) and a
> ferret index for further searching.

The toughest issue to deal with here is transactions.  Suppose a
database operation rolls back - then what happens to the index?  It's
out of sync.  I don't have any easy solutions though, and it is an
issue that pops up regularly in the Java Lucene community as well.
There is quite a mismatch between a relational database and a
full-text index when it comes to how updates and additions are handled.

At the very least, a warning should be included mentioning the  
transactional issue.

Another facility that is desirable with Lucene is the ability to
rebuild the entire index from scratch.  Why?  If you change the
analyzer, for example, you will need to re-index all documents to have
them re-analyzed.

> The following method is available in your controllers:
>
> ActiveRecord::find_by_contents(query) # Query is a string representing your query

Dave mentioned this, but you're currently only indexing "id", not
the table name.  Thus you could get documents matching the query
from other tables, and get an id that doesn't exist for the current
table or one that belongs to a different table.  Table name needs to be
considered somehow, either by building a separate index for each
table, or adding the table name as an indexed, untokenized field.

> The Ferret DB is stored in:
>
> {RAILS_ROOT}/db/index.db

Please consider NOT calling it a "DB".  Ferret is Lucene.  What it
builds is an "index", not a "database" in the traditional sense.  I
think it would be best to avoid "db" terminology to prevent confusion.

>          module ClassMethods
>             include Ferret
>
>             INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

I'm not sure how to parameterize "acts_as" extensions, but making the
index location more configurable would be good.
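
One common way to parameterize such a macro is to let it take an options hash and remember the result on the class. The sketch below is plain Ruby with no Rails or Ferret involved; the module name, the :index_dir option and the default "index/<class name>" path are all assumptions made for illustration, not anything the posted plugin defines.

```ruby
# Sketch of an option-taking class macro in plain Ruby.
# The :index_dir option name and the default path are assumptions.
module FerretOptionsSketch
  DEFAULT_INDEX_ROOT = "index"

  # Called in the class body; stores the chosen directory on the class itself.
  def acts_as_ferret(options = {})
    @ferret_index_dir =
      options[:index_dir] || File.join(DEFAULT_INDEX_ROOT, name.downcase)
  end

  def ferret_index_dir
    @ferret_index_dir
  end
end

class Page
  extend FerretOptionsSketch
  acts_as_ferret                                   # default: one directory per class
end

class Region
  extend FerretOptionsSketch
  acts_as_ferret :index_dir => "/tmp/region_index" # explicit location
end

puts Page.ferret_index_dir
puts Region.ferret_index_dir
```

Defaulting to one directory per class would also be one way to get the separate index per table mentioned earlier in the thread.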
>             # Finds instances by file contents.
>             def find_by_contents(query, options = {})
>                index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
>                query_parser ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
>                query = query_parser.parse(query)
QueryParser is only one (and often crude) way to formulate a Query.   
Ideally there would be a couple of methods to search with, one that  
takes a QueryParser-friendly expression like "foo AND bar NOT baz"  
and another that takes a Query instance allowing a developer to  
formulate sophisticated queries via the Ferret query API rather than  
parsing an expression.   There are many good reasons for this, most
importantly from a user interface perspective, where it often makes
more sense for the application to have separate fields that build up
a query rather than the one totally free-form Google-esque text box.
Many applications need full-text search, but not in a way that
requires users to know query expression operators like +/-/AND/OR.

Back to the table name issue, here you'll want to wrap the query with
a BooleanQuery AND'd with a TermQuery for table:<table name> so that
you're sure the only hits returned will be for the current table.
>                result = []
>                index_searcher.search_each(query) do |doc, score|
>                   id = index_searcher.reader.get_document(doc)["id"]
>                   res = self.find(id)
>                   result << res if res
>                end
Some handling of paging needs to be added here.  It is unlikely that
all hits are needed, and accessing the Document for every hit will be
an enormous performance bottleneck with lots of data.  It is very
important to choose the hits enumeration carefully.  Doing a database
query for every hit is also likely to be a huge bottleneck.  Perhaps
doing a SQL "IN" query for all ids after narrowing the set of
hits (by page) is feasible, though I'm not sure what limits exist on
how many items you can have with an "IN" clause.  I've not delved
into Ferret in much depth yet, but in Java Lucene a HitCollector
would possibly be a good way to handle this.
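
The "page first, then one IN query" idea might look roughly like the following. The helper names page_of, in_condition and in_rank_order are made up for this sketch, and the rows are plain Hashes rather than ActiveRecord objects; how the resulting condition is handed to the database layer is left open.

```ruby
# Hypothetical helpers: page the ranked hit ids first, then fetch that
# page with a single IN query rather than one SELECT per hit.
def page_of(hit_ids, page, per_page)
  hit_ids[(page - 1) * per_page, per_page] || []
end

# Builds a condition string with one placeholder per id, followed by the ids.
def in_condition(ids)
  placeholders = (["?"] * ids.size).join(", ")
  ["id IN (#{placeholders})", *ids]
end

# Restores the search ranking, since IN does not guarantee row order.
def in_rank_order(records, ids)
  by_id = records.each_with_object({}) { |record, map| map[record[:id]] = record }
  ids.map { |id| by_id[id] }.compact
end

hit_ids = [42, 7, 19, 3, 88]           # ids in relevance order
ids = page_of(hit_ids, 2, 2)           # second page, two hits per page
p ids
p in_condition(ids)

rows = [{ :id => 3 }, { :id => 19 }]   # rows as a database might return them
p in_rank_order(rows, ids)
```

Only the ids on the requested page ever reach the database, which also keeps the IN list short.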
>                index_searcher.close()
>                result
>             end
It is definitely unwise to close the IndexSearcher instance for every  
search - leaving it open allows for field caches to warm up and
speeds up successive searches.
>             # private
>
>             def ferret_create
>                index ||= Index::Index.new(:key => :id,
>                                        :path => INDEX_DIR,
>                                        :create_if_missing => true,
>                                        :default_field => "*")
Dave mentioned the key thing, and I'll reiterate the need to add the
table name to it.
>                index << self.to_doc
>                index.optimize()
>                index.close()
>             end
Reiterating Dave, but just to be thorough, optimizing and closing an  
index is not a good thing to do on every document operation as it can  
be slow.  And definitely heed his advice about using flush.  There  
does need to be a facility to optimize the index on demand, which  
developers may choose to do as a nightly batch process, or  
periodically as the index becomes segmented.
>             def ferret_update
>                #code to update index
>                index ||= Index::Index.new(:key => :id,
>                                        :path => INDEX_DIR,
>                                        :create_if_missing => true,
>                                        :default_field => "*")
I recommend centralizing the Index constructor, so as to not  
duplicate all of those parameters and allowing them to be changed in  
one spot.
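
A minimal sketch of that centralization, assuming a module of our own invention (FerretSettings) and an illustrative index path; the option names are the ones from the quoted code, and nothing here calls Ferret itself:

```ruby
# Hypothetical sketch: one place that knows the index options.
# The option names mirror the quoted code; the path is an assumption.
module FerretSettings
  INDEX_DIR = "db/ferret_index"

  DEFAULTS = {
    :key               => :id,
    :path              => INDEX_DIR,
    :create_if_missing => true,
    :default_field     => "*"
  }.freeze

  # Returns a fresh options Hash, with any overrides applied.
  def self.index_options(overrides = {})
    DEFAULTS.merge(overrides)
  end
end

p FerretSettings.index_options
p FerretSettings.index_options(:key => [:id, :table])
```

Each of the create, update and destroy hooks could then pass the same Hash to the Index constructor instead of repeating the parameters.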
>                index.delete(self.id.to_s)
>                index << self.to_doc
>                index.optimize
>                index.close()
>             end
>
>             def ferret_destroy
>                # code to delete from index
>                index ||= Index::Index.new(:key => :id,
>                                        :path => INDEX_DIR,
>                                        :create_if_missing => true,
>                                        :default_field => "*")
>                index_writer.delete(self.id.to_s)
>                index_writer.optimize()
>                index_writer.close()
>             end
Again, the table name should be part of the key for all operations  
above.
>             def to_doc
>                # Churn through the complete Active Record and add it to the
>                # Ferret document
>                doc = Ferret::Document::Document.new
>                self.attributes.each_pair do |key,val|
>                   doc << Ferret::Document::Field.new(key, val.to_s,
>                      Ferret::Document::Field::Store::YES,
>                      Ferret::Document::Field::Index::TOKENIZED)
>                end
>                doc
>             end
This to_doc is where a lot of fun can be had.  There are many options
that need to be parameterized by the developer at the model level.
For example, how a field is indexed is crucial.  You're storing and
tokenizing every field, including the "id" field.  You definitely do
not want to tokenize the "id" field.  Adding the table name is needed
also, untokenized.  Each field should allow flexibility on how it is
(or is not) indexed, including whether to store/tokenize the field or
not.  Storing fields is unnecessary in the ActiveRecord sense, since
what you're returning from the search method are records from the
database, not documents from the index.  Making the analyzer
controllable is necessary at a global level for the index, and
overridable on a per-field level too.

A common technique with Lucene when field-level searching granularity
is not relevant is to create an aggregate field, say "contents", where
all text is indexed.  With Ferret, you could do this by iterating
over all fields that should be indexed/tokenized, using "contents"
as the field name for all fields of the record.  Then searches would
occur only against "contents".  While Dave likes the default field to
be "*", I personally find distributing a query expression across all
fields tricky and error-prone, especially given that different fields
may be analyzed differently.  Consider a query for "foo bar".  With
two fields "title" and "body", how do you expand that query across
all fields?  Not trivial.  This is why I like the aggregate
"contents" field technique, which can work in conjunction with fields
indexed individually also, so a query for "foo bar" would search the
"contents" field by default, but someone could do "title:foo
body:bar" to refine things.
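
As a rough illustration of the aggregate-field technique, the sketch below builds a plain Hash "document" with the individual fields plus one "contents" field. The helper to_index_doc and the "table" field name are assumptions made for the example; a real implementation would create Ferret fields with the appropriate store/tokenize settings instead.

```ruby
# Hypothetical sketch: build a plain Hash "document" with per-field
# entries plus one aggregate "contents" field for default searches.
def to_index_doc(attributes, table, text_fields)
  doc = { "id" => attributes["id"].to_s, "table" => table }
  text_fields.each { |field| doc[field] = attributes[field].to_s }
  doc["contents"] = text_fields.map { |field| attributes[field].to_s }.join(" ")
  doc
end

record = { "id" => 5, "title" => "Foo", "body" => "bar baz", "secret" => "x" }
doc = to_index_doc(record, "articles", %w[title body])

p doc["contents"]
p doc.keys
```

Fields left out of the list, such as "secret" above, never reach the index at all.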

I think this is enough, and perhaps too much(!), feedback for
now :)   Sorry if it seems overly picky, but I think this is a very
important addition to Rails and ActiveRecord.  The magic that is
Lucene is very special, and I'm thrilled that it has now entered the
Ruby world.  I want to help make sure that Ferret and its integration
into places like ActiveRecord goes as smoothly as possible and keeps
the outstanding reputation that Lucene has in the Java (and C# and
Python, etc) world.  There are many ways to use Lucene inefficiently
- I'll be here doing what I can to help see that things are done
in the best possible way.

	Erik

David Balmain

2005-Dec-03 12:00 UTC

[Ferret-talk] [Rails] ANN: acts_as_ferret

Thanks for the feedback, Erik. I've actually posted the acts_as_ferret
code on the Ferret wiki with a few improvements. But it's far from
optimal. Please add improvements or post your ideas here:

http://ferret.davebalmain.com/trac/wiki/FerretOnRails

Hopefully with Erik's feedback and a few Rails gurus looking over it
we'll soon have a really nice solution to Rails Ferret integration.
> While Dave likes the default field to
> be "*", I personally find distributing a query expression across all
> fields tricky and error-prone, especially given that different fields
> may be analyzed differently.
Just to defend my honour :-P I actually totally agree with Erik here.
Think of the default field "*" as like Rails scaffolding. It's handy
to get you started but you'll have to put a bit of work and thought
into it yourself to get the most out of Ferret.

Cheers,
Dave

Thomas Lockney

2005-Dec-05 04:39 UTC

Re: ANN: acts_as_ferret

Great job on this, Kasper. I took a look at this a few days ago and started
playing with it this weekend. I've taken a few of Erik's suggestions and
started trying to implement them. I don't know if you've already started
working on enhancing it, but I'd be very interested in contributing my
changes. It'll probably be a few days before I can get back in and finish
things up, though. (The Portland Ruby Brigade has their monthly meeting on
Tuesday, so that's one night's work missed. ;~)

Here are the changes I've started working on:

1. Adding configuration

    The notation I'm working on is something like this:

        acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {...}

    Still playing with the configuration of the fields. I've also written it
    so that the default is to index all fields with the default settings. In
    addition, it should be possible to simply pass an array to the fields
    parameter and default the settings for Storable, etc.

2. Adding the ability to pass Query objects to the find_by_contents method.

I've been doing some refactoring along the way, too, and hope to add some
unit tests eventually. One final suggestion: perhaps the name should be
changed to acts_as_indexed?

Anyway, this is great work. I hope I can make worthwhile contributions to this.

--
Thomas Lockney

David Balmain

2005-Dec-05 05:13 UTC

Re: Re: ANN: acts_as_ferret

Hi Thomas,

For additional ideas look here:

http://ferret.davebalmain.com/trac/wiki/FerretOnRails

And of course, please feel free to add your improvements.

Cheers,
Dave


Erik Hatcher

2005-Dec-05 09:36 UTC

Re: Re: ANN: acts_as_ferret

On Dec 4, 2005, at 11:39 PM, Thomas Lockney wrote:
> (The Portland Ruby Brigade has their monthly meeting on Tuesday, so
> that's one night's work missed. ;~)
You Portland Rubyists really know how to party!   I went to the event  
during OSCON in August - what a blast.
> 1. Adding configuration
>
>     The notation I'm working on is something like this:
>
>         acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {...}
So you're thinking that each model may have its own index?   I wasn't
sure if one index per model made sense or whether a single index,
globally configured through environment.rb and friends, made the most
sense.  Using one index would allow some future clever things such as
querying without the table name, allowing results to come back with
objects spanning multiple models.

I'm leaning towards preferring a single index, such that
the :index_dir configuration would be done via environment.rb
globally, not per model.
> 2. Adding the ability to pass Query objects to the find_by_contents  
> method.
Cool.  Maybe this should be renamed to find_by_ferret?  If a String  
is passed in, it gets parsed (with the options hash allowing control  
over the parsing), and if a Query is passed in then it is used as-is.
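
The String-or-Query dispatch could be sketched as below. Everything here is a stand-in: TermQuery is a toy Struct and parse_expression a deliberately naive parser that only understands "field:term", neither of which is Ferret's own class or query parser. The :default_field option name is borrowed from the quoted Index options.

```ruby
# Hypothetical stand-ins: TermQuery is a toy query object and
# parse_expression a deliberately naive parser, not Ferret's.
TermQuery = Struct.new(:field, :term)

def parse_expression(expr, options = {})
  default_field = options.fetch(:default_field, "contents")
  name, term = expr.split(":", 2)
  term ? TermQuery.new(name, term) : TermQuery.new(default_field, expr)
end

# Strings are parsed; anything else is assumed to be a query already.
def build_query(query, options = {})
  query.is_a?(String) ? parse_expression(query, options) : query
end

p build_query("title:foo")
p build_query("foo", :default_field => "body")

ready_made = TermQuery.new("body", "bar")
p build_query(ready_made).equal?(ready_made)
```

A finder built this way would call build_query once at the top and treat the result uniformly from there on.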
> I've been doing some refactoring along the way, too, and hope to
> add some unit tests eventually. One final suggestion, perhaps the
> name should be changed to acts_as_indexed?
I like it being acts_as_ferret personally.  "indexed" is overloaded  
within the relational database domain, so it could be construed as  
having to do with DB indexes.
> Anyway, this is great work. I hope I can make worthwhile  
> contributions to this.
Thanks for your efforts!   I'm glad to see this all coming together.

	Erik

Kasper Weibel

2005-Dec-05 11:00 UTC

Re: ANN: acts_as_ferret

Hi all

First of all I'd like to take the opportunity to thank you all for the great
response. Personally I feel that this approach to Ferret/Rails integration will
be a good thing to investigate further. People need quality search.

I think that we should agree on where to put the input for this project. The
page on David Balmain's wiki is a good start - thanks for that, David.
http://ferret.davebalmain.com/trac/wiki/FerretOnRails

I needed this code for a specific task at my job and there are still many things
to do to make it generally usable.

I will comment on different people's input below.

Thanks to David for giving direct input for enhancing the quality of the code
and explaining index.flush() to me. It's good to have the author of Ferret
giving direct input as I'm not really sure where the pitfalls in the
implementation are speed/quality wise.

As both David and Erik Hatcher have pointed out, the current implementation will
only index one model per application. My view on this issue is that I would like
to have one index for all models as opposed to multiple index files; that is, ONE
Ferret index per application.

I will also need to implement a method for rebuilding the index. This will come
in handy both when in development mode and probably also in production.

Erik pointed out that there will be problems with transactions and I must admit
that I don't have any viable ideas of how to approach this issue. I have thought
of turning transactions off for the SQL tables in question - if that's possible
at all.

Erik also had problems with the name index.db. Instead I suggest index.frt

The current search method should be worked on. At the moment it fires quite a
few SQL select statements. There is also a need for the implementation of
pagination.

The to_doc method is one way to approach things when building the index. I
actually thought of Erik's suggestion about an aggregate field, which sounds
practical. There should be a way of configuring which fields go where.

I have had many ideas of what other things to implement. One of them is that
hard core Lucene folks will probably not put up with the limitations of a
specific implementation if it makes things difficult. One of the things I like
about Active Record in Rails is the find_by_sql() method which lets you do
whatever you want on the SQL side. A similar approach could be implemented with
Ferret: find_by_fql() - if there is such a term as Ferret Query Language.

Also the many possibilities for fine tuning should not be forgotten in favour of
simplicity. There should always be a way to make the configuration exactly as
you would like it. I favour the configuration approach Thomas Lockney has
suggested.

Lastly: I really appreciate your contributions and I feel that with our combined
efforts it will be possible to build a quality solution. In time acts_as_ferret
could become the preferred choice for Ferret/Rails integration.

Kasper

Thomas Lockney

2005-Dec-05 16:15 UTC

Re: ANN: acts_as_ferret

Erik Hatcher <erik@...> writes:
> 
> On Dec 4, 2005, at 11:39 PM, Thomas Lockney wrote:
> > (The Portland Ruby Brigade has their monthly meeting on Tuesday, so  
> > that''s one
> > nights work missed.
> > ;~)
> 
> You Portland Rubyists really know how to party!   I went to the event  
> during OSCON in August - what a blast.
Well, that was my first PDX.rb event since I had just moved here, so I can't
take credit for all that...
> >     The notation I'm working on is something like this:
> >
> >         acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {...}
> 
> So you're thinking that each model may have its own index?
Actually, I guess I didn't indicate very well what was going to be optional
configuration and what was fixed. I only put that there to indicate that you
*could* have one index per model. I left out the part that would allow you to
configure it globally. I tend to agree with you, in fact, that one global index
makes the most sense.
> 
> > 2. Adding the ability to pass Query objects to the find_by_contents  
> > method.
> 
> Cool.  Maybe this should be renamed to find_by_ferret?  
Sounds reasonable to me.
> If a String is passed in, it gets parsed (with the options hash allowing 
> control over the parsing), and if a Query is passed in then it is used 
> as-is.
That's pretty much what I was aiming for.
> I like it being acts_as_ferret personally.  "indexed" is overloaded
> within the relational database domain, so it could be construed as
> having to do with DB indexes.
Seems reasonable to me. 

Thomas

Thomas Lockney

2005-Dec-14 00:40 UTC

Re: ANN: acts_as_ferret

Since it's been over a week and I've only had time to tinker here and there on
my proposed changes to the acts_as_ferret plugin, I thought it was time to just
post what I had so far and let others weigh in on it or take their own stab at
making it more complete. I've posted my updated version along with some brief
notes at the bottom of the Ferret wiki page here:
http://ferret.davebalmain.com/trac/wiki/FerretOnRails

I'm still actively working on this, but I've only been able to do it in fits and
spurts so far. I apologize for the ugliness of some of the code; I'm still
trying to figure out how to do all the dynamic "magic" necessary for this sort
of thing.

David Balmain

2005-Dec-14 02:28 UTC

Re: Re: ANN: acts_as_ferret

Great work Thomas,

I just noticed two things in my quick glance. Firstly, you need to
change Document::Field::Index::NO to
Document::Field::Index::UNTOKENIZED for the :ferret_class and :id
fields. My fault as I made the same mistake in my code above.

Also, I don't know if you meant to use symbols but you shouldn't use
':' in a field name as it will throw off the query parser. Get rid
of the '"' around :ferret_class and :id and you'll be fine.

I made both these changes on the wiki already.

One other change you may like to make is to allow Query objects to be
passed to the find_by_contents method as well as Strings, but I'll
leave that one up to you for the moment.

Hope that helps,
Dave


jennyw

2005-Dec-14 06:03 UTC

Re: Re: ANN: acts_as_ferret

It's so great that people are working on this! Ferret is great and I
look forward to seeing it better integrated with Rails.

Thomas -- I tried this code but experienced a few problems with it. I
never got it to work, and gave up since it's not exactly what I need
(the documents I'm storing in Ferret don't exactly match my model
objects, but are a composite of them). Still, I have some feedback that
might (or might not) be helpful.

In addition to what David mentioned, I noticed that you use the method
class_variable_set in the method acts_as_ferret. This isn't available in
Ruby 1.8.2. Moreover, I'm not sure why you're using this here since the
variable names are not dynamic. I just changed these to:

            @@fields_for_ferret = Array.new   
            @@class_index_dir = configuration[:index_dir]

Also, I noticed that the indentation on the class method append_features 
was a bit off ... it looked like super was the beginning of a block. 
Just a minor thing.

Also, I'm confused about the name for the SingletonMethods module. What
is the singleton that's being referred to here? This isn't a criticism
-- I'm just confused, since it seems to me that these methods get added
to your model classes and are available to each instance. Are they named
such because each model has a single instance of the index?

Also, I was wondering -- since ferret_create is aliased as
ferret_update, shouldn't it first call a delete before adding itself to
the index? For example, something like:

        def ferret_create
          begin
            ferret_delete
          rescue StandardError
            # ignore: the record may not be in the index yet
          end
          ferret_index << self.to_doc
        end
        alias :ferret_update :ferret_create

Also, a question for David -- is auto_flush => true supposed to remove
the lock automatically after writes?  I ask because I also tried the
code that Kasper originally posted, and I kept getting locking errors
unless I closed the index after updates (and I also wasn't quite able to
get that code to work before giving up). I was running both a Web
instance and trying to get at it with console, which is similar, I
think, to what would happen with multiple FCGI processes.

Thanks to everyone for your efforts, especially David for Ferret itself!

Jen

jennyw

2005-Dec-14 06:28 UTC

Re: Re: ANN: acts_as_ferret

jennyw wrote:
> Also, a question for David -- is auto_flush => true supposed to remove
> the lock automatically after writes?  I ask because I also tried the
> code that Kasper originally posted, and I kept getting locking errors
> unless I closed the index after updates (and I also wasn't quite able
> to get that code to work before giving up). I was running both a Web
> instance and trying to get at it with console, which is similar, I
> think, to what would happen with multiple FCGI processes.
Oops! Never mind about the locking problem ... it turns out I had an
older version of Ferret installed that probably didn't support auto_flush.

Jen

David Balmain

2005-Dec-14 06:48 UTC

Re: Re: ANN: acts_as_ferret

On 12/14/05, jennyw <jennyw-eRDYlh02QjuxE3qeFv2dE9BPR1lH4CV8@public.gmane.org> wrote:
> Also, I was wondering -- since ferret_create is aliased as
> ferret_update, shouldn't it first call a delete before adding itself to
> the index? For example, something like:
>
>         def ferret_create
>           begin
>             ferret_delete
>           rescue StandardError
>             # ignore: the record may not be in the index yet
>           end
>           ferret_index << self.to_doc
>         end
>         alias :ferret_update :ferret_create
Hi Jenny,

Glad to hear you like Ferret.

Note that I've added a key option to the index:

    @@index ||= Index::Index.new(:key => [:id, :ferret_class],

This will ensure that the index is kept unique for these fields, ie
every time I do an update the old document will be automatically
deleted. This only happens when you set the key option.
> Also, a question for David -- is auto_flush => true supposed to remove
> the lock automatically after writes?
Yes, that is the way it is supposed to work.
> I ask because I also tried the
> code that Kasper originally posted, and I kept getting locking errors
> unless I closed the index after updates (and I also wasn't quite able to
> get that code to work before giving up). I was running both a Web
> instance and trying to get at it with console, which is similar, I
> think, to what would happen with multiple FCGI processes.
Have you tried it with the latest version of Ferret? 3.0 had a few
bugs but 3.1 should be fine. Let me know if you are still getting lock
errors. :-)

Cheers,
Dave

Abdur-Rahman Advany

2005-Dec-14 08:06 UTC

Re: Re: ANN: acts_as_ferret

I am rewriting parts of the plugin (I'll contribute it around next week). I
wanted to use search, with some special arguments for Ferret, and
arguments for find, so that when the search is done, it calls find with the
found ids and conditions/include etc., and returns what's needed. I am
hesitating between ferret_search (no risk of being reimplemented by
someone else) and search (very common, and maybe it could someday become a
method in Rails itself). What is your opinion? I was thinking of
running the Ferret query first and then fetching the database entries (from
MySQL, for example). But I can't really think of what would be faster
(searching Ferret first or ActiveRecord); it really depends on the use of
conditions...

Finn Smith

2005-Dec-14 17:25 UTC

Re: ANN: acts_as_ferret

Thomas Lockney wrote:
> Since it's been over a week and I've only had time to tinker here and there on
> my proposed changes to the acts_as_ferret plugin, I thought it was time to just
> post what I had so far and let others weigh in on it or take their own stab at
> making it more complete. I've posted my updated version along with some brief
> notes at the bottom of the ferret wiki page here:
> http://ferret.davebalmain.com/trac/wiki/FerretOnRails
>
> I'm still actively working on this, but I've only been able to do it in fits and
> spurts so far. I apologize for the ugliness of some of the code, I'm still
> trying to figure out how to do all the dynamic "magic" necessary for this sort
> of thing.
It's great that you guys are working on this. I have been following the
developments with a fair amount of interest and am hoping to integrate some of
this work with my own code on a project I am working on. A couple of questions:

Has anyone considered a universal search across multiple models yet? How would
this work considering the fact that currently the code is per model?

What about indexing fields that are not contained in the model? For example: say
I have an Article model with a belongs_to relationship to an Author model. I
would like the author's name to be indexed along with the contents of
the article in the Ferret document. I guess this may be more of a Ruby
programming issue than a Ferret issue. It seems that the general practice is to
keep track of fields to be used/indexed/inspected as an array of symbols. In my
notional article example that might be:

[:title, :document]

I'd prefer it to look more like:

[:title, :document, :author.name]

but ":author.name" is going to be problematic, is it not?
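
It is problematic as written, because Ruby reads :author.name as a call to the method name on the symbol :author rather than as a path. One possible workaround, sketched here with made-up Struct classes in place of ActiveRecord models, is to describe such fields as strings and resolve each dotted segment in turn; field_value is a hypothetical helper, not part of the plugin.

```ruby
# Hypothetical sketch: describe fields as symbols or dotted strings and
# resolve a dotted path by sending each segment in turn.
Author  = Struct.new(:name)
Article = Struct.new(:title, :document, :author)

def field_value(record, path)
  path.to_s.split(".").inject(record) do |object, segment|
    object.nil? ? nil : object.public_send(segment)
  end
end

article = Article.new("Ferret", "Full-text search", Author.new("Dave"))
fields  = [:title, :document, "author.name"]

p fields.map { |field| field_value(article, field) }
```

A missing association simply yields nil for that field instead of raising.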

Any thoughts on these issues? Let me know if I have not been clear enough.

-F

rails-XAtVw6N1wmx8ahKC1EHl5g@public.gmane.org

2005-Dec-14 18:33 UTC

:session question

First post here!  Here's my question:

I have several related Category objects that all belong_to a Job
object.  When a new Job object is to be created, a user will have to
click on the CSS tabs that I have set up with link_to action methods.
I do not want the data from the forms to be persisted until all the
sections are complete and the user clicks "Create Project".  Also, I
want the controller to dynamically store/update each view's session
when any tab is arbitrarily selected.

For Example, the form tabs resemble this:

Art Details   |   Dev Details   |   Marketing Details

So when I am finished with "Art Details" and click on "Dev Details",
I want to store that form data in a session - the same for other tabs
when the new view is selected via clicking on a new tab.

I considered using a pseudo-cart type of object to store the Project's
"Details" objects and their associated attributes, but this doesn't
really work for this model because the child Objects of Project
will not know about their association or foreign keys until they are  
persisted.  Moreover, it would seem logical that I just store the  
post variables in some object, but then how would I restore those  
values in the fields if they go back to a previous tab?

Here's my Object model

   Project
     |__
       |
     ArtDetails belongs_to Project
     DevDetails belongs_to Project
     MarketingDetails belongs_to Project

Any suggestions?  TIA!

Erik Hatcher

2005-Dec-14 18:48 UTC

Re: Re: ANN: acts_as_ferret

On Dec 14, 2005, at 3:06 AM, Abdur-Rahman Advany wrote:
> I am rewriting parts of the plug (ill contribute it around next
> week), I wanted to use search, with some special arguments for
> ferret, and arguments for find. So that when search its done, it
> calls find with the found id's and conditions/include enc. And
> return whats needed. I am hessitating about ferret_search (no risk
> of being reimplemented by someone else) or search (very common,
> maybe could someday became a method for rails itself), what is your
> opinion? I was thinking of fetching the ferret query first and then
> the database entry's (from mysql for example). But I can't really
> think of what would be faster (searching ferret first or
> activerecords), really depends on the use of conditions...
My recommendation is to index the fields you want to use as search  
criteria into Ferret rather than trying to mix and match Ferret and  
ActiveRecord searches.  Optimizing the two will be tricky - would it  
be quicker to search with Ferret and then pull from the DB or  
constrain the set by the DB first then full-text search on those?    
My hunch is that no database will have better performance than the  
potential fully optimized Ferret.  It's certainly true in the Java
Lucene that it is as fast and usually faster than a relational  
database for querying.

If you do go the route of searching with ActiveRecord first and using  
those results to constrain the Ferret search, consider using a Filter  
(not sure how that is implemented in Ferret, but in Java Lucene there  
are overloaded search methods that accept a Filter).

	Erik
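
The trade-off described above can be illustrated without Ferret at all. The sketch below uses a toy in-memory "database" and a hand-built inverted index (both assumptions made purely for illustration; `Record`, `index_then_db` and `index_only` are hypothetical names). The first method searches the index and then filters the hits against the database; the second follows the recommendation to index the search criterion as well, so the whole query is answered from the index.

```ruby
# Toy stand-in for a database table.
Record = Struct.new(:id, :title, :published)

RECORDS = [
  Record.new(1, 'ferret and rails', true),
  Record.new(2, 'rails plugins',    true),
  Record.new(3, 'ferret internals', false)
]

# Text-only index: term => ids of the records whose title contains it.
TEXT_INDEX = RECORDS.each_with_object(Hash.new { |h, k| h[k] = [] }) do |rec, idx|
  rec.title.split.each { |term| idx[term] << rec.id }
end

# Index that also covers the non-text criterion (published flag).
FULL_INDEX = RECORDS.each_with_object(Hash.new { |h, k| h[k] = [] }) do |rec, idx|
  rec.title.split.each { |term| idx[[:title, term]] << rec.id }
  idx[[:published, rec.published.to_s]] << rec.id
end

# Mixed approach: full-text search first, then a database-side condition.
def index_then_db(term)
  ids = TEXT_INDEX.fetch(term, [])
  RECORDS.select { |r| ids.include?(r.id) && r.published }
end

# Index-only approach: both criteria are resolved inside the index.
def index_only(term)
  ids = FULL_INDEX.fetch([:title, term], []) & FULL_INDEX.fetch([:published, 'true'], [])
  RECORDS.select { |r| ids.include?(r.id) }
end
```

Both methods return the same records; the difference is where the filtering happens, which is the part that is hard to optimize when two systems are involved.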

David Balmain

2005-Dec-14 19:05 UTC

Re: Re: ANN: acts_as_ferret

On 12/15/05, Erik Hatcher <erik-LIifS8st6VgJvtFkdXX2HpqQE7yCjDx5@public.gmane.org> wrote:
> If you do go the route of searching with ActiveRecord first and using
> those results to constrain the Ferret search, consider using a Filter
> (not sure how that is implemented in Ferret, but in Java Lucene there
> are overloaded search methods that accept a Filter).
Filters are implemented in Ferret the same way as they are in Java.
They're unit tested but I haven't used them very much and I don't
suspect many other people have yet either. But they're there if you
need them. You pass a filter object as one of the options to any of
the search methods.

Dave
>
>         Erik
>
> _______________________________________________
> Rails mailing list
> Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
> http://lists.rubyonrails.org/mailman/listinfo/rails
>

Julian 'Julik' Tarkhanov

2005-Dec-14 19:16 UTC

Re: Re: ANN: acts_as_ferret

On 14-dec-2005, at 19:48, Erik Hatcher wrote:
> On Dec 14, 2005, at 3:06 AM, Abdur-Rahman Advany wrote:
>> I am rewriting parts of the plug (ill contribute it around next  
>> week), I wanted to use search, with some special arguments for  
>> ferret, and arguments for find. So that when search its done, it  
>> calls find with the found id's and conditions/include enc. And
>> return whats needed. I am hessitating about ferret_search (no risk  
>> of being reimplemented by someone else) or search (very common,  
>> maybe could someday became a method for rails itself), what is  
>> your opinion? I was thinking of fetching the ferret query first  
>> and then the database entry's (from mysql for example). But I
>> can't really think of what would be faster (searching ferret first
>> or activerecords), really depends on the use of conditions...
>
> My recommendation is to index the fields you want to use as search  
> criteria into Ferret rather than trying to mix and match Ferret and  
> ActiveRecord searches.  Optimizing the two will be tricky - would  
> it be quicker to search with Ferret and then pull from the DB or  
> constrain the set by the DB first then full-text search on those?    
> My hunch is that no database will have better performance than the  
> potential fully optimized Ferret.  It's certainly true in the Java
> Lucene that it is as fast and usually faster than a relational  
> database for querying.
>
> If you do go the route of searching with ActiveRecord first and  
> using those results to constrain the Ferret search, consider using  
> a Filter (not sure how that is implemented in Ferret, but in Java  
> Lucene there are overloaded search methods that accept a Filter).
Maybe someone can help me finish
http://www.julik.nl/code/active-search/classes/ActiveSearch/FerretIndexer.html?
I am sorting out the kinks but I am stumbling upon

RuntimeError: could not obtain lock:

and I should admit I am absolutely lost in how to handle concurrency  
with Ferret.



--
Julian 'Julik' Tarkhanov
me at julik.nl

Thomas Lockney

2005-Dec-15 02:06 UTC

Re: ANN: acts_as_ferret

David Balmain <dbalmain.ml@...> writes:
> Also, I don't know if you meant to use symbols but you shouldn't use
> ':' in a field name as it will through off the query parser. Get rid
> of the '"' around :ferret_class and :id and you'll be fine.
Yeah, I realized this one a little while after I pasted it. I had them as
strings and had reverted back to the ":" prefixed names in an attempt to see if
that was causing a problem I was having. I guess I pasted it a little too
soon.
> I made both these changes on the wiki already.
Great!
> 
> One other change you may like to make is to allow Query objects to be
> passed to the find_by_contents method as well as Strings, but I'll
> leave that one up to you for the moment.
Yeah, that was the other thing I had started working on but didn't want to paste
in yet. I had an implementation of it, but it was ugly, so I'm reworking it a
bit and hope to have that in place over the weekend.
> 
> Hope that helps,
> Dave
Thanks again for developing Ferret. I've been waiting for this ever since I
first started playing with Ruby and saw Erik's registered (though, sadly never
completed) rlucene project.

Thomas

Thomas Lockney

2005-Dec-15 02:09 UTC

Re: ANN: acts_as_ferret

jennyw <jennyw@...> writes:
> 
> It's so great that people are working on this! Ferret is great and I
> look forward to seeing it better integrated with Rails.
> 
> Thomas -- I tried this code but experienced a few problems with it. I
> never got it to work, and gave up since it's not exaclty what I need
> (the documents I'm storing in Ferret don't exactly match my model
> objects, but are a composite of them). Still, I have some feedback that
> might (or might not) be helpful.
As I (think I) mentioned in my note on the wiki, the code I put there definitely
was buggy. I just wanted to put it out in case anyone else wanted to start
taking a stab at it. I'll have a newer version sometime next week, I hope.
> In addition to what David mentioned, I noticed that you use the method
> class_variable_set in the method acts_as_ferret. This isn't available in
> Ruby 1.8.2. Moreover, I'm not sure why you're using this here since the
> variable names are not dynamic. I just changed these to:
> 
>              @@fields_for_ferret = Array.new
>              @@class_index_dir = configuration[:index_dir]
I'm not sure why I did that either. :-/ Guess I was just trying to get anything
to work at that point. I'll implement your fix.
> 
> Also, I noticed that the indentation on the class method append_features 
> was a bit off ... it looked like super was the beginning of a block. 
> Just a minor thing.
I fixed a few indentation problems when I added it to the wiki, but must have
missed that one. Thanks.
> 
> Also, I'm confused about the name for the SingletonMethods module. What
> is the singleton that's being referred to here?
I adopted that from the plugin howtos on the rails wiki:
http://wiki.rubyonrails.org/rails/pages/HowToWriteAnActsAsFoxPlugin
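
A note on the class_variable_set exchange above: in Ruby the two spellings are not equivalent when the code lives in a mixin. A literal `@@name` inside a module method is resolved lexically and belongs to the module, so it is shared by every class that uses the macro, whereas `class_variable_set` called on the receiving class stores a value per class. The sketch below demonstrates the difference; the module and method names are hypothetical, and it assumes a Ruby version where `class_variable_set` and `class_variable_get` are public (1.9 or later).

```ruby
module SharedMacro
  # @@shared_dir is resolved lexically, so it lives on SharedMacro itself.
  def configure_shared(dir)
    @@shared_dir = dir
  end

  def shared_dir
    @@shared_dir
  end
end

module PerClassMacro
  # self is the extending class, so each class gets its own variable.
  def configure_per_class(dir)
    class_variable_set(:@@own_dir, dir)
  end

  def own_dir
    class_variable_get(:@@own_dir)
  end
end

class Foo
  extend SharedMacro
  extend PerClassMacro
end

class Bar
  extend SharedMacro
  extend PerClassMacro
end

Foo.configure_shared('index/foo')
Bar.configure_shared('index/bar')      # overwrites the value Foo just set

Foo.configure_per_class('index/foo')
Bar.configure_per_class('index/bar')   # Foo keeps its own value
```

Whether sharing matters depends on the plugin: with a single index directory for all models it is harmless, but per-model settings such as a field list would need the per-class form.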

Thomas Lockney

2005-Dec-15 02:14 UTC

Re: ANN: acts_as_ferret

David Balmain <dbalmain.ml@...> writes:
> Also, I don't know if you meant to use symbols but you shouldn't use
> ':' in a field name as it will through off the query parser. Get rid
> of the '"' around :ferret_class and :id and you'll be fine.
Now that I think about it, I was confused for a bit about the keys defined and
was having trouble doing lookups. It turned out to be a different problem, but
in my search for a way to fix it, I changed those field names to match. (I even
tried just using symbols, but it seems that ferret didn't like that too much.
Should symbols be an allowable option for a field name?)

Ferret's truly a great piece of work and the documentation is already quite
good, but I think there's a lot more needed to make it fully accessible.
Hopefully as more of us dig in, we can add to what's there. I guess that's a
topic for the ferret mailing list, though. ;~)

Thomas

David Balmain

2005-Dec-15 04:59 UTC

Re: Re: ANN: acts_as_ferret

On 12/15/05, Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:
>
> On 14-dec-2005, at 19:48, Erik Hatcher wrote:
>
> > On Dec 14, 2005, at 3:06 AM, Abdur-Rahman Advany wrote:
> >> I am rewriting parts of the plug (ill contribute it around next
> >> week), I wanted to use search, with some special arguments for
> >> ferret, and arguments for find. So that when search its done, it
> >> calls find with the found id's and conditions/include enc. And
> >> return whats needed. I am hessitating about ferret_search (no risk
> >> of being reimplemented by someone else) or search (very common,
> >> maybe could someday became a method for rails itself), what is
> >> your opinion? I was thinking of fetching the ferret query first
> >> and then the database entry's (from mysql for example). But I
> >> can't really think of what would be faster (searching ferret first
> >> or activerecords), really depends on the use of conditions...
> >
> > My recommendation is to index the fields you want to use as search
> > criteria into Ferret rather than trying to mix and match Ferret and
> > ActiveRecord searches.  Optimizing the two will be tricky - would
> > it be quicker to search with Ferret and then pull from the DB or
> > constrain the set by the DB first then full-text search on those?
> > My hunch is that no database will have better performance than the
> > potential fully optimized Ferret.  It's certainly true in the Java
> > Lucene that it is as fast and usually faster than a relational
> > database for querying.
> >
> > If you do go the route of searching with ActiveRecord first and
> > using those results to constrain the Ferret search, consider using
> > a Filter (not sure how that is implemented in Ferret, but in Java
> > Lucene there are overloaded search methods that accept a Filter).
>
> Maybe someone can help me finish
> http://www.julik.nl/code/active-search/classes/ActiveSearch/FerretIndexer.html?
> I am sorting out the kinks but I am stumbling upon
>
> RuntimeError: could not obtain lock:
>
> and I should admit I am absolutely lost in how to handle concurrency
> with Ferret.
Using the latest version of ferret and setting :auto_flush => true
should solve this problem. Have you tried that? It only works in
Index::Index though and it's not necessary for an IndexSearcher. If
you use IndexWriter and IndexReader directly you have to handle it
yourself.
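
For readers wondering what "handle it yourself" can look like in general terms, the sketch below serializes writers with an advisory file lock from Ruby's standard library. It is not Ferret's own locking mechanism and makes no claim about how Ferret stores its lock files; `with_exclusive_lock`, `run_writers`, and the file names are hypothetical and exist only for this illustration.

```ruby
require 'tmpdir'

# Run the block while holding an exclusive advisory lock on lock_path.
# Other callers block in flock until the current holder releases it.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end

# Start several concurrent "writers" that append to one shared file,
# each taking the lock first, and return what was written.
def run_writers(count)
  Dir.mktmpdir do |dir|
    lock_path = File.join(dir, 'writer.lock')
    log_path  = File.join(dir, 'index.log')
    threads = (1..count).map do |n|
      Thread.new do
        with_exclusive_lock(lock_path) do
          File.open(log_path, 'a') { |log| log.puts("doc #{n}") }
        end
      end
    end
    threads.each(&:join)
    File.readlines(log_path).map(&:chomp).sort
  end
end

written = run_writers(4)
```

The point of the pattern is that a writer waits for the lock instead of failing, and always releases it in the `ensure` clause, so an exception during indexing does not leave the lock held.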
> --
> Julian 'Julik' Tarkhanov
> me at julik.nl

Julian 'Julik' Tarkhanov

2005-Dec-15 05:27 UTC

Re: Re: ANN: acts_as_ferret

On 15-dec-2005, at 5:59, David Balmain wrote:
> On 12/15/05, Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:
>>
>> Maybe someone can help me finish
>> http://www.julik.nl/code/active-search/classes/ActiveSearch/FerretIndexer.html?
>> I am sorting out the kinks but I am stumbling upon
>>
>> RuntimeError: could not obtain lock:
>>
>> and I should admit I am absolutely lost in how to handle concurrency
>> with Ferret.
>
> Using the latest version of ferret and setting :auto_flush => true
> should solve this problem. Have you tried that? It only works in
> Index::Index though and it's not necessary for an IndexSearcher. If
> you use IndexWriter and IndexReader directly you have to handle it
> yourself.
David, thanks for the advice - I'll try that and report the results.
Basically, it feels sort of _odd_ - doing this macro-style Ferret
binding. Ferret is so vast and powerful that
this would not be enough to make use of all of its features. Maybe
you can send me some advice off-list on how I could
expand the API of the FerretIndexer to give more access to
the most needed Ferret features in a convenient way (without making
it too big, because the whole idea of the plugin is a one-liner
integration into a model, not a document cluster with 10 million
entries in it).

If someone else wants to shed some light (or help with code) I would
be glad to get some help, I am swamped now and won't be able to get
to it until at least next week.

--
Julian 'Julik' Tarkhanov
me at julik.nl

David Balmain

2005-Dec-15 05:54 UTC

Re: Re: ANN: acts_as_ferret

Hi Julian,

I'm really busy porting everything in Ferret to C at the moment. Next
year though I should have some time to play around with integrating it
into Rails. Until then I'll try and be as helpful as possible to
others trying to do the same thing. Good luck! :-)

Cheers,
Dave

On 12/15/05, Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:
>
> On 15-dec-2005, at 5:59, David Balmain wrote:
>
> > On 12/15/05, Julian 'Julik' Tarkhanov <listbox-RY+snkucC20@public.gmane.org> wrote:
> >>
> >> Maybe someone can help me finish
> >> http://www.julik.nl/code/active-search/classes/ActiveSearch/FerretIndexer.html?
> >> I am sorting out the kinks but I am stumbling upon
> >>
> >> RuntimeError: could not obtain lock:
> >>
> >> and I should admit I am absolutely lost in how to handle concurrency
> >> with Ferret.
> >
> > Using the latest version of ferret and setting :auto_flush => true
> > should solve this problem. Have you tried that? It only works in
> > Index::Index though and it's not necessary for an IndexSearcher. If
> > you use IndexWriter and IndexReader directly you have to handle it
> > yourself.
>
> David, thanks for the advice - I'll try that and report the results.
> Basically, it feels sort of _odd_ - doing this macro-style Ferret
> binding. Ferret is so vast and powerful that
> this would not be enough to make use of all of its features. Maybe
> you can send me some advice off-list on how I could
> expand the API of the FerretIndexer to give more access to
> the most needed Ferret features in a convenient way (without making
> it too big, because the whole idea of the plugin is a one-liner
> integration into a model, not a document cluster with 10 million
> entries in it).
>
> If someone else wants to shed some light (or help with code) I would
> be glad to get some help, I am swamped now and won't be able to get
> to it until at least next week.
>
> --
> Julian 'Julik' Tarkhanov
> me at julik.nl

albert ramstedt

2005-Dec-15 11:03 UTC

Re: Re: ANN: acts_as_ferret

Hello!

I have been following this thread carefully, ferret just got a little 
easier to dive into. Kudos to you guys, and especially to the authors of 
ferret! This was just what we needed here at our little webdev shop.

Now I have a problem you guys might know a solution to. I have managed 
to get the code from the wiki working, with a little bit of tweaking, 
but it does not seem to build queries correctly when it gets fed with 
UTF-8 characters. Is this a fault on my side or a known issue with 
ferret? I looked at the trac but it seemed it should support UTF-8? I 
must have overlooked something...

I didn't dare to touch the wiki, but here is a somewhat altered version 
of the plugin, and it should be fully functional. I added some small 
things, since we wanted a counter for the Paginator. I know though that 
doing a full-out-search just to count might not be the best way to 
count, so if anyone has a suggestion to better this, please share! :)

Oh, and I added a rake task to rebuild the index, but it relies on the 
INDEX_PATH being set in the environment.rb

Here it is

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'

module FerretMixin
  module Acts #:nodoc:
     module ARFerret #:nodoc:

        def self.append_features(base)
           super
           base.extend(MacroMethods)
        end

        # declare the class level helper methods
        # which will load the relevant instance methods defined below
        # when invoked

        module MacroMethods

           def acts_as_ferret
              extend FerretMixin::Acts::ARFerret::ClassMethods
              class_eval do
                 include FerretMixin::Acts::ARFerret::ClassMethods

                 after_create :ferret_create
                 after_update :ferret_update
                 after_destroy :ferret_destroy
              end
           end

        end

        module ClassMethods
           include Ferret
           INDEX_PATH = "#{RAILS_ROOT}/db/ferret"
           def self.reloadable?; false end

           # Finds instances by file contents.
           def find_by_ferret(query, options = {})
              @@index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
              @@query_parser   ||= QueryParser.new(@@index_searcher.reader.get_field_names.to_a)
              query = @@query_parser.parse(query)
              result = []
              conditions = {}
              conditions[:num_docs] = options[:limit] unless options[:limit].blank?
              conditions[:first_doc] = options[:offset] unless options[:offset].blank?

              hits = @@index_searcher.search(query, conditions)
              hits.each do |hit, score|
                 id = @@index_searcher.reader.get_document(hit)['id']
                 result << self.find(id) unless id.nil?
              end
              return result
           end
          
           def count_by_ferret(query)
              @@index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
              @@query_parser   ||= QueryParser.new(@@index_searcher.reader.get_field_names.to_a)
              query = @@query_parser.parse(query)
              return @@index_searcher.search(query).total_hits
           end

           # private

           def ferret_create
              # code to update or add to the index
              @@index ||= Index::Index.new(:path => INDEX_PATH,
                                           :auto_flush => true)
              @@index << self.to_doc
           end

           def ferret_update
              # replace the existing document for this record
              @@index ||= Index::Index.new(:path => INDEX_PATH,
                                           :auto_flush => true)
              @@index.query_delete("+id:#{self.id} +ferret_table:#{self.class.table_name}")
              @@index << self.to_doc
           end

           def ferret_destroy
              # code to delete from index
              @@index ||= Index::Index.new(:path => INDEX_PATH,
                                           :auto_flush => true)
              @@index.query_delete("+id:#{self.id} +ferret_table:#{self.class.table_name}")
           end

           def to_doc
              # Churn through the complete Active Record and add it to
              # the Ferret document
              doc = Ferret::Document::Document.new
              doc << Ferret::Document::Field.new('ferret_table',
                                                 self.class.table_name,
                                                 Ferret::Document::Field::Store::YES,
                                                 Ferret::Document::Field::Index::UNTOKENIZED)
              self.attributes.each_pair do |key,val|
                 if key == 'id'
                    doc << Ferret::Document::Field.new(key, val.to_s,
                                                       Ferret::Document::Field::Store::YES,
                                                       Ferret::Document::Field::Index::UNTOKENIZED)
                 else
                    doc << Ferret::Document::Field.new(key, val.to_s,
                                                       Ferret::Document::Field::Store::NO,
                                                       Ferret::Document::Field::Index::TOKENIZED)
                 end
              end
              return doc
           end
        end
     end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
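
The macro pattern used in the listing above can be seen in isolation in the following sketch. It is a simplified illustration, not the plugin itself: it uses no ActiveRecord and no Ferret, a plain Hash plays the role of the index, and every name (`ActsAsIndexed`, `acts_as_indexed`, `index_store`, `Note`) is hypothetical. Unlike the listing, it keeps class-level finders and instance-level callbacks in two separate modules, which is one common way to organize an acts_as plugin.

```ruby
module ActsAsIndexed
  # Hook that runs when the module is included: adds the macro to the class.
  def self.included(base)
    base.extend(MacroMethods)
  end

  module MacroMethods
    # Calling the macro in a class body opts that class in.
    def acts_as_indexed
      extend ClassMethods
      include InstanceMethods
    end
  end

  module ClassMethods
    # One in-memory "index" per model class: id => indexed text.
    def index_store
      @index_store ||= {}
    end

    def find_by_contents(term)
      index_store.select { |_id, text| text.include?(term) }.keys
    end
  end

  module InstanceMethods
    # Would be wired to an after_save style callback in a real model.
    def index_me
      self.class.index_store[id] = to_index_text
    end
  end
end

# Stand-in for ActiveRecord::Base.
class Base
  include ActsAsIndexed
end

class Note < Base
  acts_as_indexed
  attr_reader :id, :body

  def initialize(id, body)
    @id = id
    @body = body
  end

  def to_index_text
    body
  end
end

Note.new(1, 'ferret search').index_me
Note.new(2, 'rails plugins').index_me
```

With this split, `Note.find_by_contents('ferret')` is a class method and `index_me` is an instance method, so neither ends up callable on the wrong kind of object.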

RAKE TASK in /lib/tasks/indexer.rake

include FileUtils

desc "Perform ferret index"
task :indexer => :environment do
    if !File.exist?(INDEX_PATH)
        puts "Creating index dir in #{INDEX_PATH}"
        FileUtils.mkdir_p(INDEX_PATH)
    end

    classes = []
    Dir.glob(File.join(RAILS_ROOT, "app", "models", "*.rb")).each do |rbfile|
        bname = File.basename(rbfile, '.rb')
        classname = Inflector.camelize(bname)
        classes.push(classname)
    end
    classes.each do |class_obj|
        c = eval(class_obj)
        if c.respond_to?(:ferret_create)
            puts "REBUILDING #{c.name}"
            c.find_all.each{|cn|cn.save}
        end
    end
end
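
The rake task above turns a model file name into a class with `Inflector.camelize` followed by `eval`. As a side note, the same lookup can be done without evaluating a string. The sketch below is standalone Ruby under stated assumptions: `camelize` is a deliberately simplified stand-in for the Rails inflector, and `model_class_for` and `BlogPost` are hypothetical names used only here.

```ruby
# Simplified stand-in for Inflector.camelize: "blog_post" -> "BlogPost".
def camelize(name)
  name.split('_').map(&:capitalize).join
end

# Map a model file path to its class, or nil when no such constant exists.
def model_class_for(path)
  Object.const_get(camelize(File.basename(path, '.rb')))
rescue NameError
  nil
end

# A class standing in for a model defined in app/models/blog_post.rb.
class BlogPost; end

found   = model_class_for('app/models/blog_post.rb')
missing = model_class_for('app/models/missing_thing.rb')
```

Returning nil for unknown names lets the caller skip files that do not define a matching class instead of aborting the whole rebuild.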


albert ramstedt

2005-Dec-15 14:22 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

To answer my own question...

This is a hack to get unicode to work, and relies on the unicode gem.
Also, this, as opposed to my previous code listing, should work out of
the box... except that the constant INDEX_PATH must be set beforehand,
preferably in environment.rb

# CODE for acts_as_ferret.rb
require 'active_record'
require 'ferret'
require 'unicode'
   
class UnicodeLowerCaseFilter < Ferret::Analysis::TokenFilter
     def next()
       t = @input.next()
 
       if (t == nil)
         return nil
       end
 
       t.term_text = Unicode::downcase(t.term_text)
 
       return t
     end
end

class SwedishTokenizer < Ferret::Analysis::RegExpTokenizer

    P     =     /[_\/.,-]/
    HASDIGIT     =     /\w*\d\w*/
   
       
    def token_re()
     %r([[:alpha:]ÅÖÄåöä]+(('[[:alpha:]ÅÖÄåöä]+)+
       |\.([[:alpha:]ÅÖÄåöä]\.)+
       |(@|\&)\w+([-.]\w+)*
      )
       |\w+(([\-._]\w+)*\@\w+([-.]\w+)+
       |#{P}#{HASDIGIT}(#{P}\w+#{P}#{HASDIGIT})*(#{P}\w+)?
       |(\.\w+)+
       |
      )
       )x
     end
end

class SwedishAnalyzer < Ferret::Analysis::Analyzer
    def token_stream(field, string)
      return UnicodeLowerCaseFilter.new(SwedishTokenizer.new(string))
    end
end

module FerretMixin
  module Acts #:nodoc:
     module ARFerret #:nodoc:

        def self.append_features(base)
           super
           base.extend(MacroMethods)
        end

        # declare the class level helper methods
        # which will load the relevant instance methods defined below
        # when invoked

        module MacroMethods

           def acts_as_ferret
              extend FerretMixin::Acts::ARFerret::ClassMethods
              class_eval do
                 include FerretMixin::Acts::ARFerret::ClassMethods

                 after_create :ferret_create
                 after_update :ferret_update
                 after_destroy :ferret_destroy
              end
           end

        end

        module ClassMethods
           include Ferret
           def self.reloadable?; false end

           # Finds instances by file contents.
           def find_by_ferret(query, options = {})
              index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
              query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a,
                                                 {:analyzer => SwedishAnalyzer.new()})
              query = query_parser.parse(query)
              result = []
              conditions = {}
              conditions[:num_docs] = options[:limit] unless options[:limit].blank?
              conditions[:first_doc] = options[:offset] unless options[:offset].blank?

              hits = index_searcher.search(query, conditions)
              hits.each do |hit, score|
                 id = index_searcher.reader.get_document(hit)['id']
                 result << self.find(id) unless id.nil?
              end
              return result
           end
          
           def count_by_ferret(query)
              index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
              query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a,
                                                 {:analyzer => SwedishAnalyzer.new()})
              query = query_parser.parse(query)
              return index_searcher.search(query).total_hits
           end

           # private

           def ferret_create
              # code to update or add to the index
              index ||= Index::Index.new(:key => [:id, :ferret_table],
                                         :path => INDEX_PATH,
                                         :auto_flush => true,
                                         :analyzer => SwedishAnalyzer.new())
              index << self.to_doc
           end

           def ferret_update
              # replace the existing document for this record
              index ||= Index::Index.new(:key => [:id, :ferret_table],
                                         :path => INDEX_PATH,
                                         :auto_flush => true,
                                         :analyzer => SwedishAnalyzer.new())
              index.query_delete("+id:#{self.id.to_s} +ferret_table:#{self.class.table_name}")
              index << self.to_doc
           end

           def ferret_destroy
              # code to delete from index
              index ||= Index::Index.new(:key => [:id, :ferret_table],
                                         :path => INDEX_PATH,
                                         :auto_flush => true,
                                         :analyzer => SwedishAnalyzer.new())
              index.query_delete("+id:#{self.id.to_s} +ferret_table:#{self.class.table_name}")
           end

           def to_doc
              # Churn through the complete Active Record and add it to
              # the Ferret document
              doc = Ferret::Document::Document.new
              doc << Ferret::Document::Field.new('ferret_table',
                                                 self.class.table_name,
                                                 Ferret::Document::Field::Store::YES,
                                                 Ferret::Document::Field::Index::UNTOKENIZED)
              self.attributes.each_pair do |key,val|
                 if key == 'id'
                    doc << Ferret::Document::Field.new("id", val.to_s,
                                                       Ferret::Document::Field::Store::YES,
                                                       Ferret::Document::Field::Index::UNTOKENIZED)
                 else
                    doc << Ferret::Document::Field.new(key, val.to_s,
                                                       Ferret::Document::Field::Store::NO,
                                                       Ferret::Document::Field::Index::TOKENIZED)
                 end
              end
              return doc
           end
        end
     end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include FerretMixin::Acts::ARFerret
end

# END acts_as_ferret.rb
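
The core of the fix above is a token filter that lowercases each token with a Unicode-aware function before it reaches the index or the query parser. The sketch below shows the same chain-of-filters idea in standalone Ruby. It is an illustration only: `Token`, `ArrayTokenStream` and `DowncaseFilter` are hypothetical stand-ins rather than Ferret classes, and it assumes Ruby 2.4 or later, where `String#downcase` handles non-ASCII letters itself so the unicode gem is not needed.

```ruby
# Minimal token with mutable text, standing in for an analysis token.
Token = Struct.new(:text)

# Source stream: hands out one token per call, then nil when exhausted.
class ArrayTokenStream
  def initialize(words)
    @tokens = words.map { |w| Token.new(w) }
  end

  def next
    @tokens.shift
  end
end

# Filter: wraps another stream and lowercases every token passing through.
class DowncaseFilter
  def initialize(input)
    @input = input
  end

  def next
    token = @input.next
    return nil if token.nil?
    token.text = token.text.downcase   # Unicode-aware on Ruby >= 2.4
    token
  end
end

stream = DowncaseFilter.new(ArrayTokenStream.new(['SMÖRGÅS', 'Ferret']))
lowered = []
while (token = stream.next)
  lowered << token.text
end
```

Applying the same filter at index time and at query time is what makes "SMÖRGÅS" and "smörgås" meet on the same term.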

And the rake task:

include FileUtils

desc "Perform ferret index"
task :indexer => :environment do
    if !File.exist?(INDEX_PATH)
        puts "Creating index dir in #{INDEX_PATH}"
        FileUtils.mkdir_p(INDEX_PATH)
    end

    classes = []
    Dir.glob(File.join(RAILS_ROOT, "app", "models", "*.rb")).each do |rbfile|
        bname = File.basename(rbfile, '.rb')
        classname = Inflector.camelize(bname)
        classes.push(classname)
    end
    classes.each do |class_obj|
        c = eval(class_obj)
        if c.respond_to?(:ferret_create)
            puts "REBUILDING #{c.name}"
            c.find_all.each{|cn|cn.save}
        end
    end
end


David Balmain

2005-Dec-15 14:58 UTC

Re: Re: ANN: acts_as_ferret

Hi Albert,

Perhaps you could do something like this in the find_by_ferret method
and get rid of your count_by_ferret method. Just an idea.

             total_hits = hits.each do |hit, score|
                id = @@index_searcher.reader.get_document(hit)['id']
                result << self.find(id) unless id.nil?
             end
             return result, total_hits

Cheers,
Dave
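
The suggestion above returns the page of results together with the total hit count from a single search, which is what a paginator needs. The shape of that return value can be sketched without Ferret; here a plain array of ids plays the role of the hit list, and `search_page` and `PER_PAGE` are hypothetical names chosen for this illustration.

```ruby
PER_PAGE = 10

# Return one page of ids plus the total number of matches, so the
# caller does not need a second search just to count.
def search_page(all_ids, options = {})
  offset = options.fetch(:offset, 0)
  limit  = options.fetch(:limit, PER_PAGE)
  page   = all_ids[offset, limit] || []
  [page, all_ids.size]
end

# Destructure the pair at the call site.
ids, total = search_page((1..25).to_a, :offset => 20, :limit => PER_PAGE)

# Number of pages, rounding up.
page_count = (total + PER_PAGE - 1) / PER_PAGE
```

A caller such as a controller can then pass `total` to its paginator and render `ids` for the current page.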

On 12/15/05, albert ramstedt <albert-fwIc/cu1KZxHQX+h2pknIQ@public.gmane.org> wrote:
> Hello!
>
> I have been following this thread carefully, ferret just got a little
> easier to dive into. Kudos to you guys, and especially to the authors of
> ferret! This was just what we needed here at our little webdev shop.
>
> Now I have a problem you guys might know a solution to. I have managed
> to get the code from the wiki working, with a little bit of tweaking,
> but it does not seem to build queries correctly when it gets fed with
> UTF-8 characters. Is this a fault on my side or a known issue with
> ferret? I looked at the trac but it seemed it should support UTF-8? I
> must have overlooked something...
>
> I didn't dare to touch the wiki, but here is a somewhat altered version
> of the plugin, and it should be fully functional. I added some small
> things, since we wanted a counter for the Paginator. I know though that
> doing a full-out-search just to count might not be the best way to
> count, so if anyone has a suggestion to better this, please share! :)
>
> Oh, and I added a rake task to rebuild the index, but it relies on the
> INDEX_PATH being set in the environment.rb
>
> Here it is
>
> # CODE for acts_as_ferret.rb
> require 'active_record'
> require 'ferret'
>
> module FerretMixin
>   module Acts #:nodoc:
>      module ARFerret #:nodoc:
>
>         def self.append_features(base)
>            super
>            base.extend(MacroMethods)
>         end
>
>         # declare the class level helper methods
>         # which will load the relevant instance methods defined below
>         # when invoked
>
>         module MacroMethods
>
>            def acts_as_ferret
>               extend FerretMixin::Acts::ARFerret::ClassMethods
>               class_eval do
>                  include FerretMixin::Acts::ARFerret::ClassMethods
>
>                  after_create :ferret_create
>                  after_update :ferret_update
>                  after_destroy :ferret_destroy
>               end
>            end
>
>         end
>
>         module ClassMethods
>            include Ferret
>            INDEX_PATH = "#{RAILS_ROOT}/db/ferret"
>            def self.reloadable?; false end
>
>            # Finds instances by file contents.
>            def find_by_ferret(query, options = {})
>               @@index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
>               @@query_parser   ||= QueryParser.new(@@index_searcher.reader.get_field_names.to_a)
>               query = @@query_parser.parse(query)
>               result = []
>               conditions = {}
>               conditions[:num_docs] = options[:limit] unless options[:limit].blank?
>               conditions[:first_doc] = options[:offset] unless options[:offset].blank?
>
>               hits = @@index_searcher.search(query, conditions)
>               hits.each do |hit, score|
>                  id = @@index_searcher.reader.get_document(hit)['id']
>                  result << self.find(id) unless id.nil?
>               end
>               return result
>            end
>
>            def count_by_ferret(query)
>                  @@index_searcher ||= Search::IndexSearcher.new(INDEX_PATH)
>               @@query_parser   ||>
QueryParser.new(@@index_searcher.reader.get_field_names.to_a)
>               query = @@query_parser.parse(query)
>               return @@index_searcher.search(query).total_hits
>            end
>
>            # private
>
>            def ferret_create
>               # code to update or add to the index
>               @@index ||= Index::Index.new(:path => INDEX_PATH,
>                                          :auto_flush => true)
>               @@index << self.to_doc
>            end
>            def ferret_update
>                 @@index ||= Index::Index.new(:path => INDEX_PATH,
>                                          :auto_flush => true)
>              @@index.query_delete("+id:#{self.id}
> +ferret_table:#{self.class.table_name}")
>              @@index << self.to_doc
>            end
>
>            def ferret_destroy
>               # code to delete from index
>               @@index ||= Index::Index.new(:path => INDEX_PATH,
>                                          :auto_flush => true)
>               @@index.query_delete("+id:#{self.id}
> +ferret_table:#{self.class.table_name}")
>            end
>
>            def to_doc
>               # Churn through the complete Active Record and add it to
> the Ferret document
>               doc = Ferret::Document::Document.new
>               doc <<
Ferret::Document::Field.new(''ferret_table'',
> self.class.table_name, Ferret::Document::Field::Store::YES,
> Ferret::Document::Field::Index::UNTOKENIZED)
>               self.attributes.each_pair do |key,val|
>                  if key == ''id''
>                     doc << Ferret::Document::Field.new(key, val.to_s,
> Ferret::Document::Field::Store::YES,
> Ferret::Document::Field::Index::UNTOKENIZED)
>                  else
>                     doc << Ferret::Document::Field.new(key, val.to_s,
> Ferret::Document::Field::Store::NO,
> Ferret::Document::Field::Index::TOKENIZED)
>                  end
>               end
>               return doc
>            end
>         end
>      end
>   end
> end
>
> # reopen ActiveRecord and include all the above to make
> # them available to all our models if they want it
> ActiveRecord::Base.class_eval do
>   include FerretMixin::Acts::ARFerret
> end
>
> # END acts_as_ferret.rb
>
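The listing above extends and includes the same ClassMethods module, so the finders and the index callbacks end up on both the class and its instances. A minimal sketch of the same "acts_as" macro pattern in plain Ruby, with class-level and instance-level methods kept in separate modules, might look like the following. It uses neither Rails nor Ferret: `Searchable`, `acts_as_searchable`, `Note`, and the in-memory Hash index are hypothetical stand-ins, not part of the plugin.

```ruby
# Plain-Ruby sketch of an "acts_as" macro. All names are hypothetical;
# a Hash stands in for the Ferret index.
module Searchable
  def self.included(base)
    base.extend(MacroMethods)   # only the macro is visible until it is called
  end

  module MacroMethods
    def acts_as_searchable
      extend ClassMethods       # finders live on the class
      include InstanceMethods   # index hooks live on each record
    end
  end

  module ClassMethods
    def search_index
      @search_index ||= {}
    end

    # Returns the ids of records whose indexed text contains the word.
    def find_ids_by_contents(word)
      search_index.select { |_id, text| text.include?(word.downcase) }.keys
    end
  end

  module InstanceMethods
    def index!
      self.class.search_index[id] = to_text
    end

    def unindex!
      self.class.search_index.delete(id)
    end
  end
end

class Note
  include Searchable
  acts_as_searchable

  attr_reader :id, :body

  def initialize(id, body)
    @id = id
    @body = body
  end

  def to_text
    body.downcase
  end
end

first = Note.new(1, 'Ferret is a search library')
second = Note.new(2, 'Rails plugin notes')
first.index!
second.index!
```

With this split, `Note.find_ids_by_contents('ferret')` is only callable on the class, while `index!` and `unindex!` are only callable on records.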
> RAKE TASK in /lib/tasks/indexer.rake
>
> include FileUtils
>
> desc "Perform ferret index"
> task :indexer => :environment do
>   if !File.exist?(INDEX_PATH)
>     puts "Creating index dir in #{INDEX_PATH}"
>     FileUtils.mkdir_p(INDEX_PATH)
>   end
>
>   classes = []
>   Dir.glob(File.join(RAILS_ROOT, "app", "models", "*.rb")).each do |rbfile|
>     bname = File.basename(rbfile, '.rb')
>     classname = Inflector.camelize(bname)
>     classes.push(classname)
>   end
>   classes.each do |class_obj|
>     c = eval(class_obj)
>     if c.respond_to?(:ferret_create)
>       puts "REBUILDING #{c.name}"
>       c.find_all.each { |cn| cn.save }
>     end
>   end
> end
>
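The rake task above turns model file names into class names and then calls `eval` on them. A small sketch of the same discovery step without `eval`, in plain Ruby, could look like this. The `camelize` and `resolve_class` helpers and the `BlogPost` class are hypothetical stand-ins for illustration (Rails' own Inflector is not used), and the file list is hard-coded instead of globbed.

```ruby
# Hypothetical helper: turn "blog_post" into "BlogPost".
def camelize(basename)
  basename.split('_').map(&:capitalize).join
end

# Resolve a class by name without eval; returns nil if it is not defined.
def resolve_class(name)
  Object.const_defined?(name) ? Object.const_get(name) : nil
end

# Stand-in for a model that has called acts_as_ferret.
class BlogPost
  def self.ferret_create; end
end

files = ['app/models/blog_post.rb', 'app/models/missing_model.rb']
names = files.map { |path| camelize(File.basename(path, '.rb')) }

# Keep only classes that exist and look indexable.
indexed = names.map { |name| resolve_class(name) }
               .compact
               .select { |klass| klass.respond_to?(:ferret_create) }
```

Looking the constant up with `Object.const_get` avoids evaluating arbitrary strings and lets a file without a matching class be skipped quietly.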
>
> _______________________________________________
> Rails mailing list
> Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
> http://lists.rubyonrails.org/mailman/listinfo/rails
>
>
>
>

David Balmain

2005-Dec-15 15:02 UTC

Re: Re: ANN: acts_as_ferret

On 12/15/05, albert ramstedt <albert-fwIc/cu1KZxHQX+h2pknIQ@public.gmane.org> wrote:
> Hello!
>
> I have been following this thread carefully, ferret just got a little
> easier to dive into. Kudos to you guys, and especially to the authors of
> ferret! This was just what we needed here at our little webdev shop.
>
> Now I have a problem you guys might know a solution to. I have managed
> to get the code from the wiki working, with a little bit of tweaking,
> but it does not seem to build queries correctly when it gets fed with
> UTF-8 characters. Is this a fault on my side or a known issue with
> ferret? I looked at the trac but it seemed it should support UTF-8? I
> must have overlooked something...

The problem is that the analyzer doesn't understand UTF-8. You need to
write an analyzer that matches the characters in your character set.
Have a look at the analyzers and tokenizers included with Ferret. They're
quite simple. Basically you just need to come up with a regular
expression that matches what you consider tokens in your data. For
example, the whitespace tokenizer uses /\S+/. The letter tokenizer
uses /[[:alpha:]]+/. This is actually where the problem with UTF-8
handling is. [[:alpha:]] only matches the ASCII alphabet in the current
Ruby regexp engine. That will change in Ruby 2.0.
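
To make the point about token regular expressions concrete, here is a small sketch in plain Ruby. It uses `String#scan` as a stand-in for a tokenizer (an assumption for illustration, not Ferret's tokenizer classes) and assumes a current Ruby with UTF-8 source files. An explicit ASCII-only class plays the role of the old ASCII-only [[:alpha:]].

```ruby
# Plain String#scan stands in for a Ferret tokenizer here (an assumption
# made for illustration); the regular expression alone decides what a token is.
text = 'smörgåsbord är gott'

whitespace_tokens = text.scan(/\S+/)               # split on whitespace only
ascii_tokens      = text.scan(/[a-zA-Z]+/)         # ASCII letters only
swedish_tokens    = text.scan(/[a-zA-ZåäöÅÄÖ]+/)   # ASCII plus Swedish letters
```

The ASCII-only expression breaks "smörgåsbord" into "sm", "rg", and "sbord", which is the kind of damage that makes queries on non-ASCII text fail. Adding the missing letters to the character class keeps the words whole.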

HTH,
Dave

David Balmain

2005-Dec-15 15:05 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

On 12/15/05, albert ramstedt <albert@delamednoll.se> wrote:
> To answer my own question...
>
> This is a hack to get unicode to work, and relies on the unicode gem.
> Also, this, as opposed to my previous code listing, should work out of
> the box... except that the constant INDEX_PATH must be set before,
> preferably in environment.rb
>
> # CODE for acts_as_ferret.rb
> require 'active_record'
> require 'ferret'
> require 'unicode'
>
> class UnicodeLowerCaseFilter < Ferret::Analysis::TokenFilter
>      def next()
>        t = @input.next()
>
>        if (t == nil)
>          return nil
>        end
>
>        t.term_text = Unicode::downcase(t.term_text)
>
>        return t
>      end
> end
>
> class SwedishTokenizer < Ferret::Analysis::RegExpTokenizer
>
>     P     =     /[_\/.,-]/
>     HASDIGIT     =     /\w*\d\w*/
>
>
>     def token_re()
>      %r([[:alpha:]ÅÖÄåöä]+(('[[:alpha:]ÅÖÄåöä]+)+
>        |\.([[:alpha:]ÅÖÄåöä]\.)+
>        |(@|\&)\w+([-.]\w+)*
>       )
>        |\w+(([\-._]\w+)*\@\w+([-.]\w+)+
>        |#{P}#{HASDIGIT}(#{P}\w+#{P}#{HASDIGIT})*(#{P}\w+)?
>        |(\.\w+)+
>        |
>       )
>        )x
>      end
> end
>
> class SwedishAnalyzer < Ferret::Analysis::Analyzer
>     def token_stream(field, string)
>       return UnicodeLowerCaseFilter.new(SwedishTokenizer.new(string))
>     end
> end

Oh, very cool. Sorry, I just replied to your other email before I saw
this. Do you mind if I put it on the Ferret Wiki in the howtos
section? Even better if you could do it. ;-)

Thanks for posting this Albert. Hope my other code snippet helped.

Cheers,
Dave
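
For readers who want to see the tokenizer-plus-filter chain from the quoted code in isolation, here is a minimal sketch in plain Ruby. `RegexpTokenizer`, `LowerCaseFilter`, and `analyze` are hypothetical stand-ins, not Ferret classes. The sketch assumes Ruby 2.4 or later, where `String#downcase` and [[:alpha:]] handle non-ASCII letters in UTF-8 strings, so the unicode gem is not needed.

```ruby
# Hypothetical stand-ins for a tokenizer and a token filter; neither class
# is part of Ferret. Requires Ruby 2.4+ for Unicode-aware String#downcase.
class RegexpTokenizer
  def initialize(text, token_re)
    @tokens = text.scan(token_re)
  end

  # Returns the next token, or nil when the input is exhausted.
  def next
    @tokens.shift
  end
end

class LowerCaseFilter
  def initialize(input)
    @input = input
  end

  # Passes nil through unchanged so the end of the stream is preserved.
  def next
    token = @input.next
    token && token.downcase
  end
end

# Wraps the tokenizer in the filter and drains the stream into an array.
def analyze(text)
  stream = LowerCaseFilter.new(RegexpTokenizer.new(text, /[[:alpha:]]+/))
  tokens = []
  while (token = stream.next)
    tokens << token
  end
  tokens
end
```

The filter only needs a `next` method on whatever it wraps, so further filters can be stacked the same way.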

Fabien Franzen

2005-Dec-15 15:05 UTC

Re: ANN: acts_as_ferret and a fix for unicode

albert ramstedt <albert@...> writes:
> 
> To answer my own question...
> 
> This is a hack to get unicode to work, and relies on the unicode gem. 
> Also, this, as opposed to my previous code listing, should work out of 
> the box... except that the constant INDEX_PATH must be set before, 
> preferably in environment.rb

Nice to see this addition. I'm wondering whether this will work for other
European languages besides Swedish, though. Is there a way to make it
more universal?

Thanks.

David Balmain

2005-Dec-15 18:18 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

On 12/16/05, Fabien Franzen <fabienf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> albert ramstedt <albert@...> writes:
>
> >
> > To answer my own question...
> >
> > This is a hack to get unicode to work, and relies on the unicode gem.
> > Also, this, as opposed to my previous code listing, should work out of
> > the box... except that the constant INDEX_PATH must be set before,
> > preferably in environment.rb
>
> Nice to see this addition. I'm wondering whether this will work for other
> European languages besides Swedish though. Is there a way to make it
> more universal?

Hi Fabien,

As far as I know this will work for any European language, or any
language for that matter. You just need to include the required
characters in the regular expression. Once the data is split into
tokens, Ferret doesn't care what the string looks like. You can even
store binary data like images in a Ferret index if you want to. Now we
just need people to add the necessary characters for all the different
European languages. :-)
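
As a rough illustration of "include the required characters in the regular expression", the sketch below builds a token expression from a per-language list of extra letters. It is plain Ruby, not Ferret code. The helper name `token_re_for` and the letter lists are assumptions made for this example.

```ruby
# Hypothetical helper: build a token regexp from ASCII letters plus whatever
# extra letters a language needs.
def token_re_for(extra_letters)
  Regexp.new("[a-zA-Z#{Regexp.escape(extra_letters)}]+")
end

swedish_letters = 'åäöÅÄÖ'
german_letters  = 'äöüÄÖÜß'

german_tokens  = 'Grüße aus Köln'.scan(token_re_for(german_letters))
swedish_tokens = 'Grüße aus Köln'.scan(token_re_for(swedish_letters))
```

With the German list the three words stay intact. With the Swedish list "Grüße" is cut at "ü" and "ß", which shows that each language needs its own letters added.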

Dave




albert ramstedt

2005-Dec-15 19:18 UTC

Re: Re: ANN: acts_as_ferret

Hi David,

The problem is that I need that query for the paginator, i.e. I need
the hit count before I do the actual search with the limit and offset, and
since that query also translates into model objects, it hits the
database when it doesn't actually need to. But I agree, my solution is
not really that nice either.
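
One way to get both numbers from a single search, in the spirit of the suggestion quoted below, is to return the page of ids together with the total hit count and build the paginator from that pair. The sketch is plain Ruby with an array standing in for the index. `search_page` and its parameters are hypothetical and do not reflect Ferret's API.

```ruby
# Hypothetical stand-in for an index search: one call returns the requested
# page of ids together with the total number of matches.
def search_page(matching_ids, limit:, offset: 0)
  page = matching_ids[offset, limit] || []   # nil when offset is past the end
  [page, matching_ids.size]
end

per_page = 10
ids, total_hits = search_page((1..45).to_a, limit: per_page, offset: 40)
page_count = (total_hits.to_f / per_page).ceil
```

Only the ids on the current page would then need to be loaded from the database, while the total comes along for free.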

Albert

David Balmain wrote:
>Hi Albert,
>
>Perhaps you could do something like this in the find_by_ferret method
>and get rid of your count_by_ferret method. Just an idea.
>
>             total_hits = hits.each do |hit, score|
>                id = @@index_searcher.reader.get_document(hit)['id']
>                result << self.find(id) unless id.nil?
>             end
>             return result, total_hits
>
>Cheers,
>Dave
>

Albert Ramstedt

2005-Dec-15 19:20 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

Hi

Of course you can add it to the wiki! The mail seems to have scrambled
the utf characters, so keep that in mind if you intend to use the 
swedish tokenizer.

Albert


hui

2005-Dec-16 05:14 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

It's so cool!
I am just looking for a CJK solution.
Here is the "JavaCC code for the Nutch lexical analyzer",
included in the Nutch source code. Could anyone port it to Ferret?
===================================================
/**
 * Copyright 2005 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** JavaCC code for the Nutch lexical analyzer. */

options {
  STATIC = false;
  USER_CHAR_STREAM = true;
  OPTIMIZE_TOKEN_MANAGER = true;
  UNICODE_INPUT = true;
//DEBUG_TOKEN_MANAGER = true;
}

PARSER_BEGIN(NutchAnalysis)

package org.apache.nutch.analysis;

import org.apache.nutch.searcher.Query;
import org.apache.nutch.searcher.QueryFilters;
import org.apache.nutch.searcher.Query.Clause;

import org.apache.lucene.analysis.StopFilter;

import java.io.*;
import java.util.*;

/** The JavaCC-generated Nutch lexical analyzer and query parser. */
public class NutchAnalysis {

  private static final String[] STOP_WORDS = {
    "a", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "s", "such",
    "t", "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
  };

  private static final Set STOP_SET = StopFilter.makeStopSet(STOP_WORDS);

  private String queryString;

  /** True iff word is a stop word.  Stop words are only removed from queries.
   * Every word is indexed.  */
  public static boolean isStopWord(String word) {
    return STOP_SET.contains(word);
  }

  /** Construct a query parser for the text in a reader. */
  public static Query parseQuery(String queryString) throws IOException {
    NutchAnalysis parser =
      new NutchAnalysis(new FastCharStream(new StringReader(queryString)));
    parser.queryString = queryString;
    return parser.parse();
  }

  /** For debugging. */
  public static void main(String[] args) throws Exception {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    while (true) {
      System.out.print("Query: ");
      String line = in.readLine();
      System.out.println(parseQuery(line));
    }
  }

}

PARSER_END(NutchAnalysis)

TOKEN_MGR_DECLS : {

  /** Constructs a token manager for the provided Reader. */
  public NutchAnalysisTokenManager(Reader reader) {
    this(new FastCharStream(reader));
  }

}

TOKEN : {					  // token regular expressions

  // basic word -- lowercase it
<WORD: ((<LETTER>|<DIGIT>|<WORD_PUNCT>)+ | <IRREGULAR_WORD>)>
  { matchedToken.image = matchedToken.image.toLowerCase(); }

  // special handling for acronyms: U.S.A., I.B.M., etc: dots are removed
| <ACRONYM: <LETTER> "." (<LETTER> ".")+ >
    {                                             // remove dots
      for (int i = 0; i < image.length(); i++) {
        if (image.charAt(i) == '.')
          image.deleteCharAt(i--);
      }
      matchedToken.image = image.toString().toLowerCase();
    }

  // chinese, japanese and korean characters
| <SIGRAM: <CJK> >

   // irregular words
| <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
| <#C_PLUS_PLUS: ("C"|"c") "++" >
| <#C_SHARP: ("C"|"c") "#" >

  // query syntax characters
| <PLUS: "+" >
| <MINUS: "-" >
| <QUOTE: "\"" >
| <COLON: ":" >
| <SLASH: "/" >
| <DOT: "." >
| <ATSIGN: "@" >
| <APOSTROPHE: "'" >

| <WHITE: ~[] >                                   // treat unrecognized chars
                                                  // as whitespace
// primitive, non-token patterns

| <#WORD_PUNCT: ("_"|"&")>                        // allowed anywhere in words

| < #LETTER:					  // alphabets
    [
        "\u0041"-"\u005a",
        "\u0061"-"\u007a",
        "\u00c0"-"\u00d6",
        "\u00d8"-"\u00f6",
        "\u00f8"-"\u00ff",
        "\u0100"-"\u1fff"
    ]
    >

|  <#CJK:                                        // non-alphabets
      [
       "\u3040"-"\u318f",
       "\u3300"-"\u337f",
       "\u3400"-"\u3d2d",
       "\u4e00"-"\u9fff",
       "\uf900"-"\ufaff"
      ]
    >

| < #DIGIT:					  // unicode digits
      [
       "\u0030"-"\u0039",
       "\u0660"-"\u0669",
       "\u06f0"-"\u06f9",
       "\u0966"-"\u096f",
       "\u09e6"-"\u09ef",
       "\u0a66"-"\u0a6f",
       "\u0ae6"-"\u0aef",
       "\u0b66"-"\u0b6f",
       "\u0be7"-"\u0bef",
       "\u0c66"-"\u0c6f",
       "\u0ce6"-"\u0cef",
       "\u0d66"-"\u0d6f",
       "\u0e50"-"\u0e59",
       "\u0ed0"-"\u0ed9",
       "\u1040"-"\u1049"
      ]
  >

}


/** Parse a query. */
Query parse() :
{
  Query query = new Query();
  ArrayList terms;
  Token token;
  String field;
  boolean stop;
  boolean prohibited;

}
{
  nonOpOrTerm()                                   // skip noise
  (
    { stop=true; prohibited=false; field = Clause.DEFAULT_FIELD; }

                                                  // optional + or - operator
    ( <PLUS> {stop=false;} | (<MINUS> { stop=false;prohibited=true; } ))?

                                                  // optional field spec.
    ( LOOKAHEAD(<WORD><COLON>(phrase(field)|compound(field)))
      token=<WORD> <COLON> { field = token.image; } )?

    ( terms=phrase(field) {stop=false;} |         // quoted terms or
      terms=compound(field))                      // single or compound term

    nonOpOrTerm()                                 // skip noise

    {
      String[] array = (String[])terms.toArray(new String[terms.size()]);

      if (stop
          && field == Clause.DEFAULT_FIELD
          && terms.size()==1
          && isStopWord(array[0])) {
        // ignore stop words only when single, unadorned terms in default field
      } else {
        if (prohibited)
          query.addProhibitedPhrase(array, field);
        else
          query.addRequiredPhrase(array, field);
      }
    }
  )*

  { return query; }

}

/** Parse an explicitly quoted phrase query.  Note that this may return a single
 * term, a trivial phrase.*/
ArrayList phrase(String field) :
{
  int start;
  int end;
  ArrayList result = new ArrayList();
  String term;
}
{
  <QUOTE>

  { start = token.endColumn; }

  (nonTerm())*                                    // skip noise
  ( term = term() { result.add(term); }           // parse a term
    (nonTerm())*)*                                // skip noise

  { end = token.endColumn; }

  (<QUOTE>|<EOF>)

  {
    if (QueryFilters.isRawField(field)) {
      result.clear();
      result.add(queryString.substring(start, end));
    }
    return result;
  }

}

/** Parse a compound term that is interpreted as an implicit phrase query.
 * Compounds are a sequence of terms separated by infix characters.  Note that
 * this may return a single term, a trivial compound. */
ArrayList compound(String field) :
{
  int start;
  ArrayList result = new ArrayList();
  String term;
}
{
  { start = token.endColumn; }

  term = term() { result.add(term); }
  ( LOOKAHEAD( (infix())+ term() )
    (infix())+
    term = term() { result.add(term); })*

  {
    if (QueryFilters.isRawField(field)) {
      result.clear();
      result.add(queryString.substring(start, token.endColumn));
    }
    return result;
  }

}

/** Parse a single term. */
String term() :
{
  Token token;
}
{
  ( token=<WORD> | token=<ACRONYM> | token=<SIGRAM>)

  { return token.image; }
}


/** Parse anything but a term or a quote. */
void nonTerm() :
{}
{
  <WHITE> | infix()
}

void nonTermOrEOF() :
{}
{
  nonTerm() | <EOF>
}

/** Parse anything but a term or an operator (plus or minus or quote). */
void nonOpOrTerm() :
{}
{
  (LOOKAHEAD(2) (<WHITE> | nonOpInfix() | ((<PLUS>|<MINUS>) nonTermOrEOF())))*
}

/** Characters which can be used to form compound terms. */
void infix() :
{}
{
  <PLUS> | <MINUS> | nonOpInfix()
}

/** Parse infix characters except plus and minus. */
void nonOpInfix() :
{}
{
  <COLON>|<SLASH>|<DOT>|<ATSIGN>|<APOSTROPHE>
}
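
As a starting point for a port, the token-level rules of the grammar above can be approximated with one Ruby regular expression: acronyms lose their dots, words are lowercased, and every CJK character becomes a token of its own (the SIGRAM rule). This is only a rough sketch in plain Ruby, not Ferret or Nutch code. The module and method names are hypothetical, digits are limited to ASCII for brevity, and the query-parsing half of the grammar is left out. It assumes Ruby 2.4 or later.

```ruby
# Rough Ruby sketch of the token rules in the grammar above. The module and
# method names are hypothetical, and digits are limited to ASCII for brevity.
module NutchLikeTokenizer
  # Character ranges copied from the LETTER and CJK definitions.
  LETTER  = "A-Za-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff\u0100-\u1fff"
  CJK     = "\u3040-\u318f\u3300-\u337f\u3400-\u3d2d\u4e00-\u9fff\uf900-\ufaff"
  ACRONYM = /[#{LETTER}]\.(?:[#{LETTER}]\.)+/
  # Alternatives are ordered so the longer special cases win, mimicking
  # JavaCC's longest-match behaviour for these particular rules.
  TOKEN   = /#{ACRONYM}|[Cc](?:\+\+|\#)|[#{LETTER}0-9_&]+|[#{CJK}]/

  module_function

  def tokenize(text)
    text.scan(TOKEN).map do |token|
      # Acronyms lose their dots; every token is lowercased (a no-op for CJK).
      token.match?(/\A#{ACRONYM}\z/) ? token.delete('.').downcase : token.downcase
    end
  end
end
```

For example, `NutchLikeTokenizer.tokenize('U.S.A. ships C++ to 東京')` splits the two kanji into separate tokens and folds the acronym into a single lowercase word.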

Erik Hatcher

2005-Dec-16 11:14 UTC

Re: Re: ANN: acts_as_ferret and a fix for unicode

On Dec 16, 2005, at 12:14 AM, hui wrote:
> It's so cool!
> I am just looking for the CJK solutions,
> Here is "JavaCC code for the Nutch lexical analyzer."
> Included in Nutch source code, so could anyone port it into ferret?

There are several other Analyzers in Lucene that can deal with CJK
(and actually Korean doesn't really fit with Chinese and Japanese).
Lucene's StandardAnalyzer recognizes the CJK range just as the Nutch
one does, and there are also these additional ones (in the cjk and cn
directories):

	<http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/>

Erik
