Hi there,

I'm working with some legacy data where customer phone numbers are stored with hyphens between the area code, exchange, and number (e.g. 555-555-5555). Is this the best way to store a phone number? Perhaps not, but it's the way they were being stored, so I have to work with this format.

Right, so when I save a record, the log tells me acts_as_ferret indexed the number with the hyphens in place OK. However, find_by_contents does not return any results if I query like 555-555-5555. If I remove the hyphens and save the record, find_by_contents will return results (e.g. 5555555555).

Does anyone have any thoughts on this? Thanks in advance!

M.

--
Posted via http://www.ruby-forum.com/.

Hi!

On Wed, Aug 30, 2006 at 05:29:58PM +0200, Michael Leung wrote:
> Right, so when I save a record the log tells me acts_as_ferret indexed
> the number with the hyphens in place OK. However, find_by_contents does
> not return any results if I query like 555-555-5555.

It seems the tokenizer strips out the hyphens. This happens inside Ferret, after acts_as_ferret's debug message.

Use something like

    acts_as_ferret :fields => {
      :phone => { :index => :untokenized },
      ...other fields go here
    }

to let Ferret store the phone numbers unchanged.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
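
To make the tokenized/untokenized distinction concrete, here is a minimal plain-Ruby sketch. It is a toy model only and does not reproduce Ferret's analyzer: `toy_tokenize` and `build_toy_index` are hypothetical names, and the "split on anything that is not a letter or digit" rule is an assumption chosen to mirror the behaviour described above.

```ruby
# Toy model only: this is NOT Ferret's analyzer, just an illustration of
# why an exact-term lookup can miss a value that was tokenized at index time.

# Hypothetical tokenizer: split on anything that is not a letter or digit.
def toy_tokenize(value)
  value.downcase.scan(/[a-z0-9]+/)
end

# Build a term => [doc ids] map for one field.
# With tokenize: false the whole value is stored as a single term.
def build_toy_index(docs, tokenize:)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each_with_index do |value, doc_id|
    terms = tokenize ? toy_tokenize(value) : [value]
    terms.each { |term| index[term] << doc_id }
  end
  index
end

phones = ["555-555-5555", "555-123-4567"]

tokenized   = build_toy_index(phones, tokenize: true)
untokenized = build_toy_index(phones, tokenize: false)

# Exact lookup of the hyphenated number as a single term.
tokenized_hit   = tokenized.fetch("555-555-5555", [])   # no such term exists
untokenized_hit = untokenized.fetch("555-555-5555", []) # stored verbatim

puts "tokenized:   #{tokenized_hit.inspect}"
puts "untokenized: #{untokenized_hit.inspect}"
```

In this model the tokenized index only contains the pieces ("555", "5555", ...), so the full hyphenated string is never a term; the untokenized index keeps it whole.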

Hey Jens,

Thanks for the reply.

When I try this:

    'work_phone' => { :index => :untokenized },
    'home_phone' => { :index => :untokenized }

I get:

unknown stored parameter untokenized

C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:221:in `index='
C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:182:in `initialize'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:113:in `work_phone_to_ferret'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:554:in `to_doc'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:553:in `to_doc'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:333:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:330:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:268:in `update_without_timestamps'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/timestamp.rb:48:in `update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1760:in `create_or_update_without_callbacks'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:242:in `create_or_update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1523:in `save_without_validation'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/validations.rb:744:in `save_without_transactions'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:120:in `save'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:86:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:112:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:120:in `save'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1570:in `update_attributes'
#{RAILS_ROOT}/app/controllers/customers_controller.rb:23:in `update'

On Wed, Aug 30, 2006 at 07:02:17PM +0200, Michael Leung wrote:
> When I try this 'work_phone' => { :index => :untokenized }, 'home_phone'
> => { :index => :untokenized } I get:
>
> unknown stored parameter untokenized
>
> C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:221:in

OK, the above is for Ferret 0.10.x.

    :index => Ferret::Document::Field::Index::UNTOKENIZED

should work for you.

Jens

Hey again Jens,

Strange: I no longer get an error when I reference the constant for untokenized by the fully qualified name as you suggested, but results still do not come back when searching with hyphens. Hmmm....

Thanks for your help thus far.
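
Since the first message reports that the hyphen-free form (5555555555) is found, one possible stopgap is to normalize phone numbers to digits on both the indexing and the query side. This is only a sketch of that idea: `normalize_phone` is a hypothetical helper, not part of Ferret or acts_as_ferret, and wiring it into the model and the search form is left out.

```ruby
# Hypothetical helper, not part of acts_as_ferret: reduce a phone number
# to its digits so that stored values and queries use the same form.
def normalize_phone(raw)
  raw.to_s.gsub(/\D/, "")
end

stored = normalize_phone("555-555-5555")   # value as it would be indexed
query  = normalize_phone("(555) 555-5555") # user input in another format

puts stored
puts stored == query
```

The trade-off is that the original formatting is lost in the index, so the hyphenated form would still need to be kept in the database column for display.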

On Wed, Aug 30, 2006 at 08:24:02PM +0200, Michael Leung wrote:
> Strange, I no longer get an error when I reference the constant for
> untokenized by the fully qualified name as you suggested, but results
> still do not come back when searching with hyphens.

Maybe the problem isn't tokenization-related, and you're trying to search for a substring of the phone number that doesn't begin at the first character?

Example: if your indexed value is '123-45-55555', a search for '123*' or '123-45*' should find the record, but a search for '45*' won't. Is this the behaviour you experience?

(Wildcards at the beginning, as in '*45*', don't always work. There is currently another thread about this topic; it's unclear at the moment whether this is supposed to work or not. I hope Dave can shed some light on this.)

To be able to search for only the area code or the phone number, you should tokenize the phone number into parts (split at the hyphens).

Jens
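
The prefix behaviour described above can be illustrated with a small plain-Ruby sketch. It is a toy model, not Ferret's query engine: `prefix_match?` and `part_prefix_match?` are hypothetical helpers that treat a trailing `*` as "starts with", which is an assumption made for illustration.

```ruby
# Toy model, not Ferret's query engine.
value = "123-45-55555"

# Untokenized: the whole value is one term, so a trailing-wildcard query
# behaves like a prefix test against that single term.
def prefix_match?(term, query)
  term.start_with?(query.chomp("*"))
end

# Tokenized at the hyphens: each part becomes its own term, and the
# wildcard query only has to match the start of one of them.
def part_prefix_match?(parts, query)
  prefix = query.chomp("*")
  parts.any? { |part| part.start_with?(prefix) }
end

parts = value.split("-")

["123*", "123-45*", "45*"].each do |query|
  puts "#{query}: whole term=#{prefix_match?(value, query)}"
end
puts "45* against parts=#{part_prefix_match?(parts, '45*')}"
```

In this model '45*' fails against the single untokenized term but succeeds once the number is split into its parts, which is the reason for the suggestion to tokenize at the hyphens.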

Heya Jens,

Actually, I'm having the problem where no records get returned when I query for the full number, 123-45-55555 for example.

M.

On 9/1/06, Michael Leung <mleung at projectrideme.com> wrote:
> Actually, I'm having the problem where no records get returned, when I
> query for the full number: 123-45-55555 for example.

Hi Michael,

It works here in version 0.10.1:

    irb(main):001:0> require 'rubygems'
    => true
    irb(main):002:0> require 'ferret'
    => false
    irb(main):003:0> include Ferret
    => Object
    irb(main):004:0> i = I.new
    => #<Index>
    irb(main):005:0> i << {:content => "the phone number is 123-45-55555"}
    => nil
    irb(main):006:0> i.search("content:123-45-55555")
    => #<struct Ferret::Search::TopDocs total_hits=1, hits=[#<struct Ferret::Search::Hit doc=0, score=0.1534264087677>], max_score=0.1534264087677>
    irb(main):007:0>

I put a bug-fix for this in version 0.10.1. I think it is fixed in 0.9.6 too, but I can't remember for certain. You're better off upgrading to 0.10.1, especially if you are using acts_as_ferret (since most of the work has already been done for you).

Cheers,
Dave