I'm getting "uninitialized constant UNTOKENIZED" when I try to do something like the following:

    class Url < ActiveRecord::Base
      acts_as_ferret :fields => {'name' => {},
                                 'description' => {},
                                 'url' => {:index => Ferret::Document::Field::Index::UNTOKENIZED},
                                 'url_type' => {}}
    end

I am running ferret 0.10.1 and "bleeding edge version" of acts_as_ferret I got from svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin/acts_as_ferret.

Any ideas?

-- 
Posted via http://www.ruby-forum.com/.
Caleb wrote:
> I'm getting "uninitialized constant UNTOKENIZED" when I try to do
> something like the following:
>
>     class Url < ActiveRecord::Base
>       acts_as_ferret :fields => {'name' => {},
>                                  'description' => {},
>                                  'url' => {:index => Ferret::Document::Field::Index::UNTOKENIZED},
>                                  'url_type' => {}}
>     end
>
> I am running ferret 0.10.1 and "bleeding edge version" of acts_as_ferret
> I got from
> svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin/acts_as_ferret.

I don't know much about the acts_as_ferret plugin, but the constant you're talking about was part of the Ferret 0.9 source tree and is gone from Ferret 0.10 on.

Ben
Benjamin Krause wrote:
> I don't know much about the acts_as_ferret plugin, but the constant you're
> talking about was part of the Ferret 0.9 source tree and is gone from
> Ferret 0.10 on.
>
> Ben

Thanks. Then I guess my question is, "How do I set something to not be tokenized?" To explain further, I would like to be able to type in only part of a URL (e.g. microsoft) and get back any URL that contains that query (e.g. www.microsoft.com, www.foo.com/microsoft, microsoft.foobar.com, etc.).
Caleb wrote:
> Thanks. Then I guess my question is, "How do I set something to not
> be tokenized?" To explain further, I would like to be able to type in
> only part of a URL (e.g. microsoft) and get back any URL that contains
> that query (e.g. www.microsoft.com, www.foo.com/microsoft,
> microsoft.foobar.com, etc.).

Again, I can only speak about Ferret, not acts_as_ferret. You now pass a hash describing all fields, a so-called field-info hash, to the index. See http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfos.html for more information.

I guess this will be fixed within the next few days, and you will get a message from Jens as soon as he gets back to his computer :)

Ben
Hi!

On Tue, Aug 29, 2006 at 10:01:11PM +0200, Benjamin Krause wrote:
> Caleb wrote:
> > Thanks. Then I guess my question is, "How do I set something to not
> > be tokenized?" To explain further, I would like to be able to type in
> > only part of a URL (e.g. microsoft) and get back any URL that contains
> > that query (e.g. www.microsoft.com, www.foo.com/microsoft,
> > microsoft.foobar.com, etc.).
>
> Again, I can only speak about Ferret, not acts_as_ferret. You now
> pass a hash describing all fields, a so-called field-info hash, to the
> index. See
> http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfos.html
> for more information. I guess this will be fixed within the next few days,
> and you will get a message from Jens as soon as he gets back to his
> computer :)

Right ;-)

Caleb, not tokenizing the url field won't help you much with your problem. An untokenized field's content is indexed as is, so indexing 'www.gnu.org' will leave you with that exact term in the index, and a search for 'gnu' won't find it. Even searching for 'gnu*' won't find it, since the term starts with 'www.', and a wildcard at the beginning of the query term (like '*gnu*') is not allowed due to the way the index works.

It would be better to use a custom tokenizer that splits the contents of this field at '.' and '/' (and maybe strips out the 'www', as that will be shared by most URLs and won't help much when it comes to searching), so that 'gnu org' would be indexed. Then a search for 'gnu' will find what you want.
If you aren't keen on implementing your own tokenizer, define a method that pre-processes the URL and splits it as described, and index the return value of this method. So you'd use:

    class Url < ActiveRecord::Base
      acts_as_ferret :fields => {:name => {},
                                 :description => {},
                                 :url_parts => { :index => :untokenized },
                                 :url_type => {}}

      def url_parts
        # split url and remove common terms
        self.url.split(/[\/.]/) - [ 'www', 'html' ]
      end
    end

Concerning the new 0.10 FieldInfo properties: you can use all the properties and values described at http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfo.html in your call to acts_as_ferret. They will be passed straight through to Ferret upon index creation.

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
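As a rough illustration of the splitting described above, here is a plain-Ruby sketch that does not touch Ferret at all. It only compares matching against the whole URL string with matching against the split parts. The names STOP_PARTS, split_url, whole_string_hits and part_hits are made up for this sketch, and the sample URLs are the ones from the question.

```ruby
# Plain-Ruby sketch; no Ferret involved. All names here are illustrative only.
STOP_PARTS = ['www', 'html'].freeze
SAMPLE_URLS = ['www.microsoft.com', 'www.foo.com/microsoft', 'microsoft.foobar.com'].freeze

# Same split as url_parts in the post, but taking the URL as an argument.
def split_url(url)
  url.split(/[\/.]/) - STOP_PARTS
end

# Whole string as the only term: a query matches only if it equals the full URL.
def whole_string_hits(urls, query)
  urls.select { |url| url == query }
end

# Split parts as terms: a query matches if it equals any one part.
def part_hits(urls, query)
  urls.select { |url| split_url(url).include?(query) }
end

p split_url('www.foo.com/microsoft')             # => ["foo", "com", "microsoft"]
p whole_string_hits(SAMPLE_URLS, 'microsoft')    # => []
p part_hits(SAMPLE_URLS, 'microsoft')            # all three sample URLs
```

Note that this only matches whole parts: a query for 'micro' would still find nothing unless a wildcard query such as 'micro*' is used against the real index.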
Jens Kraemer wrote:
> So you'd use:
>
>     class Url < ActiveRecord::Base
>       acts_as_ferret :fields => {:name => {},
>                                  :description => {},
>                                  :url_parts => { :index => :untokenized },
>                                  :url_type => {}}
>
>       def url_parts
>         # split url and remove common terms
>         self.url.split(/[\/.]/) - [ 'www', 'html' ]
>       end
>     end

Thanks! That gives me the information I need. One question remains, however. I definitely want to do custom tokenizing for URLs, but when I use the code quoted above, the url_parts method is never executed (i.e. if I put a breakpoint there, the code never hits it). How can I get Ferret to reference 'url_parts' and call that method?
On Wed, Aug 30, 2006 at 01:26:17AM +0200, Caleb wrote:
> Jens Kraemer wrote:
> > So you'd use:
> >
> >     class Url < ActiveRecord::Base
> >       acts_as_ferret :fields => {:name => {},
> >                                  :description => {},
> >                                  :url_parts => { :index => :untokenized },
> >                                  :url_type => {}}
> >
> >       def url_parts
> >         # split url and remove common terms
> >         self.url.split(/[\/.]/) - [ 'www', 'html' ]
> >       end
> >     end
>
> Thanks! That gives me the information I need. One question remains,
> however. I definitely want to do custom tokenizing for URLs, but
> when I use the code quoted above, the url_parts method is never executed
> (i.e. if I put a breakpoint there, the code never hits it). How can I
> get Ferret to reference 'url_parts' and call that method?

It should get called whenever acts_as_ferret indexes a record, since it is referenced in the :fields hash. What does aaf log when you create a new Url record?

Jens
Jens Kraemer wrote:
> It should get called whenever acts_as_ferret indexes a record, since it
> is referenced in the :fields hash. What does aaf log when you create a
> new Url record?

Sorry for the delayed response. I find it annoying when threads are started that could be helpful to others and the author doesn't take the time to say what ultimately solved the problem, so I won't do that here.

You're right, url_parts IS being called when a Url is CREATED. I was thinking that it would be called upon SEARCHING. I guess that wouldn't make sense unless you wanted to re-index everything on every search (not a good idea). So the url_parts method works as expected.
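For anyone who trips over the same index-time versus search-time confusion, the toy inverted index below may help. It is a plain-Ruby sketch and not how Ferret or acts_as_ferret is implemented. The ToyIndex class and its split_calls counter are invented here purely to show that the splitting runs once per added record and never during a search.

```ruby
# Toy inverted index in plain Ruby. Illustrative only; not Ferret's implementation.
class ToyIndex
  attr_reader :split_calls

  def initialize
    @terms = Hash.new { |hash, term| hash[term] = [] }
    @split_calls = 0
  end

  # Index time: the URL is split once and each part is stored as a term.
  def add(id, url)
    @split_calls += 1
    parts = url.split(/[\/.]/) - ['www', 'html']
    parts.each { |part| @terms[part] << id }
  end

  # Search time: a plain term lookup, no splitting of stored URLs.
  def search(term)
    @terms.fetch(term, [])
  end
end

index = ToyIndex.new
index.add(1, 'www.gnu.org')
index.add(2, 'www.foo.com/gnu')

p index.search('gnu')   # => [1, 2]
p index.split_calls     # => 2, and it stays 2 however many searches follow
```

This is also why a breakpoint in url_parts is only hit when a record is saved: by the time a query arrives, the terms are already in the index.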