Bill Burcham
2007-Aug-03 17:02 UTC
[Ferret-talk] StandardTokenizer Doesn't Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html

I ought to be able to construct a StandardTokenizer like this:

  t = StandardTokenizer.new(true)  # true to downcase tokens

and then later:

  stream = t.token_stream(ignored_field_name, some_string)

to create a new TokenStream from some_string. This approach would be valuable for my application: I am analyzing many short strings, so avoiding rebuilding my 5-deep analyzer chain for each small string would be a nice savings.

Unfortunately, StandardTokenizer#initialize does not work as advertised. It takes a string, not a boolean, so it does not support the reuse model described in the documentation cited above. If you follow the "source" link for "new" in the StandardTokenizer documentation:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html#

you'll see that the rdoc comment apparently lies :) The formal parameter that should hold "lower" is named "rstr". Fishy. A quick look indicates that WhiteSpaceTokenizer has a similar mismatch with its documentation.

Is there an idiomatic way to reuse analyzer chains?
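In the meantime, the closest thing I have found is to wrap the chain in a custom Analyzer subclass, roughly as sketched below. The class name and the particular filters are just stand-ins for my real 5-deep chain, and the StandardTokenizer.new(str) call matches the constructor as it actually behaves today (string argument, not a boolean):

  require 'rubygems'
  require 'ferret'
  include Ferret::Analysis

  # Stand-in for my real analyzer chain; the name and the filters are illustrative.
  class MyChainAnalyzer < Analyzer
    def token_stream(field, str)
      ts = StandardTokenizer.new(str)  # constructor currently takes the string itself
      ts = LowerCaseFilter.new(ts)
      ts = StopFilter.new(ts)
      ts
    end
  end

  analyzer = MyChainAnalyzer.new
  stream = analyzer.token_stream(:ignored, "Some Short String")
  while token = stream.next
    puts token.text
  end

That still builds the whole filter chain once per string, though, which is exactly the cost I was hoping to avoid, so I would still love to hear about a way to construct the chain once and just point it at new text.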