Bill Burcham
2007-Aug-03 17:02 UTC
[Ferret-talk] StandardTokenizer Doesn't Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html

I ought to be able to construct a StandardTokenizer like this:

  t = StandardTokenizer.new(true)  # true to downcase tokens

and then later:

  stream = t.token_stream(ignored_field_name, some_string)

to create a new TokenStream from some_string. This approach would be valuable for my application: I am analyzing many short strings, so avoiding rebuilding my 5-deep analyzer chain for each small string would be a nice savings.

Unfortunately, StandardTokenizer#initialize does not work as advertised. It takes a string, not a boolean, so it does not support the reuse model described in the documentation cited above. If you follow the "source" link for "new" in the StandardTokenizer documentation:

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html#

you'll see that the rdoc comment apparently lies :) The formal parameter that should hold "lower" is named "rstr". Fishy. A quick look indicates that WhiteSpaceTokenizer has a similar mismatch with its documentation.

Is there an idiomatic way to reuse analyzer chains?
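In the meantime, the closest thing I have found is to wrap the chain in a custom Analyzer subclass, roughly as sketched below. The class name and the particular filters are just stand-ins for my real 5-deep chain, and the StandardTokenizer.new(str) call matches the constructor as it actually behaves today (string argument, not a boolean):

  require 'rubygems'
  require 'ferret'
  include Ferret::Analysis

  # Stand-in for my real analyzer chain; the name and the filters are illustrative.
  class MyChainAnalyzer < Analyzer
    def token_stream(field, str)
      ts = StandardTokenizer.new(str)  # constructor currently takes the string itself
      ts = LowerCaseFilter.new(ts)
      ts = StopFilter.new(ts)
      ts
    end
  end

  analyzer = MyChainAnalyzer.new
  stream = analyzer.token_stream(:ignored, "Some Short String")
  while token = stream.next
    puts token.text
  end

That still builds the whole filter chain once per string, though, which is exactly the cost I was hoping to avoid, so I would still love to hear about a way to construct the chain once and just point it at new text.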