Hey Onur, just got back from a trip around Japan. You've probably
already worked out the answer to this question, but here is how I test
tokenizers:
require 'ferret'

# Tokenize each line of stdin and print every token with its offsets.
$stdin.each do |line|
  stk = Ferret::Analysis::StandardTokenizer.new(line)
  while (tk = stk.next)
    puts " <#{tk.text}> from #{tk.start_offset} to #{tk.end_offset}"
  end
end
And I run it like this:
ruby -r rubygems tz_tester.rb < file_to_tokenize.txt
You can just change the tokenizer to whatever tokenizer you want to test.
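To make swapping tokenizers a one-word change, the loop can be pulled into a method that takes the tokenizer class as a parameter. This is only a sketch: `dump_tokens`, `ToyWhitespaceTokenizer` and `ToyToken` are hypothetical names, and the toy tokenizer is NOT part of Ferret. It just mimics the interface the script above relies on (`new(input)`, then `next` returning a token with `text`, `start_offset` and `end_offset`, or nil when done), so the harness can run without Ferret installed.

```ruby
require 'strscan'

# Hypothetical token type (not Ferret's): text plus character offsets.
ToyToken = Struct.new(:text, :start_offset, :end_offset)

# Hypothetical whitespace tokenizer (not part of Ferret) exposing the same
# new(input) / next interface that the StandardTokenizer script uses.
class ToyWhitespaceTokenizer
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Returns the next whitespace-delimited token, or nil at end of input.
  def next
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?
    start = @scanner.pos
    text = @scanner.scan(/\S+/)
    ToyToken.new(text, start, @scanner.pos)
  end
end

# Runs any tokenizer class with that interface over a string and returns
# the lines the original script would print, one per token.
def dump_tokens(tokenizer_class, text)
  tokenizer = tokenizer_class.new(text)
  lines = []
  while (tk = tokenizer.next)
    lines << " <#{tk.text}> from #{tk.start_offset} to #{tk.end_offset}"
  end
  lines
end
```

With Ferret installed, and assuming its tokenizers keep the same interface as in the script above, the stdin loop would become `$stdin.each { |line| dump_tokens(Ferret::Analysis::StandardTokenizer, line).each { |l| puts l } }`, and testing another tokenizer means changing only the class argument.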
Hope that helps,
Dave
On 4/12/06, Onur Turgay <onurturgay at labristeknoloji.com> wrote:
> Hi,
> is there a way to test tokenizers? I mean, I want to give input stream
> and see the output tokens.
>
> And is there a way to see an indexed document's index tokens? Which
> words in the document are used to index this document?
>
> Thanks in advance
> Onur
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>