On 9/16/06, Olivier Siohan <siohan at watson.ibm.com> wrote:
> Hello,
>
> I'm trying to define my own analyzer by doing something like:
>
> #-----------------------------------------------------
> require 'ferret'
> include Ferret
>
> class MyAnalyzer < Analysis::Analyzer
>   def token_stream(field, str)
>
>     # Display results of analysis
>     puts 'Analyzing: field:%s str:%s' % [field, str]
>     t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
>     while true
>       n = t.next()
>       break if n == nil
>       puts n.to_s
>     end
>
>     return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
>   end
> end
>
>
> puts '== Adding document to index...'
> index = Index::Index.new(:analyzer => MyAnalyzer.new())
> index << { :content => "The quick brown fox" }
> index << { :content => "The cow jumps over the moon" }
>
> puts '== Searching Brown...'
> index.search_each('content:Brown') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Foo...'
> index.search_each('content:Foo') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Brown...'
> index.search_each('content:Brown') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Cow...'
> index.search_each('content:Cow') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
> #-----------------------------------------------------
>
> The output is:
> == Adding document to index...
> Analyzing: field:content str:
> Analyzing: field:content str:
> == Searching Brown...
> Analyzing: field:content str:Brown
> token["brown":0:5:1]
> Document 0 found with a score of 0.5
> == Searching Foo...
> == Searching Brown...
> Document 0 found with a score of 0.5
> == Searching Cow...
> Document 1 found with a score of 0.375
>
> The result is correct, i.e. documents are retrieved as expected.
> However, I don't understand why I don't see my 'Analyzing...' comment
> with the corresponding string being analyzed, except when searching
> for 'Brown', and why I'm getting an empty string in 'Analyzing:
> field:content str:' when the 2 documents are pushed into the index.
>
> Any explanations? I apologize if this is a trivial issue; I'm quite
> new to Ferret/Lucene. I use ferret-0.10.4 under linux.
>
> Many thanks.
>
> -- Olivier
Hi Olivier,
This is a bug I came across recently. It's fixed in the working
version. However, if you need it to work right away, take out the
inheritance from Analysis::Analyzer. It makes Ferret think you are
passing a C-implemented Analyzer.
The next gem will be out soon.
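
The workaround above might look roughly like this. It is an untested sketch, not checked against ferret-0.10.4: it assumes Ferret accepts any object that responds to token_stream(field, str) as an analyzer, and the begin/rescue around the require is only there so the file still loads where the gem is missing.

```ruby
begin
  require 'ferret'
rescue LoadError
  # Ferret is not installed; the class below still loads,
  # but token_stream cannot be called without the gem.
end

# Plain Ruby class: no "< Analysis::Analyzer", per the workaround above.
class MyAnalyzer
  def token_stream(field, str)
    # Show what the analyzer was asked to process
    puts 'Analyzing: field:%s str:%s' % [field, str]
    Ferret::Analysis::LowerCaseFilter.new(
      Ferret::Analysis::StandardTokenizer.new(str))
  end
end

# Only exercise the index when the gem is actually available.
if defined?(Ferret)
  index = Ferret::Index::Index.new(:analyzer => MyAnalyzer.new)
  index << { :content => "The quick brown fox" }
  index << { :content => "The cow jumps over the moon" }

  index.search_each('content:Brown') do |doc, score|
    puts "Document #{doc} found with a score of #{score}"
  end
end
```

With the inheritance removed, the 'Analyzing...' line should show the real document text at indexing time, if the assumption above holds.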
Cheers,
Dave