search for: standardtokenizer

Displaying 20 results from an estimated 29 matches for "standardtokenizer".

2007 Aug 03
0
StandardTokenizer Doesn''t Support token_stream method
According to the Analyzer doc and the StandardTokenizer doc: http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new( true) # true to downcase tokens and then...
2006 Sep 05
15
ferret finds ''tests'' but not ''test''
Hello all, Quick question (possibly!) - I''ve got a few records indexed and doing a search for ''test'' reports in no hits even though I know the word ''tests'' exists in the indexed field. Doing a search for ''tests'' produces a result. I would have thought that ''test'' would match ''tests'' but no such
2006 Sep 15
1
Custom analyzer not invoked?
...-------------------- require ''ferret'' include Ferret class MyAnalyzer < Analysis::Analyzer def token_stream(field, str) # Display results of analysis puts ''Analyzing: field:%s str:%s'' % [field, str] t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str)) while true n = t.next() break if n == nil puts n.to_s end return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str)) end end puts ''== Adding document to index...'' index = Index::Index.new(:analyzer => MyAnalyzer.new(...
2007 Sep 07
5
Custom Analyser .. where to put it ??
...put my custom Analyser class like : class GermanStemmingAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = FULL_GERMAN_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words), ''de'') end end Any clue ? Thanks a lot Guillaume. -- Posted via http://www.ruby-forum.com/.
2007 Mar 06
1
case-sensitivity of analyzer
Is there anything about this analyzer that says "case-sensitive" to you? module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Just wondering how I can force my index to be case-insensitive. Thanks, -Adam -- Posted via http://www.ruby-forum.com/.
2007 Mar 01
4
Need help creating my own Filter in Ruby
...list to reach more people. I''m using these filters together in my analyzer (with acts_as_ferret + Ferret 0.11.1). HyphenFilter.new( StopFilter.new( LowerCaseFilter.new( MappingFilter.new( StandardTokenizer.new(str), mapping)), FULL_FRENCH_STOP_WORDS + FULL_ENGLISH_STOP_WORDS) ) The mapping filter maps pretty much all the french accents to the letter without the accent. So far so good. Only thing missing for what I want to do: I need to be able to make the w...
2007 Jan 11
5
stop words in query
...;m using AAF and the following custom analyzer: class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end However when my search term includes a stop word I never get any results back. Once I remove the stop word I get the normal results back. Do I need to do a search of my query for stop words and remove them myself? Or is there something I''m doing wrong wit...
2007 Jan 21
2
A few questions: Tweaking StemFilter, indexes, ...
...to figure out after messing around with ferret and going through the documentation. StemFilter ------ I am trying to improve the quality of my searches in context of the content of my application. I have created an analyzer using the following: StemFilter.new StopFilter.new( LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words ) This has been pretty good so far, however, I really would like to get a search for "plumber" match "plumbing" at maybe a lower score than it would match "plumbers". The thing is that plumber(s) is filtered to "plumber" and plumbing...
2007 Nov 09
2
Problem with stemming and AAF
...ms'' require ''ferret'' class StemmedAnalyzer < Ferret::Analysis::Analyzer include Ferret::Analysis def initialize(stop_words = ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, str) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words)) end end And added the call to the analyzer in my model file: acts_as_ferret( :fields => { :name => { :boost => 1, :store => :yes }, :product_number => { :boost => 2 }, :de...
2007 May 23
6
Accented characters
...?'',''?''] => ''y'', [''?'',''?'',''?''] => ''z'' } def token_stream(field, string) return MappingFilter.new(StandardTokenizer.new(string), MAPPING) end end And inserted this code at the end of environment.rb. Im my model: acts_as_ferret({ :fields => [ ''name'' ] }, :analyzer => PortugueseAnalyzer.new) But this did not work.... Can someone tell me what I did wrong ???? Thanks Marcello...
2007 Nov 13
8
acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
...gt; {:store => :yes}} } , {:analyzer => PlainAsciiAnalyzer.new} ) end ANALYZER lib : plain_ascii_analyzer.rb class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer include ::Ferret::Analysis def token_stream(field, str) StopFilter.new( StandardTokenizer.new(str) , ["fax", "gsm"] ) # raise <<<----- is never executed when uncommented !! end end In the console, I rebuild the index + search for a stop word => I get a results, when I should not : >> reload!; AccessPointKind2.r...
2007 Jun 25
4
Ignore apostrophes in words
Hi, I just started using ferret and the aaf plugin and it seems to work quite nicely. However, my fields are very short (titles of music) and I don''t think may users will be typing in apostrophes when they are looking for something. Right now, for a simple document such as "what i''ve done" I''d like it to be indexed as "what ive done" instead. Right
2006 Dec 08
4
Using custom stem analyzer giving mongrel errors
...'rubygems'' require ''ferret'' include Ferret module Ferret::Analysis class FerretAnalyzer def initialize(stop_words = FULL_ENGLISH_STOP_WORDS) @stop_words = stop_words end def token_stream(field, text) StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words)) end end end and I''m simply setting the :analyzer option in AAF. However, I get odd behavior. The first search that I do will go through and display the proper results, but any subsequent request starts to produce odd behavior. For example when you are redi...
2009 Apr 09
4
Weird analyzer issue with the word ''fly''
...nalyzer.new, :fields => {:name => { :boost => 2.0 }, ... }}) And this analyzer is defined in a module thus: module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end Now, here''s a search without using the analyzer: >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size => 14 And with the analyzer: >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000,...
2006 Dec 06
10
Stem Analyzer
Hi all, I am trying to implement a search that will use the Stem Analyzer. I added the Stem Anaylzer from the examples shown in another post http://ruby-forum.com/topic/80178#147014 module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end The problem with the Stem analyzer is that when I search for a term such as ''engineering'', it only matches whole words that fit the stem so the only results I get back are documents where ''engin'' is a whole word (i.e. I don''...
2007 Sep 20
5
Ferret DRB, UTF-8, Mongrel
...?'',''?'',''?''] => ''y'', [''?'',''?'',''?''] => ''z'' } def token_stream(field, str) MappingFilter.new(StandardTokenizer.new(str), CHARACTER_MAPPINGS) end end I think Ferret is working fine... because when I run some tests, the mapping filter correctly pulls out the accented characters... exactly as it should. However, when something is persisted via the model (acts_as_ferret and DRB server), I get unexpected be...
2007 Jan 17
1
Tokenizers?
Hi everyone. First a quick word - I am relatively new to Ruby and Ruby on Rails, but I love learning about it and using it. Currently I am working on extending Boxroom (file repository RoR app) for the CARE Indonsia intranet, where I work as an intern. I am using ferret, and it''s working great. I noticed that if a file contains something like this "applications/entries", this
2006 Dec 06
1
AAF - Stem Analyzer
I''m not on AAF. Can someone else help Raymond with an example? On 12/6/06, Raymond O''connor <nappin713 at yahoo.com> wrote: > > Matt Schnitz wrote: > > You also need to stem-analyze the incoming query. > > > > I had this same problem. :^> > > > > > > Schnitz > > Do you have an example of how to do this? I''m using
2006 Oct 19
2
How to deal with accentuated chars in 0.10.8?
I''m startin to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). By example, if the user search by "gonzalez" you can find documents taht contents the term "gonz?lez" (gonz&aacute;lez) The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
2006 Oct 20
0
Ferret 0.10.13 released
...'', ''?''] => ''e'', [''?'', ''?'', ''?''] => ''u'', [''?''] => ''c'' } def token_stream(field, string) return MappingFilter.new(StandardTokenizer.new(string), MAPPING) end end Happy Ferreting and check the Ferret homepage[1] if you are able to contribute. Cheers, Dave [1] http://ferret.davebalmain.com/trac/